bexhoma.collectors.ycsb module

Collector for YCSB experiments.

Provides YcsbCollector, a thin subclass of CollectorBase that wires up evaluators.ycsb as the evaluator. All data collection and aggregation logic is inherited from CollectorBase.

Authors: Patrick K. Erdelt
Copyright (C) 2020 Patrick K. Erdelt
SPDX-License-Identifier: AGPL-3.0-or-later
See LICENSE for details.

class bexhoma.collectors.ycsb.YcsbCollector(path, codes, benchmark_run: int = 0)

Bases: CollectorBase

Collector for YCSB experiments.

Overrides get_evaluator() to return an evaluators.ycsb instance. All data collection and aggregation methods are inherited from CollectorBase.
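The override pattern can be sketched as follows. This is a minimal illustration, not bexhoma's actual code. StubCollectorBase, StubEvaluator and SketchYcsbCollector are hypothetical stand-ins, and the evaluator's constructor arguments are assumed. Only the evaluator factory changes, and every other method would come from the base class.

```python
# Hypothetical stand-ins: the real CollectorBase and evaluators.ycsb live in
# bexhoma and are NOT reproduced here. They only illustrate the override pattern.
class StubEvaluator:
    """Hypothetical stand-in for evaluators.ycsb."""

    def __init__(self, path: str, code: str):
        self.path = path
        self.code = code


class StubCollectorBase:
    """Hypothetical stand-in for CollectorBase."""

    def __init__(self, path: str, codes: list, benchmark_run: int = 0):
        self.path = path
        self.codes = list(codes)
        self.benchmark_run = benchmark_run

    def get_evaluator(self, code: str = ""):
        raise NotImplementedError


class SketchYcsbCollector(StubCollectorBase):
    """Overrides only the evaluator factory; everything else is inherited."""

    def get_evaluator(self, code: str = ""):
        # Documented default: use the first experiment code when none is given.
        if not code:
            code = self.codes[0]
        return StubEvaluator(self.path, code)


collector = SketchYcsbCollector("/results", ["1700000000", "1700000001"])
evaluator = collector.get_evaluator()
print(evaluator.code)  # 1700000000
```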

get_benchmark_timeseries_all(metric='current_ops_per_sec')

Collects long-format YCSB time-series data for a given metric across all experiment codes.

For each code and each unique (configuration, client, experiment_run) combination, calls evaluators.ycsb.get_benchmark_logs_timeseries_df_aggregated(), reshapes the result to long format, and annotates each row with its identifying fields. Connection metadata (e.g. type_tenants, num_tenants, vol_tenants) is joined in from get_connections() for each code.

The YCSB evaluator uses 'sec' as the index name; this method normalises it to 'second' for consistency with other time-series methods.
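The reshaping and index renaming described above can be sketched with plain pandas. This is a sketch under assumptions: the input frame is invented sample data standing in for one result of get_benchmark_logs_timeseries_df_aggregated(), and to_long() is a hypothetical helper, not part of bexhoma. The connection-metadata join is omitted here.

```python
import pandas as pd

# Invented sample data for one (configuration, client, experiment_run):
# the YCSB evaluator indexes rows by 'sec'.
df = pd.DataFrame(
    {"current_ops_per_sec": [1200.0, 1350.0, 1300.0]},
    index=pd.Index([10, 20, 30], name="sec"),
)


def to_long(df, metric, code, configuration, client, experiment_run):
    """Hypothetical helper: reshape one metric column to long format and tag it."""
    out = df[[metric]].rename_axis("second").reset_index()  # 'sec' -> 'second'
    out = out.rename(columns={metric: "value"})
    out["metric"] = metric
    out["code"] = code
    out["configuration"] = configuration
    out["client"] = client
    out["experiment_run"] = experiment_run
    return out[["second", "code", "configuration", "client",
                "experiment_run", "metric", "value"]]


long_df = to_long(df, "current_ops_per_sec", "1700000000", "PostgreSQL-1", 1, 1)
```

Collecting such frames for every combination and concatenating them row-wise (for example with pd.concat(frames, ignore_index=True)) yields a single long-format table.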

Parameters:

metric (str) – YCSB metric to retrieve (default 'current_ops_per_sec').

Returns:

Long-format DataFrame with columns second, code, configuration, client, experiment_run, metric, value, plus connection metadata columns.

Return type:

pandas.DataFrame

get_benchmark_timeseries_per_phase(metric='current_ops_per_sec')

Combines aggregated YCSB time-series per phase from all experiment codes into a wide-format DataFrame.

For each code and each unique (configuration, client, experiment_run, benchmark_run) combination, calls evaluators.ycsb.get_benchmark_logs_timeseries_df_aggregated() and adds the selected metric as one column of the result. Each column is labelled {code}-{configuration}-{client}-{experiment_run}-{benchmark_run}.
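A minimal pandas sketch of the wide-format combination follows. It assumes each phase's aggregated frame is already available in a dict keyed by (code, configuration, client, experiment_run, benchmark_run). combine_phases() is a hypothetical helper, and the sample data is invented.

```python
import pandas as pd


def combine_phases(series_by_key, metric):
    """Hypothetical helper: one metric column per phase, aligned on 'second'."""
    columns = {}
    for (code, configuration, client, experiment_run, benchmark_run), df in series_by_key.items():
        if df is None or df.empty or metric not in df.columns:
            continue  # skip phases without data
        label = f"{code}-{configuration}-{client}-{experiment_run}-{benchmark_run}"
        columns[label] = df[metric].rename_axis("second")
    if not columns:
        return pd.DataFrame()  # documented behaviour when no data is available
    # Outer-join on the time index; missing seconds become NaN.
    return pd.concat(columns, axis=1).rename_axis("second")


# Invented sample data: two clients of the same configuration.
a = pd.DataFrame({"current_ops_per_sec": [100.0, 110.0]},
                 index=pd.Index([10, 20], name="sec"))
b = pd.DataFrame({"current_ops_per_sec": [90.0, 95.0, 97.0]},
                 index=pd.Index([10, 20, 30], name="sec"))
wide = combine_phases(
    {("1700000000", "PostgreSQL-1", 1, 1, 0): a,
     ("1700000000", "PostgreSQL-1", 2, 1, 0): b},
    "current_ops_per_sec",
)
```

Because the label joins fields with '-', a configuration name that already contains hyphens (such as PostgreSQL-1) cannot be split back into its fields unambiguously. Keep the key tuples alongside the frame if that matters.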

Parameters:

metric (str) – YCSB metric to retrieve (default 'current_ops_per_sec').

Returns:

Wide-format DataFrame indexed by second with one column per phase, or an empty DataFrame when no data is available.

Return type:

pandas.DataFrame

get_evaluator(code='')

Returns an evaluators.ycsb instance for the given experiment code.

Parameters:

code (str) – Experiment identifier. If empty (the default), the first code in self.codes is used.

Returns:

YCSB evaluator for the specified experiment.

Return type:

evaluators.ycsb

get_loading_timeseries_all(metric='current_ops_per_sec')

Collects long-format YCSB loading time-series data for a given metric across all experiment codes.

For each code and each unique (configuration, experiment_run) combination, calls evaluators.ycsb.get_loading_logs_timeseries_df_aggregated(), reshapes the result to long format, and annotates each row with its identifying fields. Connection metadata (e.g. type_tenants, num_tenants, vol_tenants) is joined in from get_connections() for each code.

Unlike the benchmarking variant, the loading phase has no client dimension.

The YCSB evaluator uses 'sec' as the index name; this method normalises it to 'second' for consistency with other time-series methods.
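The loading variant can be sketched the same way, without a client column, followed by the connection-metadata join. This sketch makes several assumptions. The sample frames are invented. The metadata values are made up. Joining on configuration alone is an assumed key, since the exact join performed with get_connections() is not documented here.

```python
import pandas as pd

# Invented loading time series for one (configuration, experiment_run).
loading = pd.DataFrame(
    {"current_ops_per_sec": [5000.0, 5200.0]},
    index=pd.Index([10, 20], name="sec"),
)
long_df = loading[["current_ops_per_sec"]].rename_axis("second").reset_index()
long_df = long_df.rename(columns={"current_ops_per_sec": "value"})
long_df["metric"] = "current_ops_per_sec"
long_df["code"] = "1700000000"
long_df["configuration"] = "PostgreSQL-1"
long_df["experiment_run"] = 1  # no client dimension in the loading phase

# Invented connection metadata, one row per configuration (assumed join key).
connections = pd.DataFrame({
    "configuration": ["PostgreSQL-1"],
    "type_tenants": ["schema"],
    "num_tenants": [4],
    "vol_tenants": [1],
})
result = long_df.merge(connections, on="configuration", how="left")
result = result[["second", "code", "configuration", "experiment_run", "metric",
                 "value", "type_tenants", "num_tenants", "vol_tenants"]]
```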

Parameters:

metric (str) – YCSB metric to retrieve (default 'current_ops_per_sec').

Returns:

Long-format DataFrame with columns second, code, configuration, experiment_run, metric, value, plus connection metadata columns.

Return type:

pandas.DataFrame

get_loading_timeseries_per_phase(metric='current_ops_per_sec')

Combines aggregated YCSB loading time-series per phase from all experiment codes into a wide-format DataFrame.

For each code and each unique (configuration, experiment_run) combination, calls evaluators.ycsb.get_loading_logs_timeseries_df_aggregated() and adds the selected metric as one column of the result. Each column is labelled {code}-{configuration}-{experiment_run}.

Unlike the benchmarking variant, the loading phase has no client dimension.

Parameters:

metric (str) – YCSB metric to retrieve (default 'current_ops_per_sec').

Returns:

Wide-format DataFrame indexed by second with one column per phase, or an empty DataFrame when no data is available.

Return type:

pandas.DataFrame