bexhoma.collectors.ycsb module
Collector for YCSB experiments.
Provides YcsbCollector, a thin subclass of CollectorBase that wires up
evaluators.ycsb as the evaluator. All data collection and
aggregation logic is inherited from CollectorBase.
Authors: Patrick K. Erdelt
Copyright (C) 2020 Patrick K. Erdelt
SPDX-License-Identifier: AGPL-3.0-or-later
See LICENSE for details.
- class bexhoma.collectors.ycsb.YcsbCollector(path, codes, benchmark_run: int = 0)
Bases: CollectorBase
Collector for YCSB experiments.
Overrides get_evaluator() to return an evaluators.ycsb instance. All data collection and aggregation methods are inherited from CollectorBase.
- get_benchmark_timeseries_all(metric='current_ops_per_sec')
Collects long-format YCSB time-series data for a given metric across all experiment codes.
For each code and each unique (configuration, client, experiment_run) combination, calls evaluators.ycsb.get_benchmark_logs_timeseries_df_aggregated(), reshapes the result to long format, and annotates each row with its identifying fields. Connection metadata (e.g. type_tenants, num_tenants, vol_tenants) is joined in from get_connections() for each code.
The YCSB evaluator uses 'sec' as the index name; this method normalises it to 'second' for consistency with other time-series methods.
- Parameters:
metric (str) – YCSB metric to retrieve (default 'current_ops_per_sec').
- Returns:
Long-format DataFrame with columns second, code, configuration, client, experiment_run, metric, value, plus connection metadata columns.
- Return type:
pandas.DataFrame
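The reshape step for a single run can be illustrated with a minimal pandas sketch. It assumes, without confirmation from the source, that the aggregated evaluator result is a DataFrame indexed by 'sec' with one column per metric. The helper name to_long_format and the sample values are hypothetical.

```python
import pandas as pd


def to_long_format(df_agg, metric, code, configuration, client, experiment_run):
    """Reshape one aggregated run into long format (hypothetical helper).

    Assumes df_agg is indexed by 'sec' and has a column named `metric`.
    """
    # Turn the metric column into a two-column frame: index + 'value'
    long_df = df_agg[metric].rename("value").reset_index()
    # Normalise the YCSB index name 'sec' to 'second'
    long_df = long_df.rename(columns={"sec": "second"})
    # Annotate every row with the fields that identify this run
    long_df["code"] = code
    long_df["configuration"] = configuration
    long_df["client"] = client
    long_df["experiment_run"] = experiment_run
    long_df["metric"] = metric
    return long_df[
        ["second", "code", "configuration", "client", "experiment_run", "metric", "value"]
    ]


# Toy aggregated result for a single run (made-up numbers)
df_agg = pd.DataFrame(
    {"current_ops_per_sec": [100.0, 120.0, 110.0]},
    index=pd.Index([1, 2, 3], name="sec"),
)
rows = to_long_format(df_agg, "current_ops_per_sec", "1234", "PostgreSQL", "1", 1)
print(rows)
```

Results from several runs could then be stacked with pd.concat(list_of_frames, ignore_index=True) before any metadata is joined.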
- get_benchmark_timeseries_per_phase(metric='current_ops_per_sec')
Combines aggregated YCSB time-series per phase from all experiment codes into a wide-format DataFrame.
For each code and each unique (configuration, client, experiment_run, benchmark_run) combination, calls evaluators.ycsb.get_benchmark_logs_timeseries_df_aggregated() and adds the selected metric to the result as a single column. Each column is labelled {code}-{configuration}-{client}-{experiment_run}-{benchmark_run}.
- Parameters:
metric (str) – YCSB metric to retrieve (default 'current_ops_per_sec').
- Returns:
Wide-format DataFrame indexed by second with one column per phase, or an empty DataFrame when no data is available.
- Return type:
pandas.DataFrame
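One way to build the wide layout is to give each run's metric series its label as a name and concatenate the series along the columns. This is a sketch only. The tuple layout of runs, the helper name combine_per_phase, and the data are hypothetical, and the evaluator output is again assumed to be indexed by 'sec'.

```python
import pandas as pd


def combine_per_phase(runs, metric):
    """Combine per-run metric series into one wide DataFrame (hypothetical).

    `runs` is an iterable of
    (code, configuration, client, experiment_run, benchmark_run, df_agg)
    tuples, where df_agg is indexed by 'sec' and has a `metric` column.
    """
    series_list = []
    for code, configuration, client, experiment_run, benchmark_run, df_agg in runs:
        label = f"{code}-{configuration}-{client}-{experiment_run}-{benchmark_run}"
        # The series name becomes the column label after concatenation
        series_list.append(df_agg[metric].rename(label))
    if not series_list:
        # Mirror the documented behaviour: no data -> empty DataFrame
        return pd.DataFrame()
    # Outer-join on the time index; shorter runs are padded with NaN
    wide = pd.concat(series_list, axis=1)
    wide.index.name = "second"
    return wide


run_a = pd.DataFrame(
    {"current_ops_per_sec": [100.0, 120.0, 110.0]},
    index=pd.Index([1, 2, 3], name="sec"),
)
run_b = pd.DataFrame(
    {"current_ops_per_sec": [90.0, 95.0]},
    index=pd.Index([1, 2], name="sec"),
)
wide = combine_per_phase(
    [
        ("1234", "PostgreSQL", "1", 1, 1, run_a),
        ("1234", "MySQL", "1", 1, 1, run_b),
    ],
    "current_ops_per_sec",
)
print(wide)
```

Because the series are aligned on the time index, phases of different lengths can share one frame. The missing seconds appear as NaN.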
- get_evaluator(code='')
Returns an evaluators.ycsb instance for the given experiment code.
- Parameters:
code (str) – Experiment identifier. An empty string (the default) selects the first code in self.codes.
- Returns:
YCSB evaluator for the specified experiment.
- Return type:
evaluators.ycsb
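The "thin subclass" design can be pictured with simplified stand-in classes. None of the names below are bexhoma APIs. CollectorBaseSketch, YcsbEvaluatorStub, and YcsbCollectorSketch are hypothetical, and the stub's constructor arguments are assumptions, not the real evaluators.ycsb signature.

```python
class CollectorBaseSketch:
    """Hypothetical stand-in for CollectorBase; holds shared state."""

    def __init__(self, path, codes, benchmark_run: int = 0):
        self.path = path
        self.codes = list(codes)
        self.benchmark_run = benchmark_run

    def get_evaluator(self, code=""):
        # Subclasses decide which evaluator type to build
        raise NotImplementedError


class YcsbEvaluatorStub:
    """Hypothetical stand-in for an evaluators.ycsb instance."""

    def __init__(self, path, code):
        self.path = path
        self.code = code


class YcsbCollectorSketch(CollectorBaseSketch):
    """Thin subclass: only the evaluator factory is overridden."""

    def get_evaluator(self, code=""):
        # An empty code falls back to the first experiment code
        if not code:
            code = self.codes[0]
        return YcsbEvaluatorStub(self.path, code)


collector = YcsbCollectorSketch("/results", ["1234", "5678"])
print(collector.get_evaluator().code)        # prints 1234
print(collector.get_evaluator("5678").code)  # prints 5678
```

Because all shared logic in the base class works through get_evaluator(), swapping the evaluator type is the only change a benchmark-specific collector needs in this pattern.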
- get_loading_timeseries_all(metric='current_ops_per_sec')
Collects long-format YCSB loading time-series data for a given metric across all experiment codes.
For each code and each unique (configuration, experiment_run) combination, calls evaluators.ycsb.get_loading_logs_timeseries_df_aggregated(), reshapes the result to long format, and annotates each row with its identifying fields. Connection metadata (e.g. type_tenants, num_tenants, vol_tenants) is joined in from get_connections() for each code.
Unlike the benchmarking variant, the loading phase has no client dimension.
The YCSB evaluator uses 'sec' as the index name; this method normalises it to 'second' for consistency with other time-series methods.
- Parameters:
metric (str) – YCSB metric to retrieve (default 'current_ops_per_sec').
- Returns:
Long-format DataFrame with columns second, code, configuration, experiment_run, metric, value, plus connection metadata columns.
- Return type:
pandas.DataFrame
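The metadata join could be a left merge of the long-format rows onto a metadata table. This sketch assumes that get_connections() yields, or can be converted into, a DataFrame keyed by code and configuration. That layout, the helper name join_connection_metadata, and the sample values are hypothetical.

```python
import pandas as pd


def join_connection_metadata(long_df, connections):
    """Attach connection metadata to long-format rows (hypothetical helper).

    A left merge keeps every time-series row, even if no metadata matches;
    unmatched rows get NaN in the metadata columns.
    """
    return long_df.merge(connections, on=["code", "configuration"], how="left")


# Long-format loading rows: note there is no 'client' column
long_df = pd.DataFrame(
    {
        "second": [1, 2, 1],
        "code": ["1234", "1234", "1234"],
        "configuration": ["PostgreSQL", "PostgreSQL", "MySQL"],
        "experiment_run": [1, 1, 1],
        "metric": ["current_ops_per_sec"] * 3,
        "value": [100.0, 120.0, 80.0],
    }
)
# Made-up metadata, one row per (code, configuration)
connections = pd.DataFrame(
    {
        "code": ["1234", "1234"],
        "configuration": ["PostgreSQL", "MySQL"],
        "type_tenants": ["schema", "database"],
        "num_tenants": [2, 4],
    }
)
enriched = join_connection_metadata(long_df, connections)
print(enriched)
```

The metadata table should have at most one row per key. Otherwise the merge would duplicate time-series rows.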
- get_loading_timeseries_per_phase(metric='current_ops_per_sec')
Combines aggregated YCSB loading time-series per phase from all experiment codes into a wide-format DataFrame.
For each code and each unique (configuration, experiment_run) combination, calls evaluators.ycsb.get_loading_logs_timeseries_df_aggregated() and adds the selected metric to the result as a single column. Each column is labelled {code}-{configuration}-{experiment_run}.
Unlike the benchmarking variant, the loading phase has no client dimension.
- Parameters:
metric (str) – YCSB metric to retrieve (default 'current_ops_per_sec').
- Returns:
Wide-format DataFrame indexed by second with one column per phase, or an empty DataFrame when no data is available.
- Return type:
pandas.DataFrame
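Both loading methods draw on the same aggregated evaluator output. For that reason, the per-phase layout can also be derived from long-format loading rows by pivoting on a {code}-{configuration}-{experiment_run} label. This illustrative sketch does not necessarily reflect how the method is implemented. The helper name long_to_wide_loading and the data are hypothetical.

```python
import pandas as pd


def long_to_wide_loading(long_df):
    """Pivot long-format loading rows into one column per phase (hypothetical).

    Expects the columns second, code, configuration, experiment_run and
    value, for a single metric.
    """
    if long_df.empty:
        # Same convention as documented: no data -> empty DataFrame
        return pd.DataFrame()
    # Build the phase label; there is no client component for loading
    phase = (
        long_df["code"].astype(str)
        + "-" + long_df["configuration"].astype(str)
        + "-" + long_df["experiment_run"].astype(str)
    )
    wide = long_df.assign(phase=phase).pivot(
        index="second", columns="phase", values="value"
    )
    wide.columns.name = None  # drop the leftover 'phase' axis name
    return wide


long_df = pd.DataFrame(
    {
        "second": [1, 2, 1, 2],
        "code": ["1234"] * 4,
        "configuration": ["PostgreSQL", "PostgreSQL", "MySQL", "MySQL"],
        "experiment_run": [1, 1, 1, 1],
        "metric": ["current_ops_per_sec"] * 4,
        "value": [100.0, 120.0, 80.0, 90.0],
    }
)
wide = long_to_wide_loading(long_df)
print(wide)
```

DataFrame.pivot raises a ValueError if a (second, label) pair occurs twice. If labels could collide, pivot_table with an explicit aggregation function would be the safer choice.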