bexhoma.evaluators.benchbase module
Evaluator for Benchbase experiments.
Provides BenchbaseEvaluator, which extends LogEvaluator to parse and
aggregate throughput and latency results produced by the Benchbase
benchmarking tool.
Authors: Patrick K. Erdelt
Copyright (C) 2020 Patrick K. Erdelt
SPDX-License-Identifier: AGPL-3.0-or-later
See LICENSE for details.
- class bexhoma.evaluators.benchbase.BenchbaseEvaluator(code, path, include_loading=False, include_benchmarking=True, benchmark_run: int = 0)
Bases: LogEvaluator
Evaluator for a Benchbase experiment.
Parses per-pod log files to extract throughput, goodput, and latency distribution results produced by the Benchbase benchmarking tool. Also provides time-series access to per-second throughput metrics via get_benchmark_logs_timeseries_df_aggregated() and get_benchmark_logs_timeseries_df_single().
- Parameters:
code – Experiment identifier — also the name of the result sub-folder.
path – Root path that contains the result folders.
include_loading – Whether loading-phase results are expected.
include_benchmarking – Whether benchmarking-phase results are expected.
- benchmark_logs_to_timeseries_df(list_logs, metric='throughput', aggregate=True)
Parses Benchbase log files for the given pod IDs and assembles a time-series DataFrame.
Each pod ID in list_logs is resolved to matching log files via a glob pattern. When aggregate is True, the per-second metric values from all pods are combined into a single DataFrame: percentile/max metrics use the element-wise maximum, minimum metrics use the element-wise minimum, and all others are summed. When aggregate is False, a list of per-pod DataFrames is returned instead.
- Parameters:
- Returns:
Aggregated DataFrame indexed by 'second' (with an 'avg' column appended) when aggregate is True, or a list of per-pod DataFrames.
- Return type:
pandas.DataFrame or list[pandas.DataFrame]
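The aggregation rule described above can be illustrated with plain pandas. This is a minimal sketch, not the library's implementation: the helper name combine_pod_series, the classification of a metric by substrings of its name, and the sample values are all assumptions made for illustration. The 'avg' column is left out because its definition is not stated here.

```python
import pandas as pd


def combine_pod_series(frames, metric):
    """Combine per-pod frames (indexed by second) into one column.

    Hypothetical helper: how a metric is classified is an assumption
    based only on its name.
    """
    if "percentile" in metric or "max" in metric:
        how = "max"   # worst case across pods
    elif "min" in metric:
        how = "min"   # best case across pods
    else:
        how = "sum"   # e.g. throughput adds up across pods
    # One column per pod, aligned on the shared index.
    wide = pd.concat(
        [frame[metric] for frame in frames],
        axis=1,
        keys=range(len(frames)),
    )
    # Reduce row-wise (per second) with the chosen function.
    combined = getattr(wide, how)(axis=1)
    return combined.to_frame(name=metric)


# Two pods reporting throughput for seconds 1 and 2 (made-up values).
index = pd.Index([1, 2], name="second")
pod_a = pd.DataFrame({"throughput": [10.0, 20.0]}, index=index)
pod_b = pd.DataFrame({"throughput": [5.0, 7.0]}, index=index)
total = combine_pod_series([pod_a, pod_b], "throughput")
```

Here total holds the per-second sum over both pods; passing a metric whose name contains "max" or "percentile" would take the element-wise maximum instead.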
- benchmarking_aggregate_by_parallel_pods(df, columns=['phase'])
Aggregates parallel-pod result rows into one row per group.
Groups the typed benchmarking DataFrame by columns and applies per-metric aggregation functions (sum for throughput, max for latency percentiles, etc.).
The phase column holds the phase identifier (configuration-experiment_run-client) and the job column holds the job identifier (configuration-experiment_run-client-benchmark_run).
The default columns=['phase'] groups by phase, producing one row per phase (all jobs within a phase merged). To keep one row per job, pass columns=['job'].
- Parameters:
df (pandas.DataFrame) – Typed benchmarking DataFrame (output of benchmarking_set_datatypes()).
- Returns:
Aggregated DataFrame with one row per group.
- Return type:
pandas.DataFrame
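The difference between grouping by phase and grouping by job can be sketched with a pandas groupby. The column names throughput and latency_p95, the identifiers, and the values below are hypothetical stand-ins, not the evaluator's real result columns.

```python
import pandas as pd

# Hypothetical typed results: three pods, two of them in benchmark run 1
# and one in benchmark run 2 of the same phase.
df = pd.DataFrame({
    "phase": ["cfg-1-1", "cfg-1-1", "cfg-1-1"],
    "job": ["cfg-1-1-1", "cfg-1-1-1", "cfg-1-1-2"],
    "throughput": [100.0, 120.0, 300.0],
    "latency_p95": [8.0, 11.0, 9.0],
})

# Assumed per-metric rules: throughput adds up, latency takes the worst pod.
aggregations = {"throughput": "sum", "latency_p95": "max"}

# Default grouping: one row per phase, all jobs merged.
per_phase = df.groupby(["phase"]).agg(aggregations).reset_index()

# Alternative grouping: one row per job.
per_job = df.groupby(["job"]).agg(aggregations).reset_index()
```

With this data, per_phase has a single row while per_job keeps the two benchmark runs apart.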
- benchmarking_set_datatypes(df)
Casts all benchmarking result columns to their appropriate data types.
Adds a tenant_id column (value -1) when the column is absent so that DataFrames loaded from older pickles remain compatible.
- Parameters:
df (pandas.DataFrame) – DataFrame of raw benchmarking results.
- Returns:
DataFrame with columns cast to correct types.
- Return type:
pandas.DataFrame
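The backward-compatibility step can be sketched as follows. The function name set_datatypes_sketch and the throughput column are assumptions for illustration; only the tenant_id default of -1 comes from the description above.

```python
import pandas as pd


def set_datatypes_sketch(df):
    """Hypothetical stand-in for the type-casting step."""
    df = df.copy()  # leave the caller's DataFrame untouched
    # Older pickles have no tenant column; fill in the sentinel value.
    if "tenant_id" not in df.columns:
        df["tenant_id"] = -1
    # Cast raw (string) values to numeric types; column set is assumed.
    return df.astype({"tenant_id": int, "throughput": float})


# A frame as it might come from an older pickle: strings, no tenant_id.
old = pd.DataFrame({"throughput": ["100.5", "98.0"]})
typed = set_datatypes_sketch(old)
```

A frame that already carries tenant_id keeps its values and only has them cast.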
- get_benchmark_logs_timeseries_df_aggregated(metric='throughput', configuration='', client='1', experiment_run='1')
Returns a DataFrame of time series of a metric for the benchmarking phase, aggregated over all pods per second.
Retrieves pod IDs from get_df_benchmarking() filtered by the given configuration, client, and experiment_run, then delegates to benchmark_logs_to_timeseries_df() with aggregate=True.
- Parameters:
- Returns:
DataFrame indexed by 'second' with the metric and an 'avg' column.
- Return type:
pandas.DataFrame
- get_benchmark_logs_timeseries_df_single(metric='throughput', configuration='', client='1', experiment_run='1')
Returns a list of DataFrames of time series of a metric for the benchmarking phase, one per pod.
Retrieves pod IDs from get_df_benchmarking() filtered by the given configuration, client, and experiment_run, then delegates to benchmark_logs_to_timeseries_df() with aggregate=False.
- Parameters:
- Returns:
List of DataFrames, one per pod, each indexed by 'second'.
- Return type:
list[pandas.DataFrame]
- get_summary_benchmark_per_connection()
Returns benchmarking results with one row per pod, filtered to the key display columns.
Applies benchmarking_set_datatypes() and selects the columns used for the per-connection summary table (experiment run, terminals, target, client, child, time, errors, throughput, goodput, efficiency, and latency percentiles), then sorts by (experiment_run, client, child).
- Returns:
DataFrame indexed as "DBMS" with one row per pod, or None if there are no benchmarking results.
- Return type:
pandas.DataFrame or None
- get_summary_benchmark_per_phase()
Returns benchmarking results aggregated over parallel pods, one row per phase.
Applies benchmarking_set_datatypes(), aggregates via benchmarking_aggregate_by_parallel_pods(), and selects the columns used for the per-phase summary table (phase, experiment run, terminals, target, pod count, time, errors, throughput, goodput, efficiency, and latency percentiles), sorted by (experiment_run, target, pod_count).
- Returns:
DataFrame indexed as "DBMS" with one row per phase, or an empty DataFrame if there are no benchmarking results.
- Return type:
pandas.DataFrame
- get_summary_benchmark_per_phase_multitenant()
Returns benchmarking results aggregated per phase and tenant, one row per (phase, tenant_id).
Like get_summary_benchmark_per_phase() but groups by ['phase', 'tenant_id'] so each tenant appears as a separate row.
- Returns:
DataFrame indexed as "DBMS" with one row per (phase, tenant), or an empty DataFrame if there are no benchmarking results.
- Return type:
pandas.DataFrame
- get_summary_loading_per_run()
Returns loading metrics aggregated per experiment run.
Delegates to get_loading_per_run() (defined in base), which reduces the per-connection loading DataFrame to one row per (code, configuration, experiment_run) and adds a 'Throughput [SF/h]' column.
- Returns:
DataFrame with one row per experiment run.
- Return type:
pandas.DataFrame
- log_to_df(filename)
Parses a Benchbase pod log file into a single-row DataFrame.
Extracts connection metadata (including tenant_id from the BEXHOMA_TENANT_ID stdout line), benchmark parameters, and the JSON result block embedded between ####BEXHOMA#### markers. Returns an empty DataFrame when the log is incomplete (e.g. the start time has already passed).
- Parameters:
filename (str) – Absolute path to the log file.
- Returns:
Single-row DataFrame of benchmarking results, or empty on failure.
- Return type:
pandas.DataFrame
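Extracting a JSON block that sits between two identical markers can be sketched with the standard library. The helper name extract_result_block, the sample log text, and the JSON key are assumptions; only the ####BEXHOMA#### marker is taken from the description above, and the real parser may handle more cases.

```python
import json
import re

MARKER = "####BEXHOMA####"


def extract_result_block(log_text):
    """Return the JSON object between the first two markers, or None.

    Hypothetical helper; None stands in for the 'incomplete log' case.
    """
    pattern = re.escape(MARKER) + r"(.*?)" + re.escape(MARKER)
    # DOTALL lets the block span several lines.
    match = re.search(pattern, log_text, flags=re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


# Made-up log content for illustration only.
sample_log = (
    "starting benchmark\n"
    "####BEXHOMA####\n"
    '{"Throughput": 12.5}\n'
    "####BEXHOMA####\n"
    "done\n"
)
result = extract_result_block(sample_log)
```

Returning None for a missing or malformed block gives the caller a single check before building the one-row DataFrame or falling back to an empty one.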
- parse_benchbase_log_file(file_path)
Parses a Benchbase log file into a list of per-second throughput records.
Each [INFO] log line that contains a Throughput: entry is converted into a dict with keys second (elapsed time) and throughput.
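The line-to-record conversion can be sketched as below. The log line layout (a timestamp after the level tag, a "txn/sec" suffix) and the choice to measure elapsed time from the first matching line are assumptions for illustration; the actual Benchbase output and the module's parsing rules may differ.

```python
import re
from datetime import datetime

# Assumed layout: "[INFO ] YYYY-MM-DD HH:MM:SS,mmm ... Throughput: <number> ..."
LINE = re.compile(
    r"^\[INFO\s*\]\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}"
    r".*?Throughput:\s*([0-9]+(?:\.[0-9]+)?)"
)


def parse_throughput_lines(lines):
    """Turn matching log lines into {'second', 'throughput'} dicts."""
    records = []
    start = None
    for line in lines:
        match = LINE.search(line)
        if match is None:
            continue  # not an INFO throughput line
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if start is None:
            start = stamp  # elapsed time counts from the first match
        records.append({
            "second": int((stamp - start).total_seconds()),
            "throughput": float(match.group(2)),
        })
    return records


# Made-up log lines for illustration only.
sample = [
    "[INFO ] 2024-01-01 12:00:00,000 [main] Monitor - Throughput: 100.0 txn/sec",
    "[WARN ] 2024-01-01 12:00:00,500 [main] Monitor - something else",
    "[INFO ] 2024-01-01 12:00:01,000 [main] Monitor - Throughput: 105.5 txn/sec",
]
records = parse_throughput_lines(sample)
```

A list of such dicts converts directly into a DataFrame with pandas.DataFrame(records), which fits the per-second time-series methods described earlier.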
- record_tests(experiment, df_loading: DataFrame, df_reduced: DataFrame, workflow_actual: dict, workflow_planned: dict, **extra) → None
Record Benchbase pass/fail tests: throughput and workflow completeness.
- Parameters:
experiment – The owning experiment object.
df_loading – Per-run loading DataFrame (unused here).
df_reduced – Per-phase execution DataFrame.
workflow_actual – Reconstructed actual workflow dict.
workflow_planned – Planned workflow dict from workload config.