bexhoma.evaluators.benchbase module

Evaluator for Benchbase experiments.

Provides BenchbaseEvaluator, which extends LogEvaluator to parse and aggregate throughput and latency results produced by the Benchbase benchmarking tool.

class bexhoma.evaluators.benchbase.BenchbaseEvaluator(code, path, include_loading=False, include_benchmarking=True, benchmark_run: int = 0)

Bases: LogEvaluator

Evaluator for a Benchbase experiment.

Parses per-pod log files to extract throughput, goodput, and latency distribution results produced by the Benchbase benchmarking tool. Also provides time-series access to per-second throughput metrics via get_benchmark_logs_timeseries_df_aggregated() and get_benchmark_logs_timeseries_df_single().

Parameters:

code – Experiment identifier; also the name of the result sub-folder.
path – Root path that contains the result folders.
include_loading – Whether loading-phase results are expected.
include_benchmarking – Whether benchmarking-phase results are expected.

benchmark_logs_to_timeseries_df(list_logs, metric='throughput', aggregate=True)

Parses Benchbase log files for the given pod IDs and assembles a time-series DataFrame.

Each pod ID in list_logs is resolved to matching log files via a glob pattern. When aggregate is True the per-second metric values from all pods are combined into a single DataFrame: percentile/max metrics use the element-wise maximum, minimum metrics use the element-wise minimum, and all others are summed. When aggregate is False a list of per-pod DataFrames is returned instead.

Parameters:

list_logs (list[str]) – Pod IDs (short suffixes) used to locate matching log files.
metric (str) – Metric column to extract (default 'throughput').
aggregate (bool) – Whether to aggregate all pod DataFrames into one.

Returns:

Aggregated DataFrame indexed by 'second' (with an 'avg' column appended) when aggregate is True, or a list of per-pod DataFrames when aggregate is False.

Return type:

pandas.DataFrame or list[pandas.DataFrame]
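The aggregation rule described above can be illustrated with a minimal sketch. It is not the library's implementation: the helper combine_pods, the metric names in MAX_METRICS and MIN_METRICS, and the meaning of the 'avg' column (assumed here to be the mean of the combined series) are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical metric names; the real classification lives in the evaluator.
MAX_METRICS = {"latency_p95", "latency_p99", "latency_max"}
MIN_METRICS = {"latency_min"}

def combine_pods(pod_frames, metric="throughput"):
    """Combine per-pod time series into a single DataFrame (illustrative)."""
    # One column per pod, aligned on the shared 'second' index.
    wide = pd.concat([df[metric] for df in pod_frames], axis=1)
    if metric in MAX_METRICS:
        combined = wide.max(axis=1)   # worst case across pods
    elif metric in MIN_METRICS:
        combined = wide.min(axis=1)   # best case across pods
    else:
        combined = wide.sum(axis=1)   # additive metrics such as throughput
    result = combined.to_frame(name=metric)
    result.index.name = "second"
    # Assumption: 'avg' is the mean of the combined series, repeated per row.
    result["avg"] = result[metric].mean()
    return result

index = pd.Index([1, 2], name="second")
pod_a = pd.DataFrame({"throughput": [10.0, 20.0], "latency_p95": [3.0, 9.0]}, index=index)
pod_b = pd.DataFrame({"throughput": [5.0, 15.0], "latency_p95": [4.0, 7.0]}, index=index)

total = combine_pods([pod_a, pod_b])                         # summed per second
worst_p95 = combine_pods([pod_a, pod_b], metric="latency_p95")  # element-wise max
```

Summing is only meaningful for additive metrics; a percentile cannot be summed across pods, which is why the maximum serves as a conservative combined value.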

benchmarking_aggregate_by_parallel_pods(df, columns=['phase'])

Aggregates parallel-pod result rows into one row per group.

Groups the typed benchmarking DataFrame by columns and applies per-metric aggregation functions (sum for throughput, max for latency percentiles, etc.).

The phase column holds the phase identifier (configuration-experiment_run-client) and the job column holds the job identifier (configuration-experiment_run-client-benchmark_run).

The default columns=['phase'] groups by phase, producing one row per phase (all jobs within a phase merged). To keep one row per job, pass columns=['job'].

Parameters:

df (pandas.DataFrame) – Typed benchmarking DataFrame (output of benchmarking_set_datatypes()).
columns (list[str]) – Grouping columns (default ['phase']).

Returns:

Aggregated DataFrame with one row per group.

Return type:

pandas.DataFrame
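The difference between grouping by phase and grouping by job can be sketched with plain pandas. The column names throughput and latency_p95, the AGG mapping, and the helper aggregate_pods are hypothetical stand-ins; the actual method applies its own per-metric aggregation functions.

```python
import pandas as pd

# Hypothetical per-metric aggregation functions.
AGG = {"throughput": "sum", "latency_p95": "max"}

def aggregate_pods(df, columns=("phase",)):
    """Collapse parallel-pod rows into one row per group (illustrative)."""
    return df.groupby(list(columns)).agg(AGG).reset_index()

# Three pods in one phase: two belong to the first job, one to the second.
df = pd.DataFrame({
    "phase": ["cfg-1-1", "cfg-1-1", "cfg-1-1"],
    "job": ["cfg-1-1-1", "cfg-1-1-1", "cfg-1-1-2"],
    "throughput": [100.0, 120.0, 90.0],
    "latency_p95": [12.0, 15.0, 20.0],
})

per_phase = aggregate_pods(df, columns=["phase"])  # one row: all jobs merged
per_job = aggregate_pods(df, columns=["job"])      # two rows: one per job
```

Grouping by phase merges every job that shares the configuration, experiment run, and client, so repeated benchmark runs are folded together; grouping by job keeps them apart.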

benchmarking_set_datatypes(df)

Casts all benchmarking result columns to their appropriate data types.

Adds a tenant_id column (value -1) when the column is absent so that DataFrames loaded from older pickles remain compatible.

Parameters: df (pandas.DataFrame) – DataFrame of raw benchmarking results.
Returns: DataFrame with columns cast to correct types.
Return type: pandas.DataFrame
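The backward-compatibility step can be pictured as follows. This is a minimal sketch under assumptions: the helper name set_types_sketch and the throughput column are hypothetical, and the real method casts many more columns.

```python
import pandas as pd

def set_types_sketch(df):
    """Backfill tenant_id and cast columns (illustrative subset only)."""
    out = df.copy()
    # Older pickles have no tenant_id column; -1 marks "no tenant".
    if "tenant_id" not in out.columns:
        out["tenant_id"] = -1
    out["tenant_id"] = out["tenant_id"].astype(int)
    # Hypothetical metric column, stored as text in the raw results.
    out["throughput"] = out["throughput"].astype(float)
    return out

legacy = pd.DataFrame({"throughput": ["12.5", "8.0"]})
typed = set_types_sketch(legacy)
```

Working on a copy leaves the caller's DataFrame untouched; whether the actual method copies or modifies in place is not stated here.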

get_benchmark_logs_timeseries_df_aggregated(metric='throughput', configuration='', client='1', experiment_run='1')

Returns a time-series DataFrame of one metric for the benchmarking phase, aggregated per second across all pods.

Retrieves pod IDs from get_df_benchmarking() filtered by the given configuration, client, and experiment_run, then delegates to benchmark_logs_to_timeseries_df() with aggregate=True.

Parameters:

metric (str) – Metric column to extract (default 'throughput').
configuration (str) – Configuration name (e.g. 'PostgreSQL-64-8-65536').
client (str or int) – Client number (default '1').
experiment_run (str or int) – Experiment run number (default '1').

Returns:

DataFrame indexed by 'second' with the metric and an 'avg' column.

Return type:

pandas.DataFrame

get_benchmark_logs_timeseries_df_single(metric='throughput', configuration='', client='1', experiment_run='1')

Returns a list of time-series DataFrames of one metric for the benchmarking phase, one per pod.

Retrieves pod IDs from get_df_benchmarking() filtered by the given configuration, client, and experiment_run, then delegates to benchmark_logs_to_timeseries_df() with aggregate=False.

Parameters:

metric (str) – Metric column to extract (default 'throughput').
configuration (str) – Configuration name (e.g. 'PostgreSQL-64-8-65536').
client (str or int) – Client number (default '1').
experiment_run (str or int) – Experiment run number (default '1').

Returns:

List of DataFrames, one per pod, each indexed by 'second'.

Return type:

list[pandas.DataFrame]

get_summary_benchmark_per_connection()

Returns benchmarking results with one row per pod, filtered to the key display columns.

Applies benchmarking_set_datatypes() and selects the columns used for the per-connection summary table (experiment run, terminals, target, client, child, time, errors, throughput, goodput, efficiency, and latency percentiles), then sorts by (experiment_run, client, child).

Returns: DataFrame indexed as "DBMS" with one row per pod, or None if there are no benchmarking results.
Return type: pandas.DataFrame or None

get_summary_benchmark_per_phase()

Returns benchmarking results aggregated over parallel pods, one row per phase.

Applies benchmarking_set_datatypes(), aggregates via benchmarking_aggregate_by_parallel_pods(), and selects the columns used for the per-phase summary table (phase, experiment run, terminals, target, pod count, time, errors, throughput, goodput, efficiency, and latency percentiles), sorted by (experiment_run, target, pod_count).

Returns: DataFrame indexed as "DBMS" with one row per phase, or an empty DataFrame if there are no benchmarking results.
Return type: pandas.DataFrame

get_summary_benchmark_per_phase_multitenant()

Returns benchmarking results aggregated per phase and tenant, one row per (phase, tenant_id).

Like get_summary_benchmark_per_phase() but groups by ['phase', 'tenant_id'] so each tenant appears as a separate row.

Returns: DataFrame indexed as "DBMS" with one row per (phase, tenant), or an empty DataFrame if there are no benchmarking results.
Return type: pandas.DataFrame

get_summary_loading_per_run()

Returns loading metrics aggregated per experiment run.

Delegates to get_loading_per_run() (defined in base), which reduces the per-connection loading DataFrame to one row per (code, configuration, experiment_run) and adds a 'Throughput [SF/h]' column.

Returns: DataFrame with one row per experiment run.
Return type: pandas.DataFrame

log_to_df(filename)

Parses a Benchbase pod log file into a single-row DataFrame.

Extracts connection metadata (including tenant_id from the BEXHOMA_TENANT_ID stdout line), benchmark parameters, and the JSON result block embedded between ####BEXHOMA#### markers. Returns an empty DataFrame when the log is incomplete (e.g. the start time has already passed).

Parameters: filename (str) – Absolute path to the log file.
Returns: Single-row DataFrame of benchmarking results, or empty on failure.
Return type: pandas.DataFrame
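Extracting the JSON result block between the ####BEXHOMA#### markers might look like the sketch below. The helper extract_result_block, the sample log text, and the JSON key are hypothetical; the sketch assumes the block sits between the first two markers and does not reproduce the metadata parsing the real method also performs.

```python
import json

MARKER = "####BEXHOMA####"

def extract_result_block(log_text):
    """Return the JSON object between the first two markers, or None."""
    parts = log_text.split(MARKER)
    # Fewer than three parts means the marker pair is incomplete.
    if len(parts) < 3:
        return None
    try:
        return json.loads(parts[1])
    except json.JSONDecodeError:
        return None

# Made-up log content, for illustration only.
sample_log = (
    "startup noise\n"
    "####BEXHOMA####\n"
    '{"throughput": 12.5}\n'
    "####BEXHOMA####\n"
    "shutdown noise\n"
)
result = extract_result_block(sample_log)
incomplete = extract_result_block("log cut off before any marker")
```

Returning None for an incomplete log mirrors the documented behaviour of falling back to an empty DataFrame rather than raising.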

parse_benchbase_log_file(file_path)

Parses a Benchbase log file into a list of per-second throughput records.

Each [INFO] log line that contains a Throughput: entry is converted into a dict with keys second (elapsed time) and throughput.

Parameters: file_path (str) – Absolute path to the Benchbase log file.
Returns: List of {'second': int, 'throughput': float} dicts.
Return type: list[dict]
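The line-by-line extraction can be sketched with a regular expression. Several points are assumptions rather than documented behaviour: the sample lines are made up and are not real Benchbase output, the regular expression is a guess at the value format, and the second counter simply numbers the matching lines, which assumes one status line per second.

```python
import re

# Assumed value format: a plain decimal number after "Throughput:".
THROUGHPUT_RE = re.compile(r"Throughput:\s*([0-9]+(?:\.[0-9]+)?)")

def parse_throughput_lines(lines):
    """Turn INFO lines containing 'Throughput:' into per-second records."""
    records = []
    for line in lines:
        if "[INFO" not in line:
            continue  # skip other log levels
        match = THROUGHPUT_RE.search(line)
        if match is None:
            continue  # INFO line without a throughput entry
        records.append({
            "second": len(records) + 1,  # assumption: one entry per second
            "throughput": float(match.group(1)),
        })
    return records

# Made-up lines, for illustration only.
sample_lines = [
    "[INFO ] status - Throughput: 101.5 txn/sec",
    "[WARN ] something unrelated",
    "[INFO ] status - Throughput: 98.0 txn/sec",
]
records = parse_throughput_lines(sample_lines)
```

A list of such records converts directly to a time-series frame with pandas.DataFrame(records).set_index('second').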

record_tests(experiment, df_loading: DataFrame, df_reduced: DataFrame, workflow_actual: dict, workflow_planned: dict, **extra) → None

Records the Benchbase pass/fail tests: throughput and workflow completeness.

When the experiment ran in start-only or load-only mode (experiment.benchmarking_is_active() is False), the throughput test is skipped rather than failed, because no benchmarking phase ran.

Parameters:

experiment – The owning experiment object.
df_loading – Per-run loading DataFrame (unused here).
df_reduced – Per-phase execution DataFrame.
workflow_actual – Reconstructed actual workflow dict.
workflow_planned – Planned workflow dict from workload config.