bexhoma.evaluators.dbmsbenchmarker module

Evaluator for DBMSBenchmarker experiments.

Provides DbmsBenchmarkerEvaluator, which extends LogEvaluator to parse and aggregate per-query performance results, warnings, errors, and latency statistics produced by the DBMSBenchmarker tool.

class bexhoma.evaluators.dbmsbenchmarker.DbmsBenchmarkerEvaluator(code, path, include_loading=False, include_benchmarking=True, benchmark_run: int = 0)

Bases: LogEvaluator

Evaluator for a DBMSBenchmarker experiment.

Wraps a dbmsbenchmarker.inspector.inspector instance and exposes loading times, per-query latency statistics, throughput metrics, warning and error counts, and aggregation over parallel pods.

Parameters:

code – Experiment identifier; also the name of the result sub-folder.
path – Root path that contains the result folders.
include_loading – Unused; loading is always enabled for this evaluator.
include_benchmarking – Unused; benchmarking is always enabled.

benchmarking_aggregate_by_parallel_pods(df, columns=['phase'])

Aggregates parallel-pod DBMSBenchmarker result rows into one row per group.

Groups by columns and applies geo-mean for timing/power metrics and max/sum for count metrics. Recomputes Throughput@Size from the aggregated values.

The phase column holds the phase identifier (configuration-experiment_run-client) and the job column holds the job identifier (configuration-experiment_run-client-benchmark_run).

The default columns=['phase'] groups by phase, producing one row per phase. To keep one row per job, pass columns=['job'].

Parameters:

df (pandas.DataFrame) – Benchmarking DataFrame (output of get_df_benchmarking()).
columns (list[str]) – Grouping columns (default ['phase']).

Returns:

Aggregated DataFrame with one row per group.

Return type:

pandas.DataFrame
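
The grouping described above can be sketched without pandas. This is a minimal illustration, not the library's implementation: the helper names aggregate_by and phase_of, the metric names time_ms and errors, and the sample job identifiers are all hypothetical. It assumes a geometric mean for the timing metric and a sum for the count metric, and it derives the phase from the job identifier by dropping the trailing benchmark_run component.

```python
from collections import defaultdict
from statistics import geometric_mean


def phase_of(job):
    # job = configuration-experiment_run-client-benchmark_run;
    # splitting from the right keeps hyphens inside the configuration name.
    return job.rsplit("-", 1)[0]


def aggregate_by(rows, columns=("phase",), geo_cols=("time_ms",), sum_cols=("errors",)):
    """Group dict rows by `columns`; geo-mean timing columns, sum count columns."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in columns)].append(row)
    result = []
    for key, members in groups.items():
        out = dict(zip(columns, key))
        for col in geo_cols:
            out[col] = geometric_mean([m[col] for m in members])
        for col in sum_cols:
            out[col] = sum(m[col] for m in members)
        result.append(out)
    return result


# Hypothetical sample data: two parallel pods in one phase, one pod in another.
rows = [
    {"job": "PostgreSQL-1-1-1", "time_ms": 100.0, "errors": 0},
    {"job": "PostgreSQL-1-1-2", "time_ms": 400.0, "errors": 2},
    {"job": "PostgreSQL-1-2-1", "time_ms": 50.0, "errors": 1},
]
for row in rows:
    row["phase"] = phase_of(row["job"])

per_phase = aggregate_by(rows)            # one row per phase
per_job = aggregate_by(rows, ("job",))    # one row per job
```

Passing ("job",) keeps every pod as its own row, which mirrors the columns=['job'] option described above.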

benchmarking_set_datatypes(df)

Returns the DataFrame, adding a tenant_id column (value -1) when the column is absent so that DataFrames loaded from older pickles remain compatible with aggregation code that expects the column.

DBMSBenchmarker results are otherwise already typed by the inspector; no other conversion is needed.

Parameters: df (pandas.DataFrame) – DataFrame of results.
Returns: DataFrame with tenant_id guaranteed to be present.
Return type: pandas.DataFrame
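
The backward-compatibility default can be illustrated on plain dict rows. This is a sketch under assumptions: ensure_tenant_id is a hypothetical helper, and it fills the default per row, whereas the method above works on a DataFrame column (where the equivalent check would be 'tenant_id' not in df.columns).

```python
def ensure_tenant_id(rows, default=-1):
    """Return copies of the rows with a 'tenant_id' key; existing values are kept."""
    return [{**row, "tenant_id": row.get("tenant_id", default)} for row in rows]


# Hypothetical rows: the first mimics an older result without tenant information.
old_rows = [{"phase": "PostgreSQL-1-1"}]
new_rows = [{"phase": "PostgreSQL-1-1", "tenant_id": 3}]

patched_old = ensure_tenant_id(old_rows)
patched_new = ensure_tenant_id(new_rows)
```

Returning copies leaves the input untouched, so the same loaded data can be reused by code that does not expect the extra key.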

get_df_benchmarking()

Returns the DataFrame containing all benchmarking-phase results.

Combines per-query latency statistics, geo-mean execution times, and per-connection timing data from the DBMSBenchmarker inspector into a single DataFrame. Includes tenant_id read from the BEXHOMA_TENANT_ID loading parameter (-1 when absent).

Returns: DataFrame with one row per connection/pod, or empty DataFrame on failure.
Return type: pandas.DataFrame

get_df_loading()

Returns the DataFrame containing all loading-phase timing results.

Reads loading time fields (timeGenerate, timeIngesting, timeSchema, timeIndex, timeLoad) from the inspector’s connection data.

Returns: DataFrame with one row per DBMS connection indexed as "DBMS".
Return type: pandas.DataFrame

get_query_latencies(query_titles=False)

Returns the mean execution latency per query and DBMS.

Parameters: query_titles (bool) – When True, replaces query index labels with human-readable titles from queries.config.
Returns: DataFrame of mean latencies (ms) with queries as columns and DBMS as rows.
Return type: pandas.DataFrame

get_summary_benchmark_per_connection()

Returns benchmarking results with one row per pod, filtered to the key display columns.

Applies benchmarking_set_datatypes() and selects the columns used for the per-connection summary table (experiment run, terminals, target, client, child, time, errors, throughput, goodput, efficiency, and latency percentiles), then sorts by (experiment_run, client, child).

Returns: DataFrame indexed as "DBMS" with one row per pod, or None if there are no benchmarking results.
Return type: pandas.DataFrame or None

get_summary_benchmark_per_phase()

Returns benchmarking results aggregated over parallel pods, one row per phase.

Applies benchmarking_set_datatypes(), aggregates via benchmarking_aggregate_by_parallel_pods(), and selects the columns used for the per-phase summary table (experiment run, terminals, target, pod count, time, errors, throughput, goodput, efficiency, and latency percentiles), sorted by (experiment_run, target, pod_count).

Returns: DataFrame indexed as "DBMS" with one row per phase, or an empty DataFrame if there are no benchmarking results.
Return type: pandas.DataFrame

get_summary_benchmark_per_phase_multitenant()

Returns benchmarking results aggregated per phase and tenant, one row per (phase, tenant_id).

Like get_summary_benchmark_per_phase() but groups by ['phase', 'tenant_id'] so each tenant appears as a separate row.

Returns: DataFrame indexed as "DBMS" with one row per (phase, tenant), or an empty DataFrame if there are no benchmarking results.
Return type: pandas.DataFrame
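
The effect of adding tenant_id to the grouping key can be shown with a small pandas-free sketch. The helper sum_errors, the errors metric, and the sample rows are hypothetical; only the two grouping keys come from the description above.

```python
from collections import defaultdict


def sum_errors(rows, columns):
    """Sum the 'errors' value of dict rows per distinct combination of `columns`."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[c] for c in columns)] += row["errors"]
    return dict(totals)


# Hypothetical sample: three pods in one phase, spread over two tenants.
rows = [
    {"phase": "PostgreSQL-1-1", "tenant_id": 0, "errors": 1},
    {"phase": "PostgreSQL-1-1", "tenant_id": 0, "errors": 2},
    {"phase": "PostgreSQL-1-1", "tenant_id": 1, "errors": 0},
]

per_phase = sum_errors(rows, ["phase"])                 # tenants merged
per_tenant = sum_errors(rows, ["phase", "tenant_id"])   # one entry per tenant
```

With the single-column key all pods of the phase collapse into one entry; with the two-column key each tenant keeps its own entry.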

get_summary_loading_per_run()

Returns loading metrics aggregated per experiment run.

Delegates to get_loading_per_run() (defined in base), which reduces the per-connection loading DataFrame to one row per (code, configuration, experiment_run) and adds a 'Throughput [SF/h]' column.

Returns: DataFrame with one row per experiment run.
Return type: pandas.DataFrame
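
The unit of the 'Throughput [SF/h]' column suggests scale factor divided by loading time in hours. The sketch below works under that reading only; the function name, the argument names, and the assumption that the loading time is measured in seconds are not confirmed by this reference.

```python
def throughput_sf_per_hour(scale_factor, load_seconds):
    """Scale factor loaded per hour, assuming the load time is given in seconds."""
    if load_seconds <= 0:
        raise ValueError("load_seconds must be positive")
    return scale_factor * 3600 / load_seconds


# Hypothetical example: scale factor 10 loaded in 30 minutes.
example = throughput_sf_per_hour(10, 1800)
```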

get_total_errors(query_titles=False)

Returns the per-query error counts for this experiment.

Parameters: query_titles (bool) – When True, replaces query index labels with human-readable titles from queries.config.
Returns: DataFrame of error counts with queries as columns and DBMS as rows.
Return type: pandas.DataFrame

get_total_warnings(query_titles=False)

Returns the per-query warning counts for this experiment.

Parameters: query_titles (bool) – When True, replaces query index labels with human-readable titles from queries.config.
Returns: DataFrame of warning counts with queries as columns and DBMS as rows.
Return type: pandas.DataFrame

load_inspector()

Loads the DBMSBenchmarker inspector for this experiment.

Creates an inspector.inspector rooted at self.path_base, loads the experiment identified by self.code, and stores the result in self.evaluation. Sets self.evaluation to None if loading fails so callers can detect the uninitialized state.
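
The load-or-None behaviour can be sketched with a stand-in object, so the example runs without DBMSBenchmarker installed. StubInspector, ExperimentLoader, and the inspector_factory argument are hypothetical; the stub only mimics an object that is created from a root path and then asked to load an experiment by its code.

```python
class StubInspector:
    """Hypothetical stand-in for an inspector; knows a fixed set of experiment codes."""

    def __init__(self, path):
        self.path = path
        self.loaded = None

    def load_experiment(self, code):
        if code not in {"1234567890"}:
            raise FileNotFoundError(f"no result folder for {code!r} in {self.path!r}")
        self.loaded = code


class ExperimentLoader:
    def __init__(self, path_base, code, inspector_factory=StubInspector):
        self.path_base = path_base
        self.code = code
        self._inspector_factory = inspector_factory  # injection point, for illustration
        self.evaluation = None

    def load_inspector(self):
        try:
            evaluation = self._inspector_factory(self.path_base)
            evaluation.load_experiment(self.code)
            self.evaluation = evaluation
        except Exception:
            # Leave a clear marker so callers can test for `is None`.
            self.evaluation = None


ok = ExperimentLoader("/results", "1234567890")
ok.load_inspector()

missing = ExperimentLoader("/results", "does-not-exist")
missing.load_inspector()
```

Callers can then guard every access with a check such as `if loader.evaluation is None`, instead of catching loading errors at each call site.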

record_tests(experiment, df_loading: DataFrame, df_reduced: DataFrame, workflow_actual: dict, workflow_planned: dict, **extra) → None

Records DBMSBenchmarker pass/fail tests.

Tests query metric columns (Geo Times, Power@Size, Throughput@Size), SQL error and warning counts supplied by _show_extra_sections, and workflow completeness.

Parameters:

experiment – The owning experiment object.
df_loading – Per-run loading DataFrame (unused here).
df_reduced – Per-phase execution DataFrame.
workflow_actual – Reconstructed actual workflow dict.
workflow_planned – Planned workflow dict from workload config.
extra – Must contain num_errors and num_warnings from _show_extra_sections().

test_results()

Validates results by loading and reconstructing the workflow.

Returns: 0 on success, 1 if an exception is raised.
Return type: int
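
The 0/1 return convention resembles a process exit status. A minimal sketch of that pattern follows; run_check and the two sample checks are hypothetical and stand in for whatever validation the method performs.

```python
def run_check(check):
    """Run a zero-argument callable; return 0 if it completes, 1 if it raises."""
    try:
        check()
        return 0
    except Exception:
        return 1


def passing_check():
    # Hypothetical validation that succeeds.
    return None


def failing_check():
    # Hypothetical validation that fails.
    raise RuntimeError("workflow could not be reconstructed")


status_ok = run_check(passing_check)
status_failed = run_check(failing_check)
```

An integer status of this kind can be handed directly to sys.exit() in a command-line wrapper.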

bexhoma.evaluators.dbmsbenchmarker.map_index_to_queryname(numQuery)

Maps a query index string (e.g., 'q1') to a human-readable title from the global query_properties dictionary.

If the title cannot be resolved, the original input string is returned unchanged.

Parameters: numQuery (str) – A query index string, typically a letter followed by a number (e.g., 'q1').
Returns: The query title from query_properties, or numQuery if not found.
Return type: str
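
The lookup-with-fallback behaviour can be sketched as follows. The layout of query_properties used here (keyed by the numeric part of the index, with a 'title' entry) is an assumption made for illustration, as are the helper name lookup_query_title and the sample title.

```python
# Assumed layout, for illustration only: numeric part of the index -> properties.
query_properties = {
    "1": {"title": "Sample title for query 1"},
}


def lookup_query_title(num_query):
    """Return the title for an index such as 'q1', or the input if it is unknown."""
    entry = query_properties.get(num_query[1:], {})
    return entry.get("title", num_query)


known = lookup_query_title("q1")
unknown = lookup_query_title("q99")
```

Returning the input unchanged keeps labels usable even when no title is configured for a query.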