bexhoma.evaluators package

Module contents

Public API of the bexhoma.evaluators package.

Exports EvaluatorBase, LogEvaluator, DbmsBenchmarkerEvaluator, BenchbaseEvaluator, TpccEvaluator, and YcsbEvaluator evaluator classes, plus the natural_sort() utility from base.

Authors: Patrick K. Erdelt

Copyright (C) 2020 Patrick K. Erdelt

SPDX-License-Identifier: AGPL-3.0-or-later

See LICENSE for details.

class bexhoma.evaluators.BenchbaseEvaluator(code, path, include_loading=False, include_benchmarking=True, benchmark_run: int = 0)

Bases: LogEvaluator

Evaluator for a Benchbase experiment.

Parses per-pod log files to extract throughput, goodput, and latency distribution results produced by the Benchbase benchmarking tool. Also provides time-series access to per-second throughput metrics via get_benchmark_logs_timeseries_df_aggregated() and get_benchmark_logs_timeseries_df_single().

Parameters:
  • code – Experiment identifier — also the name of the result sub-folder.

  • path – Root path that contains the result folders.

  • include_loading – Whether loading-phase results are expected.

  • include_benchmarking – Whether benchmarking-phase results are expected.

  • benchmark_run – 1-based position in the benchmark sequence; 0 means unset.

benchmark_logs_to_timeseries_df(list_logs, metric='throughput', aggregate=True)

Parses Benchbase log files for the given pod IDs and assembles a time-series DataFrame.

Each pod ID in list_logs is resolved to matching log files via a glob pattern. When aggregate is True the per-second metric values from all pods are combined into a single DataFrame: percentile/max metrics use the element-wise maximum, minimum metrics use the element-wise minimum, and all others are summed. When aggregate is False a list of per-pod DataFrames is returned instead.

Parameters:
  • list_logs (list[str]) – Pod IDs (short suffixes) used to locate matching log files.

  • metric (str) – Metric column to extract (default 'throughput').

  • aggregate (bool) – Whether to aggregate all pod DataFrames into one.

Returns:

Aggregated DataFrame indexed by 'second' (with an 'avg' column appended) when aggregate is True; otherwise a list of per-pod DataFrames.

Return type:

pandas.DataFrame or list[pandas.DataFrame]
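The aggregation rule described above (maximum for percentile/max metrics, minimum for minimum metrics, sum otherwise) can be sketched in plain pandas. This is a minimal illustration, not bexhoma's implementation: the helper name aggregate_pod_timeseries, the substring test on the metric name, and the assumption that each pod DataFrame is indexed by 'second' with one metric column are all hypothetical, and the 'avg' column is not reproduced.

```python
import pandas as pd


def aggregate_pod_timeseries(pod_dfs, metric):
    """Combine per-pod time series (indexed by 'second') into one DataFrame.

    Hypothetical helper: percentile/max metrics take the element-wise maximum,
    minimum metrics the element-wise minimum, and everything else is summed.
    """
    # Align all pods on the shared 'second' index, one column per pod.
    wide = pd.concat([df[metric] for df in pod_dfs], axis=1)
    name = metric.lower()
    if "percentile" in name or "max" in name:
        combined = wide.max(axis=1)
    elif "min" in name:
        combined = wide.min(axis=1)
    else:
        combined = wide.sum(axis=1)
    return combined.to_frame(name=metric)


pod_a = pd.DataFrame({"second": [1, 2, 3], "throughput": [10.0, 12.0, 11.0]}).set_index("second")
pod_b = pd.DataFrame({"second": [1, 2, 3], "throughput": [9.0, 13.0, 10.0]}).set_index("second")
result = aggregate_pod_timeseries([pod_a, pod_b], "throughput")
```

Throughput is additive across pods, so the sketch sums it; a latency maximum is bounded by the slowest pod, so it takes the maximum instead.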

benchmarking_aggregate_by_parallel_pods(df, columns=['phase'])

Aggregates parallel-pod result rows into one row per group.

Groups the typed benchmarking DataFrame by columns and applies per-metric aggregation functions (sum for throughput, max for latency percentiles, etc.).

The phase column holds the phase identifier (configuration-experiment_run-client) and the job column holds the job identifier (configuration-experiment_run-client-benchmark_run).

The default columns=['phase'] groups by phase, producing one row per phase (all jobs within a phase merged). To keep one row per job, pass columns=['job'].

Parameters:
  • df (pandas.DataFrame) – Typed benchmarking DataFrame (output of benchmarking_set_datatypes()).

  • columns (list[str]) – Grouping columns (default ['phase']).

Returns:

Aggregated DataFrame with one row per group.

Return type:

pandas.DataFrame
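A minimal sketch of grouping by phase versus job, assuming two hypothetical metric columns, throughput (summed across pods) and latency_p95 (maximum across pods). The real method uses its own column names and a larger set of aggregation functions.

```python
import pandas as pd

# Hypothetical per-pod rows: two pods in phase "A", one pod in phase "B".
df = pd.DataFrame({
    "phase": ["A", "A", "B"],
    "job": ["A-1", "A-1", "B-1"],
    "throughput": [100.0, 120.0, 90.0],
    "latency_p95": [5.0, 7.0, 6.0],
})

# Per-metric aggregation: throughput adds up across parallel pods,
# while a latency percentile is bounded by the slowest pod.
agg_funcs = {"throughput": "sum", "latency_p95": "max"}


def aggregate_by_parallel_pods(df, columns=("phase",)):
    """Return one row per group defined by the grouping columns."""
    return df.groupby(list(columns)).agg(agg_funcs).reset_index()


per_phase = aggregate_by_parallel_pods(df)
per_job = aggregate_by_parallel_pods(df, columns=["job"])
```

The sketch uses a tuple as the default for columns to avoid Python's shared-mutable-default pitfall; passing a list such as ['job'] works the same way.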

benchmarking_set_datatypes(df)

Casts all benchmarking result columns to their appropriate data types.

Adds a tenant_id column (value -1) when the column is absent so that DataFrames loaded from older pickles remain compatible.

Parameters:

df (pandas.DataFrame) – DataFrame of raw benchmarking results.

Returns:

DataFrame with columns cast to correct types.

Return type:

pandas.DataFrame
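The backward-compatibility step for tenant_id, combined with numeric casting, might look like the following sketch. The helper name set_datatypes and the columns throughput and client are illustrative assumptions; the actual set of cast columns is defined by bexhoma.

```python
import pandas as pd


def set_datatypes(df):
    """Hypothetical sketch: backfill tenant_id and cast raw string columns."""
    df = df.copy()  # leave the caller's DataFrame untouched
    # Older pickles predate multi-tenancy; -1 marks "no tenant".
    if "tenant_id" not in df.columns:
        df["tenant_id"] = -1
    # Illustrative casts from parsed log strings to numbers.
    df["throughput"] = pd.to_numeric(df["throughput"])
    df["client"] = pd.to_numeric(df["client"])
    return df


raw = pd.DataFrame({"throughput": ["101.5", "99.0"], "client": ["1", "2"]})
typed = set_datatypes(raw)
```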

get_benchmark_logs_timeseries_df_aggregated(metric='throughput', configuration='', client='1', experiment_run='1')

Returns a DataFrame of time series of a metric for the benchmarking phase, aggregated over all pods per second.

Retrieves pod IDs from get_df_benchmarking() filtered by the given configuration, client, and experiment_run, then delegates to benchmark_logs_to_timeseries_df() with aggregate=True.

Parameters:
  • metric (str) – Metric column to extract (default 'throughput').

  • configuration (str) – Configuration name (e.g. 'PostgreSQL-64-8-65536').

  • client (str or int) – Client number (default '1').

  • experiment_run (str or int) – Experiment run number (default '1').

Returns:

DataFrame indexed by 'second' with the metric and an 'avg' column.

Return type:

pandas.DataFrame

get_benchmark_logs_timeseries_df_single(metric='throughput', configuration='', client='1', experiment_run='1')

Returns a list of DataFrames of time series of a metric for the benchmarking phase, one per pod.

Retrieves pod IDs from get_df_benchmarking() filtered by the given configuration, client, and experiment_run, then delegates to benchmark_logs_to_timeseries_df() with aggregate=False.

Parameters:
  • metric (str) – Metric column to extract (default 'throughput').

  • configuration (str) – Configuration name (e.g. 'PostgreSQL-64-8-65536').

  • client (str or int) – Client number (default '1').

  • experiment_run (str or int) – Experiment run number (default '1').

Returns:

List of DataFrames, one per pod, each indexed by 'second'.

Return type:

list[pandas.DataFrame]

get_summary_benchmark_per_connection()

Returns benchmarking results with one row per pod, filtered to the key display columns.

Applies benchmarking_set_datatypes() and selects the columns used for the per-connection summary table (experiment run, terminals, target, client, child, time, errors, throughput, goodput, efficiency, and latency percentiles), then sorts by (experiment_run, client, child).

Returns:

DataFrame indexed as "DBMS" with one row per pod, or None if there are no benchmarking results.

Return type:

pandas.DataFrame or None

get_summary_benchmark_per_phase()

Returns benchmarking results aggregated over parallel pods, one row per phase.

Applies benchmarking_set_datatypes(), aggregates via benchmarking_aggregate_by_parallel_pods(), and selects the columns used for the per-phase summary table (phase, experiment run, terminals, target, pod count, time, errors, throughput, goodput, efficiency, and latency percentiles), sorted by (experiment_run, target, pod_count).

Returns:

DataFrame indexed as "DBMS" with one row per phase, or an empty DataFrame if there are no benchmarking results.

Return type:

pandas.DataFrame

get_summary_benchmark_per_phase_multitenant()

Returns benchmarking results aggregated per phase and tenant, one row per (phase, tenant_id).

Like get_summary_benchmark_per_phase() but groups by ['phase', 'tenant_id'] so each tenant appears as a separate row.

Returns:

DataFrame indexed as "DBMS" with one row per (phase, tenant), or an empty DataFrame if there are no benchmarking results.

Return type:

pandas.DataFrame

get_summary_loading_per_run()

Returns loading metrics aggregated per experiment run.

Delegates to get_loading_per_run() (defined in base), which reduces the per-connection loading DataFrame to one row per (code, configuration, experiment_run) and adds a 'Throughput [SF/h]' column.

Returns:

DataFrame with one row per experiment run.

Return type:

pandas.DataFrame

log_to_df(filename)

Parses a Benchbase pod log file into a single-row DataFrame.

Extracts connection metadata (including tenant_id from the BEXHOMA_TENANT_ID stdout line), benchmark parameters, and the JSON result block embedded between ####BEXHOMA#### markers. Returns an empty DataFrame when the log is incomplete (e.g. the start time has already passed).

Parameters:

filename (str) – Absolute path to the log file.

Returns:

Single-row DataFrame of benchmarking results, or empty on failure.

Return type:

pandas.DataFrame
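Extracting the JSON result block between the ####BEXHOMA#### markers can be sketched as below. This assumes the block is delimited by the first two occurrences of the marker and contains a single JSON object; the helper name extract_result_json and the sample log text are hypothetical.

```python
import json

MARKER = "####BEXHOMA####"


def extract_result_json(log_text):
    """Return the JSON object between the first two markers, or None."""
    parts = log_text.split(MARKER)
    # A complete log has text before, the JSON block, and text after.
    if len(parts) < 3:
        return None
    try:
        return json.loads(parts[1])
    except json.JSONDecodeError:
        return None  # truncated or malformed block: treat as incomplete


sample = 'startup\n####BEXHOMA####\n{"throughput": 123.4}\n####BEXHOMA####\ndone\n'
result = extract_result_json(sample)
```

Returning None for a missing or malformed block mirrors the documented behavior of returning an empty DataFrame when the log is incomplete.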

parse_benchbase_log_file(file_path)

Parses a Benchbase log file into a list of per-second throughput records.

Each [INFO] log line that contains a Throughput: entry is converted into a dict with keys second (elapsed time) and throughput.

Parameters:

file_path (str) – Absolute path to the Benchbase log file.

Returns:

List of {'second': int, 'throughput': float} dicts.

Return type:

list[dict]
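A sketch of the per-line conversion, under two loud assumptions: the matching lines look like the invented samples below (an [INFO] tag followed somewhere by "Throughput: <number>"), and Benchbase emits one such line per one-second interval, so second is simply the 1-based position of the line among the matches. The real parser may derive elapsed time differently.

```python
import re

# Hypothetical pattern: an [INFO] line with "Throughput: <number>".
THROUGHPUT_RE = re.compile(r"\[INFO\s*\].*?Throughput:\s*([0-9]+(?:\.[0-9]+)?)")


def parse_throughput_lines(lines):
    """Turn matching log lines into {'second', 'throughput'} records.

    Assumes one matching line per one-second interval, so 'second' is the
    1-based position of the line among the matches.
    """
    records = []
    for line in lines:
        match = THROUGHPUT_RE.search(line)
        if match:
            records.append({"second": len(records) + 1,
                            "throughput": float(match.group(1))})
    return records


log = [
    "[INFO ] 10:00:01 Throughput: 510.5 txn/sec",
    "[DEBUG] unrelated line",
    "[INFO ] 10:00:02 Throughput: 498 txn/sec",
]
records = parse_throughput_lines(log)
```

To apply it to a file, pass the open file object, since iterating over it yields lines.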

record_tests(experiment, df_loading: DataFrame, df_reduced: DataFrame, workflow_actual: dict, workflow_planned: dict, **extra) → None

Record Benchbase pass/fail tests: throughput and workflow completeness.

Parameters:
  • experiment – The owning experiment object.

  • df_loading – Per-run loading DataFrame (unused here).

  • df_reduced – Per-phase execution DataFrame.

  • workflow_actual – Reconstructed actual workflow dict.

  • workflow_planned – Planned workflow dict from workload config.

class bexhoma.evaluators.DbmsBenchmarkerEvaluator(code, path, include_loading=False, include_benchmarking=True, benchmark_run: int = 0)

Bases: LogEvaluator

Evaluator for a DBMSBenchmarker experiment.

Wraps a dbmsbenchmarker.inspector.inspector instance and exposes loading times, per-query latency statistics, throughput metrics, warning and error counts, and aggregation over parallel pods.

Parameters:
  • code – Experiment identifier — also the name of the result sub-folder.

  • path – Root path that contains the result folders.

  • include_loading – Unused; loading is always enabled for this evaluator.

  • include_benchmarking – Unused; benchmarking is always enabled.

  • benchmark_run – 1-based position in the benchmark sequence; 0 means unset.

benchmarking_aggregate_by_parallel_pods(df, columns=['phase'])

Aggregates parallel-pod DBMSBenchmarker result rows into one row per group.

Groups by columns and applies geo-mean for timing/power metrics and max/sum for count metrics. Recomputes Throughput@Size from the aggregated values.

The phase column holds the phase identifier (configuration-experiment_run-client) and the job column holds the job identifier (configuration-experiment_run-client-benchmark_run).

The default columns=['phase'] groups by phase, producing one row per phase. To keep one row per job, pass columns=['job'].

Parameters:
  • df (pandas.DataFrame) – Benchmarking DataFrame (output of get_df_benchmarking()).

  • columns (list[str]) – Grouping columns (default ['phase']).

Returns:

Aggregated DataFrame with one row per group.

Return type:

pandas.DataFrame
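A geometric-mean aggregation over parallel pods can be sketched as follows, computed as the exponential of the mean of logarithms (valid for strictly positive values). The column names geo_time and errors are hypothetical stand-ins for a timing metric and a count metric; the recomputation of Throughput@Size is not reproduced here.

```python
import numpy as np
import pandas as pd


def geometric_mean(values):
    """Geometric mean of strictly positive values via the mean of logs."""
    return float(np.exp(np.log(np.asarray(values, dtype=float)).mean()))


df = pd.DataFrame({
    "phase": ["P", "P"],
    "geo_time": [2.0, 8.0],  # hypothetical timing column (seconds)
    "errors": [1, 2],        # hypothetical count column
})

aggregated = df.groupby("phase").agg({"geo_time": geometric_mean,
                                      "errors": "sum"}).reset_index()
```

scipy.stats.gmean computes the same quantity if SciPy is available; the log-based form avoids overflow from multiplying many values directly.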

benchmarking_set_datatypes(df)

Returns the DataFrame, adding a tenant_id column (value -1) when the column is absent so that DataFrames loaded from older pickles remain compatible with aggregation code that expects the column.

DBMSBenchmarker results are otherwise already typed by the inspector; no other conversion is needed.

Parameters:

df (pandas.DataFrame) – DataFrame of results.

Returns:

DataFrame with tenant_id guaranteed to be present.

Return type:

pandas.DataFrame

get_df_benchmarking()

Returns the DataFrame containing all benchmarking-phase results.

Combines per-query latency statistics, geo-mean execution times, and per-connection timing data from the DBMSBenchmarker inspector into a single DataFrame. Includes tenant_id read from the BEXHOMA_TENANT_ID loading parameter (-1 when absent).

Returns:

DataFrame with one row per connection/pod, or empty DataFrame on failure.

Return type:

pandas.DataFrame

get_df_loading()

Returns the DataFrame containing all loading-phase timing results.

Reads loading time fields (timeGenerate, timeIngesting, timeSchema, timeIndex, timeLoad) from the inspector’s connection data.

Returns:

DataFrame with one row per DBMS connection indexed as "DBMS".

Return type:

pandas.DataFrame

get_query_latencies(query_titles=False)

Returns the mean execution latency per query and DBMS.

Parameters:

query_titles (bool) – When True, replaces query index labels with human-readable titles from queries.config.

Returns:

DataFrame of mean latencies (ms) with queries as columns and DBMS as rows.

Return type:

pandas.DataFrame

get_summary_benchmark_per_connection()

Returns benchmarking results with one row per pod, filtered to the key display columns.

Applies benchmarking_set_datatypes() and selects the columns used for the per-connection summary table (experiment run, terminals, target, client, child, time, errors, throughput, goodput, efficiency, and latency percentiles), then sorts by (experiment_run, client, child).

Returns:

DataFrame indexed as "DBMS" with one row per pod, or None if there are no benchmarking results.

Return type:

pandas.DataFrame or None

get_summary_benchmark_per_phase()

Returns benchmarking results aggregated over parallel pods, one row per phase.

Applies benchmarking_set_datatypes(), aggregates via benchmarking_aggregate_by_parallel_pods(), and selects the columns used for the per-phase summary table (experiment run, terminals, target, pod count, time, errors, throughput, goodput, efficiency, and latency percentiles), sorted by (experiment_run, target, pod_count).

Returns:

DataFrame indexed as "DBMS" with one row per phase, or an empty DataFrame if there are no benchmarking results.

Return type:

pandas.DataFrame

get_summary_benchmark_per_phase_multitenant()

Returns benchmarking results aggregated per phase and tenant, one row per (phase, tenant_id).

Like get_summary_benchmark_per_phase() but groups by ['phase', 'tenant_id'] so each tenant appears as a separate row.

Returns:

DataFrame indexed as "DBMS" with one row per (phase, tenant), or an empty DataFrame if there are no benchmarking results.

Return type:

pandas.DataFrame

get_summary_loading_per_run()

Returns loading metrics aggregated per experiment run.

Delegates to get_loading_per_run() (defined in base), which reduces the per-connection loading DataFrame to one row per (code, configuration, experiment_run) and adds a 'Throughput [SF/h]' column.

Returns:

DataFrame with one row per experiment run.

Return type:

pandas.DataFrame

get_total_errors(query_titles=False)

Returns the per-query error counts for this experiment.

Parameters:

query_titles (bool) – When True, replaces query index labels with human-readable titles from queries.config.

Returns:

DataFrame of error counts with queries as columns and DBMS as rows.

Return type:

pandas.DataFrame

get_total_warnings(query_titles=False)

Returns the per-query warning counts for this experiment.

Parameters:

query_titles (bool) – When True, replaces query index labels with human-readable titles from queries.config.

Returns:

DataFrame of warning counts with queries as columns and DBMS as rows.

Return type:

pandas.DataFrame

load_inspector()

Loads the DBMSBenchmarker inspector for this experiment.

Creates an inspector.inspector rooted at self.path_base, loads the experiment identified by self.code, and stores the result in self.evaluation. Sets self.evaluation to None if loading fails so callers can detect the uninitialized state.

record_tests(experiment, df_loading: DataFrame, df_reduced: DataFrame, workflow_actual: dict, workflow_planned: dict, **extra) → None

Record DBMSBenchmarker pass/fail tests.

Tests the query metric columns (Geo Times, Power@Size, Throughput@Size), the SQL error and warning counts supplied by _show_extra_sections(), and workflow completeness.

Parameters:
  • experiment – The owning experiment object.

  • df_loading – Per-run loading DataFrame (unused here).

  • df_reduced – Per-phase execution DataFrame.

  • workflow_actual – Reconstructed actual workflow dict.

  • workflow_planned – Planned workflow dict from workload config.

  • extra – Must contain num_errors and num_warnings from _show_extra_sections().

test_results()

Validates results by loading and reconstructing the workflow.

Returns:

0 on success, 1 if an exception is raised.

Return type:

int

class bexhoma.evaluators.EvaluatorBase(code, path, include_loading=False, include_benchmarking=True, benchmark_run: int = 0)

Bases: object

Base evaluator for a single bexhoma experiment.

Loads the experiment result folder identified by code inside path, provides helpers for scanning log files and reconstructing the workflow, and exposes connection/loading metadata. All benchmark-specific evaluators inherit from this class via LogEvaluator.

Parameters:
  • code – Experiment identifier — also the name of the result sub-folder.

  • path – Root path that contains the result folders.

  • include_loading – Whether loading-phase results are expected.

  • include_benchmarking – Whether benchmarking-phase results are expected.

  • benchmark_run – 1-based position in the benchmark sequence; 0 means unset.

add_connection_to_result(c, connection_id, result)

Appends a flattened connection entry to result keyed by connection_id.

Extracts scalar fields from the connection config dict c — including host-system, loading-parameter, benchmarking-parameter, SUT-parameter, and args sections — and stores them with prefixed keys (host_*, loading_parameters_*, benchmarking_parameters_*, sut_parameters_*, arg_*). Non-scalar values (lists and dicts) are skipped.

Parameters:
  • c (dict) – Single connection entry from connections.config.

  • connection_id (str) – Key to use when inserting into result.

  • result (dict) – Accumulator dict that maps connection IDs to metadata rows.
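The flattening with prefixed keys can be sketched as below. The section key names in the connection dict (hostsystem, loading_parameters, benchmarking_parameters, sut_parameters, args) are assumptions about the layout of connections.config, and flatten_connection is a hypothetical name; only the prefixes and the skipping of lists and dicts come from the description above.

```python
def flatten_connection(c, connection_id, result):
    """Hypothetical sketch: flatten selected sections of a connection dict.

    Scalar values are copied under a section-specific prefix; lists and dicts
    are skipped, mirroring the behaviour described above.
    """
    # Section key in c -> prefix for the flattened column names (assumed).
    sections = {
        "hostsystem": "host_",
        "loading_parameters": "loading_parameters_",
        "benchmarking_parameters": "benchmarking_parameters_",
        "sut_parameters": "sut_parameters_",
        "args": "arg_",
    }
    row = {}
    for section, prefix in sections.items():
        for key, value in (c.get(section) or {}).items():
            if isinstance(value, (list, dict)):
                continue  # non-scalar values are not flattened
            row[prefix + key] = value
    result[connection_id] = row
    return result
```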

end_benchmarking(jobname)

Processes all benchmarker log files for a given job name.

Scans the result folder for files matching bexhoma-benchmarker-<jobname>*.dbmsbenchmarker.log and calls log_to_df() on each one.

Parameters:

jobname (str) – Job name used to filter matching log files.

end_loading(jobname)

Processes all loader sensor log files for a given job name.

Scans the result folder for files matching bexhoma-loading-<jobname>*.sensor.log and calls log_to_df() on each one.

Parameters:

jobname (str) – Job name used to filter matching log files.

evaluate_results(pod_dashboard='')

Scans all log files and collects information from them. In this base class, it only scans for errors.

get_connections_of_experiment()

Returns connection metadata for a single experiment.

Reads connections.config and builds a row per pod/client with the following key columns: phase (code-prefixed phase identifier, <code>-<configuration>-<experiment_run>-<client>), job (code-prefixed job identifier, <code>-<configuration>-<experiment_run>-<client>-<benchmark_run>), code, connection, configuration, experiment_run, client, type_tenants, num_tenants, vol_tenants, plus flattened host-system, loading-parameter, benchmarking-parameter, and SUT-parameter fields.

When a connection entry carries orig_name, the entry represents an individual pod; otherwise a synthetic row is generated for each parallel client.

Returns:

DataFrame of connection metadata, one row per pod/client.

Return type:

pandas.DataFrame

get_df_benchmarking()

Returns the DataFrame containing all benchmarking-phase results.

Returns:

Empty DataFrame; overridden by subclasses.

Return type:

pandas.DataFrame

get_df_loading()

Returns the DataFrame containing all loading-phase results.

Returns:

Empty DataFrame; overridden by subclasses.

Return type:

pandas.DataFrame

get_loading_per_connection()

Returns loading metrics for each individual connection (pod/client), enriched with the scale factor and a 'Throughput [SF/h]' derived column.

Returns:

DataFrame with one row per connection.

Return type:

pandas.DataFrame

get_loading_per_run()

Returns loading metrics aggregated per (code, configuration, experiment_run).

Takes the per-connection DataFrame from get_loading_per_connection() and reduces it to one row per experiment run by taking the max across connections, then recomputes 'Throughput [SF/h]' from the aggregated load time.

Returns:

DataFrame with one row per experiment run.

Return type:

pandas.DataFrame
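The per-run reduction can be sketched in pandas. Two assumptions are made here and are not confirmed by the source: the load time is in seconds in a column named time_load, and 'Throughput [SF/h]' is defined as scale factor loaded per hour of loading time.

```python
import pandas as pd

# Hypothetical per-connection loading results: two pods of the same run.
per_connection = pd.DataFrame({
    "code": ["123", "123"],
    "configuration": ["PG", "PG"],
    "experiment_run": [1, 1],
    "SF": [10, 10],
    "time_load": [1800.0, 1200.0],  # seconds (assumed unit)
})

keys = ["code", "configuration", "experiment_run"]
# Taking the max makes the slowest connection determine the run's load time.
per_run = per_connection.groupby(keys).max().reset_index()
# Assumed definition: scale factors loaded per hour of loading time.
per_run["Throughput [SF/h]"] = per_run["SF"] * 3600.0 / per_run["time_load"]
```

Recomputing the throughput after aggregation (rather than taking the maximum of per-connection throughputs) keeps it consistent with the aggregated load time.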

get_loading_per_run_multitenant()

Returns loading metrics aggregated per (code, experiment_run, type_tenants, vol_tenants, num_tenants, tenant_id) for multi-tenant experiments.

For container tenancy the tenant_id key distinguishes individual tenant loading times.

For schema/database tenancy, reads per-pod sensor log files via _get_tenant_loading_from_logs() to expand the single shared connection row into one row per tenant. Each row carries the tenant’s own loading duration (time_ingest / time_load) and a matching tenant_id. If the sensor logs are absent, the result collapses to a single row with tenant_id = ''.

Returns:

DataFrame with one row per tenant per experiment run.

Return type:

pandas.DataFrame

get_summary_loading_per_run_multitenant()

Returns loading metrics per tenant per experiment run, with housekeeping columns removed.

Wraps get_loading_per_run_multitenant() and drops code and configuration so the result is ready to display in show_summary().

Returns:

DataFrame with one row per (tenant_id, experiment_run) combination.

Return type:

pandas.DataFrame

get_workload()

Returns the workload configuration of an experiment.

Reads the queries.config file from the experiment result folder and returns its contents as a Python dictionary.

Returns:

Workload properties dictionary.

Return type:

dict

log_to_df(filename)

Scans a pod log file for known errors and records them in self.workflow_errors.

Returns an empty DataFrame; subclasses override this method to also parse benchmark results out of the log.

Parameters:

filename (str) – Absolute path to the log file.

Returns:

Empty DataFrame (subclasses return populated DataFrames).

Return type:

pandas.DataFrame

log_to_df_loading(filename: str) → DataFrame

Parse a loading pod log file and return the result as a DataFrame.

The default implementation delegates to log_to_df(), which is correct for benchmarks whose loading and benchmarking log formats are identical (e.g. YCSB). Subclasses whose formats differ must override this method.

Parameters:

filename (str) – Absolute path to the loading log file.

Returns:

DataFrame of loading results, or empty DataFrame on failure.

Return type:

pandas.DataFrame

reconstruct_workflow(df: DataFrame) → dict

Reconstructs the actual experiment workflow from connection metadata.

Reads benchmark_sequence from queries.config to map each benchmark_run index to its benchmarker type, then groups the DataFrame by (configuration, experiment_run, client, benchmark_run) to produce a structure that mirrors the planned workflow format:

{
    'MySQL-24-4-1024': [
        [  # experiment run 1
            [  # client round 1
                {'type': 'dbmsbenchmarker', 'pods': 4},
                {'type': 'tpch_refresh',    'pods': 1},
            ],
            [  # client round 2
                {'type': 'dbmsbenchmarker', 'pods': 8},
                {'type': 'tpch_refresh',    'pods': 1},
            ],
        ],
    ],
}

Parameters:

df (pandas.DataFrame) – Connection metadata DataFrame returned by get_connections_of_experiment(), with at least the columns configuration, experiment_run, client, benchmark_run, and pods.

Returns:

Workflow dict mapping configuration name to the nested structure.

Return type:

dict
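The grouping into nested lists can be sketched as below. Assumptions: benchmark_sequence is passed in directly as a list (the real method reads it from queries.config), benchmark_run indexes it 1-based, and each row's pods value already holds the pod count of its job, so the sketch takes the maximum per group. The function name reconstruct is hypothetical.

```python
import pandas as pd


def reconstruct(df, benchmark_sequence):
    """Hypothetical sketch of rebuilding the nested workflow structure.

    benchmark_sequence is assumed to be a list of benchmarker types indexed
    by benchmark_run - 1 (benchmark_run is 1-based).
    """
    workflow = {}
    keys = ["configuration", "experiment_run", "client", "benchmark_run"]
    # Sorted grouping keeps runs, client rounds, and benchmarks in order.
    grouped = df.groupby(keys, sort=True)["pods"].max()
    for (config, run, client, bench), pods in grouped.items():
        runs = workflow.setdefault(config, {})
        rounds = runs.setdefault(run, {})
        steps = rounds.setdefault(client, [])
        steps.append({"type": benchmark_sequence[bench - 1], "pods": int(pods)})
    # Convert the ordered dicts of runs and client rounds into nested lists.
    return {config: [list(rounds.values()) for rounds in runs.values()]
            for config, runs in workflow.items()}


df = pd.DataFrame({
    "configuration": ["MySQL-24-4-1024"] * 4,
    "experiment_run": [1, 1, 1, 1],
    "client": [1, 1, 2, 2],
    "benchmark_run": [1, 2, 1, 2],
    "pods": [4, 1, 8, 1],
})
workflow = reconstruct(df, ["dbmsbenchmarker", "tpch_refresh"])
```

With this input the result has the same shape as the example structure shown above for 'MySQL-24-4-1024'.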

test_results()

Validates results locally and returns an exit code.

Returns:

0 on success; subclasses return 1 on failure.

Return type:

int

test_results_column(df, test_column: str) → bool

Check whether a column in a DataFrame contains any zero or NaN values.

Parameters:
  • df (pandas.DataFrame) – DataFrame to check.

  • test_column (str) – Column name to inspect.

Returns:

True if the column is fully populated with non-zero values, False otherwise.

Return type:

bool
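The check can be sketched in pandas as follows. The name column_fully_populated is hypothetical. Note that NaN != 0 evaluates to True, so the NaN test must be explicit; in this sketch an empty column passes vacuously, which may differ from the real method.

```python
import pandas as pd


def column_fully_populated(df, test_column):
    """True if test_column has no zero and no NaN values (sketch)."""
    values = df[test_column]
    # notna() catches NaN, which would otherwise pass the != 0 comparison.
    return bool(values.notna().all() and (values != 0).all())
```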

transform_all_logs_benchmarking()

Iterates over all benchmarker log files and calls end_benchmarking() for each.

When self.benchmark_run > 0, only processes log files whose jobname ends with the matching benchmark index (the last '-'-separated component), so that each evaluator in a multi-benchmark experiment only ingests its own logs.

transform_all_logs_loading()

Iterates over all loader sensor log files and calls end_loading() for each.

When self.benchmark_run > 0, only processes log files whose jobname ends with the matching benchmark index, so that each evaluator only ingests its own loading logs.
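The jobname filter and the log-file glob patterns can be sketched as below. The helper names and the sample jobnames are hypothetical; the patterns follow the file names given for end_benchmarking() and end_loading(). Only the last '-'-separated component is compared because configuration names themselves may contain hyphens (e.g. 'PostgreSQL-64-8-65536').

```python
def jobs_for_benchmark(jobnames, benchmark_run):
    """Return the jobnames this evaluator should process (sketch)."""
    if benchmark_run <= 0:
        return list(jobnames)  # 0 means unset: process every job
    wanted = str(benchmark_run)
    # The benchmark index is the last '-'-separated component of the jobname.
    return [job for job in jobnames if job.rsplit("-", 1)[-1] == wanted]


def log_pattern(jobname, phase="benchmarker"):
    """Glob pattern for a job's log files, following the naming above."""
    if phase == "benchmarker":
        return f"bexhoma-benchmarker-{jobname}*.dbmsbenchmarker.log"
    return f"bexhoma-loading-{jobname}*.sensor.log"


jobs = ["123-pg-1-1-1", "123-pg-1-1-2", "123-pg-1-2-1"]
selected = jobs_for_benchmark(jobs, 1)
```

A pattern from log_pattern() can be joined to the result folder and passed to glob.glob() to list the matching files.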

class bexhoma.evaluators.LogEvaluator(code, path, include_loading=False, include_benchmarking=True, benchmark_run: int = 0)

Bases: EvaluatorBase

Evaluator base that reads benchmark log files into DataFrames.

Extends EvaluatorBase by implementing end_benchmarking() and end_loading() to parse pod log files, pickle the resulting DataFrames, and collect them into a single combined pickle per phase. All benchmark-specific evaluators (BenchbaseEvaluator, YcsbEvaluator, TpccEvaluator, DbmsBenchmarkerEvaluator) inherit from this class.

end_benchmarking(jobname)

Parses all benchmarker log files for a job and caches results as pickle files.

Scans the result folder for files matching bexhoma-benchmarker-<jobname>*.dbmsbenchmarker.log, calls log_to_df() on each, and writes non-empty results to a <filename>.df.pickle side-car file.

Parameters:

jobname (str) – Job name used to filter matching log files.

end_loading(jobname)

Parses all loader sensor log files for a job and caches results as pickle files.

Scans the result folder for files matching bexhoma-loading-<jobname>*.sensor.log, calls log_to_df() on each, prints a message when errors are detected, and writes non-empty results to a <filename>.df.pickle side-car file.

Parameters:

jobname (str) – Job name used to filter matching log files.

evaluate_results(pod_dashboard='')

Parses all pod log files and persists the results as pickled DataFrames.

Calls transform_all_logs_benchmarking() and _collect_dfs() for the benchmarking phase when include_benchmarking is set, and analogously for the loading phase. When benchmark_run > 0, each phase writes a per-benchmark pickle file rather than the shared *.all.df.pickle.
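The side-car and collection pattern can be sketched as follows: each non-empty per-log result is pickled next to its log as <log>.df.pickle, and all side-cars in a folder are then concatenated into one combined pickle. The helper names, the combined file name, and the assumption that every side-car in the folder belongs to the same phase are hypothetical; this is not bexhoma's _collect_dfs().

```python
import glob
import os
import tempfile

import pandas as pd


def write_sidecar(log_path, df):
    """Pickle non-empty results next to the log as <log>.df.pickle."""
    if df.empty:
        return None
    sidecar = log_path + ".df.pickle"
    df.to_pickle(sidecar)
    return sidecar


def collect_sidecars(folder, combined_name):
    """Concatenate all *.df.pickle side-cars into one combined pickle."""
    # Exclude the combined file itself in case it already exists.
    paths = sorted(p for p in glob.glob(os.path.join(folder, "*.df.pickle"))
                   if os.path.basename(p) != combined_name)
    if not paths:
        return pd.DataFrame()
    combined = pd.concat([pd.read_pickle(p) for p in paths], ignore_index=True)
    combined.to_pickle(os.path.join(folder, combined_name))
    return combined


with tempfile.TemporaryDirectory() as folder:
    write_sidecar(os.path.join(folder, "pod-a.log"), pd.DataFrame({"x": [1]}))
    write_sidecar(os.path.join(folder, "pod-b.log"), pd.DataFrame({"x": [2]}))
    skipped = write_sidecar(os.path.join(folder, "pod-c.log"), pd.DataFrame())
    combined = collect_sidecars(folder, "benchmarking.all.df.pickle")
    combined_written = os.path.exists(os.path.join(folder, "benchmarking.all.df.pickle"))
```

Caching per-log results lets a later call rebuild the combined pickle without re-parsing every log.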

get_connection_config()

Returns the parsed connections.config as a list of connection dicts, sorted by connection name.

Returns:

List of connection configuration dicts.

Return type:

list[dict]

get_df_benchmarking()

Returns the DataFrame containing all benchmarking-phase results.

Reads from the combined pickle file, triggering evaluate_results() to generate it on first access if it does not yet exist.

Ensures a pods column is present: when the pickle was written before this column was added (older runs), it is derived from pod_count.

Returns:

DataFrame of benchmarking results, or empty DataFrame when unavailable.

Return type:

pandas.DataFrame

get_df_loading()

Returns the DataFrame containing all loading-phase results.

Reads from the combined pickle file if it exists.

Returns:

DataFrame of loading results, or empty DataFrame when unavailable.

Return type:

pandas.DataFrame

get_monitoring_metric(metric, component='loading')

Returns a wide-format DataFrame of a single monitoring metric for a component.

Reads the pre-combined CSV produced by transform_monitoring_results() and returns it transposed so that rows are timestamps and columns are connections.

Parameters:
  • metric (str) – Metric key (e.g. 'cpu_throttled_seconds_total').

  • component (str) – Component label used in the metric filename prefix.

Returns:

Wide-format DataFrame, or empty DataFrame if the file does not exist.

Return type:

pandas.DataFrame

get_monitoring_metrics()

Returns the list of metric keys defined in the first connection’s monitoring block.

Returns:

List of metric key strings, or empty list when no metrics are configured.

Return type:

list[str]

plot(df, column, x, y, plot_by=None, kind='line', dict_colors=None, figsize=(12, 8))

Plots one or more line (or other) charts from a DataFrame.

When plot_by is None, a single chart is produced with one line per value in column. When plot_by is given, a grid of sub-plots is created — one per group defined by plot_by — with lines split by column within each sub-plot.

Parameters:
  • df (pandas.DataFrame) – DataFrame containing the data to plot.

  • column (str) – Column whose unique values define individual lines.

  • x (str) – Column to use as the x-axis.

  • y (str) – Column to use as the y-axis.

  • plot_by (str or None) – Optional column whose values define separate sub-plots.

  • kind (str) – Plot kind passed to DataFrame.plot (e.g. 'line', 'bar').

  • dict_colors (dict or None) – Optional colour mapping for the kind keyword.

  • figsize (tuple) – Figure size as (width, height) in inches.

Returns:

Matplotlib axes object (single axes when plot_by is None).

record_tests(experiment, df_loading: DataFrame, df_reduced: DataFrame, workflow_actual: dict, workflow_planned: dict, **extra) → None

Record pass/fail test results for this benchmark.

Default: tests only that the workflow matches the plan. Override in benchmark-specific evaluator subclasses to test metric columns.

Parameters:
  • experiment – The owning experiment object.

  • df_loading – Per-run loading DataFrame.

  • df_reduced – Per-phase execution DataFrame.

  • workflow_actual – Reconstructed actual workflow dict.

  • workflow_planned – Planned workflow dict from workload config.

test_results()

Validates results by loading and reconstructing the workflow.

Returns:

0 on success, 1 if an exception is raised.

Return type:

int

transform_monitoring_results(component='loading')

Combines per-connection monitoring CSV files into a single wide-format CSV.

For example, per-connection files like:

query_datagenerator_metric_total_cpu_util_MonetDB-NIL-1-1.csv
query_datagenerator_metric_total_cpu_util_MonetDB-NIL-1-2.csv

are merged into:

query_datagenerator_metric_total_cpu_util.csv

Parameters:

component (str) – Component label used in the metric filename prefix (e.g. 'loading', 'stream').
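The merge can be sketched with pandas under an assumed file layout: each per-connection CSV has columns timestamp and value, and the connection name is the filename suffix after the metric prefix. The combined table has one row per connection, so transposing it yields the timestamps-as-rows view described for get_monitoring_metric(). The sketch reads CSV text from a dict instead of files; combine_monitoring and the sample values are hypothetical.

```python
import io

import pandas as pd


def combine_monitoring(per_connection_csv, prefix):
    """Merge per-connection series into one table with one row per connection.

    per_connection_csv maps filenames to CSV text with columns
    'timestamp' and 'value' (an assumed layout).
    """
    rows = {}
    for filename, text in per_connection_csv.items():
        # "<prefix>_<connection>.csv" -> "<connection>"
        connection = filename[len(prefix) + 1:-len(".csv")]
        series = pd.read_csv(io.StringIO(text), index_col="timestamp")["value"]
        rows[connection] = series
    # One row per connection, one column per timestamp.
    return pd.DataFrame(rows).T


prefix = "query_datagenerator_metric_total_cpu_util"
files = {
    f"{prefix}_MonetDB-NIL-1-1.csv": "timestamp,value\n0,1.5\n30,2.0\n",
    f"{prefix}_MonetDB-NIL-1-2.csv": "timestamp,value\n0,1.0\n30,3.0\n",
}
wide = combine_monitoring(files, prefix)
```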

class bexhoma.evaluators.TpccEvaluator(code, path, include_loading=False, include_benchmarking=True, benchmark_run: int = 0)

Bases: LogEvaluator

Evaluator for a HammerDB TPC-C experiment.

Parses per-pod log files to extract NOPM, TPM, and optional latency statistics (CALLS, MIN, AVG, MAX, TOTAL, P99, P95, P50, SD, RATIO) and assembles them into DataFrames. Aggregation over parallel pods follows the same pattern as the other LogEvaluator subclasses.

Parameters:
  • code – Experiment identifier — also the name of the result sub-folder.

  • path – Root path that contains the result folders.

  • include_loading – Whether loading-phase results are expected.

  • include_benchmarking – Whether benchmarking-phase results are expected.

  • benchmark_run – 1-based position in the benchmark sequence; 0 means unset.

benchmarking_aggregate_by_parallel_pods(df, columns=['phase'])

Aggregates parallel-pod TPC-C result rows into one row per group.

Groups by columns and applies per-metric aggregation (NOPM/TPM averaged across pods, max for latency percentiles, etc.). Also recomputes efficiency for runs where vusers equal 10× the scale factor.

The phase column holds the phase identifier (configuration-experiment_run-client) and the job column holds the job identifier (configuration-experiment_run-client-benchmark_run).

The default columns=['phase'] groups by phase, producing one row per phase. To keep one row per job, pass columns=['job'].

Parameters:
  • df (pandas.DataFrame) – Typed TPC-C benchmarking DataFrame.

  • columns (list[str]) – Grouping columns (default ['phase']).

Returns:

Aggregated DataFrame with one row per group.

Return type:

pandas.DataFrame

benchmarking_set_datatypes(df)

Casts all TPC-C benchmarking result columns to their appropriate data types.

Handles two variants: with latency statistics (CALLS present) and without.

Parameters:

df (pandas.DataFrame) – DataFrame of raw TPC-C benchmarking results.

Returns:

DataFrame with columns cast to correct types.

Return type:

pandas.DataFrame

get_summary_benchmark_per_connection()

Returns benchmarking results with one row per pod, filtered to the key display columns.

Applies benchmarking_set_datatypes() and selects the columns used for the per-connection summary table (experiment run, terminals, target, client, child, time, errors, throughput, goodput, efficiency, and latency percentiles), then sorts by (experiment_run, client, child).

Returns:

DataFrame indexed as "DBMS" with one row per pod, or None if there are no benchmarking results.

Return type:

pandas.DataFrame or None

get_summary_benchmark_per_phase()

Returns benchmarking results aggregated over parallel pods, one row per phase.

Applies benchmarking_set_datatypes(), aggregates via benchmarking_aggregate_by_parallel_pods(), and selects the columns used for the per-phase summary table (experiment run, terminals, target, pod count, time, errors, throughput, goodput, efficiency, and latency percentiles), sorted by (experiment_run, target, pod_count).

Returns:

DataFrame indexed as "DBMS" with one row per phase, or an empty DataFrame if there are no benchmarking results.

Return type:

pandas.DataFrame

get_summary_benchmark_per_phase_multitenant()

Returns TPC-C benchmarking results aggregated per phase and tenant, one row per (phase, tenant_id).

Like get_summary_benchmark_per_phase() but groups by ['phase', 'tenant_id'] so each tenant appears as a separate row.

Returns:

DataFrame indexed as "DBMS" with one row per (phase, tenant), or an empty DataFrame if there are no benchmarking results.

Return type:

pandas.DataFrame

get_summary_loading_per_run()

Returns loading metrics aggregated per experiment run.

Delegates to get_loading_per_run() (defined in base), which reduces the per-connection loading DataFrame to one row per (code, configuration, experiment_run) and adds a 'Throughput [SF/h]' column.

Returns:

DataFrame with one row per experiment run.

Return type:

pandas.DataFrame

log_to_df(filename)

Parses a HammerDB TPC-C pod log file into a DataFrame.

Extracts NOPM, TPM, vuser counts, and — when HammerDB time-profile output is present — latency statistics (CALLS, MIN, AVG, MAX, TOTAL, P99, P95, P50, SD, RATIO) for the NEWORD procedure.

Parameters:

filename (str) – Absolute path to the HammerDB log file.

Returns:

DataFrame with one row per TPC-C result iteration, or empty on parse failure.

Return type:

pandas.DataFrame
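As a rough illustration of the NOPM/TPM extraction, the sketch below scans log text for HammerDB's summary line. The line format ("System achieved N NOPM from M <DBMS> TPM") is an assumption about HammerDB output and may vary between versions. The helper parse_nopm_tpm is hypothetical and skips the latency statistics entirely. Passing the returned list to pandas.DataFrame would yield one row per result iteration.

```python
import re

# Assumed format of the HammerDB TEST RESULT line; real logs may differ by version.
RESULT_RE = re.compile(r"System achieved (\d+) NOPM from (\d+) (\w+) TPM")

def parse_nopm_tpm(text):
    """Return one dict per TEST RESULT line found in the given log text."""
    rows = []
    for line in text.splitlines():
        match = RESULT_RE.search(line)
        if match:
            rows.append({
                "NOPM": int(match.group(1)),
                "TPM": int(match.group(2)),
                "dbms": match.group(3),
            })
    return rows

log = "Vuser 1:TEST RESULT : System achieved 36727 NOPM from 84452 PostgreSQL TPM"
```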

record_tests(experiment, df_loading: DataFrame, df_reduced: DataFrame, workflow_actual: dict, workflow_planned: dict, **extra) None

Record TPC-C pass/fail tests: NOPM throughput and workflow completeness.

Parameters:
  • experiment – The owning experiment object.

  • df_loading – Per-run loading DataFrame (unused here).

  • df_reduced – Per-phase execution DataFrame.

  • workflow_actual – Reconstructed actual workflow dict.

  • workflow_planned – Planned workflow dict from workload config.

test_results()

Validates results by reading all pickle files and delegating to the parent check.

Returns:

0 on success, 1 if an exception is raised.

Return type:

int

class bexhoma.evaluators.YcsbEvaluator(code, path, include_loading=False, include_benchmarking=True, benchmark_run: int = 0)

Bases: LogEvaluator

Evaluator for a YCSB experiment.

Parses per-pod log files to extract operation counts, throughput, and per-operation latency statistics produced by the Yahoo Cloud Serving Benchmark (YCSB) tool. Provides time-series access to per-second throughput for both the benchmarking and loading phases via get_benchmark_logs_timeseries_df_aggregated(), get_loading_logs_timeseries_df_aggregated(), and their *_single variants.

Parameters:
  • code – Experiment identifier — also the name of the result sub-folder.

  • path – Root path that contains the result folders.

  • include_loading – Ignored; loading is always enabled for this evaluator.

  • include_benchmarking – Ignored; benchmarking is always enabled.

benchmark_logs_to_timeseries_df(list_logs, metric='current_ops_per_sec', aggregate=True, filetype='benchmarker')

Parses benchmarker log files for the given pod IDs and assembles a time-series DataFrame.

Delegates to logs_to_timeseries_df() with filetype='benchmarker'.

Parameters:
  • list_logs (list[str]) – Pod IDs used to locate matching log files.

  • metric (str) – Metric to extract (default 'current_ops_per_sec').

  • aggregate (bool) – Whether to aggregate all pod DataFrames into one.

Returns:

Aggregated DataFrame or list of per-pod DataFrames.

Return type:

pandas.DataFrame or list[pandas.DataFrame]

benchmarking_aggregate_by_parallel_pods(df, columns=['phase'])

Aggregates parallel-pod YCSB benchmarking rows into one row per group.

Groups by columns and sums counts and throughput, takes the mean of average latencies, and takes the max of percentile and maximum latencies.

The phase column holds the phase identifier (configuration-experiment_run-client) and the job column holds the job identifier (configuration-experiment_run-client-benchmark_run).

The default columns=['phase'] groups by phase, producing one row per phase. To keep one row per job, pass columns=['job'].

Parameters:
  • df (pandas.DataFrame) – Typed YCSB benchmarking DataFrame.

  • columns (list[str]) – Grouping columns (default ['phase']).

Returns:

Aggregated DataFrame with one row per group.

Return type:

pandas.DataFrame
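The three rules can be derived from the YCSB column names themselves. The sketch below assumes the '[OPERATION].Metric' naming seen in '[OVERALL].RunTime(ms)' elsewhere in this module. The helper names agg_rule and aggregate_by_parallel_pods and the sample values are hypothetical, and how the evaluator treats minimum latencies is not documented here, so such columns are left out.

```python
import pandas as pd

def agg_rule(column):
    # Percentile and maximum latencies -> max; average latencies -> mean; counts/throughput -> sum.
    if "Percentile" in column or "Max" in column:
        return "max"
    if "Average" in column:
        return "mean"
    return "sum"

def aggregate_by_parallel_pods(df, columns=("phase",)):
    metrics = [c for c in df.columns if c not in columns]
    return df.groupby(list(columns)).agg({c: agg_rule(c) for c in metrics})

# Two parallel pods of one phase (hypothetical values).
df = pd.DataFrame({
    "phase": ["PostgreSQL-1-1"] * 2,
    "[OVERALL].Throughput(ops/sec)": [500.0, 700.0],
    "[READ].Operations": [1000, 1400],
    "[READ].AverageLatency(us)": [300.0, 500.0],
    "[READ].99thPercentileLatency(us)": [900.0, 1200.0],
    "[READ].MaxLatency(us)": [5000.0, 4000.0],
})
result = aggregate_by_parallel_pods(df)
```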

benchmarking_set_datatypes(df)

Casts all YCSB benchmarking result columns to their appropriate data types.

Only casts operation-specific columns when they are present in the DataFrame.

Parameters:

df (pandas.DataFrame) – DataFrame of raw YCSB benchmarking results.

Returns:

DataFrame with columns cast to correct types, or original df on error.

Return type:

pandas.DataFrame

get_benchmark_logs_timeseries_df_aggregated(metric='current_ops_per_sec', configuration='', client='1', experiment_run='1')

Returns a DataFrame of per-second benchmarking time-series, aggregated across pods.

Retrieves pod IDs from get_df_benchmarking() and delegates to benchmark_logs_to_timeseries_df() with aggregate=True.

Parameters:
  • metric (str) – YCSB metric to retrieve (default 'current_ops_per_sec').

  • configuration (str) – Configuration name (e.g. 'PostgreSQL-64-8-196608').

  • client (str or int) – Client number (default '1').

  • experiment_run (str or int) – Experiment run number (default '1').

Returns:

DataFrame indexed by second with one metric column and an 'avg' column.

Return type:

pandas.DataFrame

get_benchmark_logs_timeseries_df_single(metric='current_ops_per_sec', configuration='', client='1', experiment_run='1')

Returns a list of per-pod benchmarking time-series DataFrames (one per pod).

Parameters:
  • metric (str) – YCSB metric to retrieve (default 'current_ops_per_sec').

  • configuration (str) – Configuration name (e.g. 'PostgreSQL-64-8-196608').

  • client (str or int) – Client number (default '1').

  • experiment_run (str or int) – Experiment run number (default '1').

Returns:

List of DataFrames, one per pod, each indexed by second.

Return type:

list[pandas.DataFrame]

get_df_loading()

Returns the DataFrame containing all loading-phase results.

Returns:

DataFrame of loading results, or empty DataFrame when unavailable.

Return type:

pandas.DataFrame

get_loading_logs_timeseries_df_aggregated(metric='current_ops_per_sec', configuration='', experiment_run='1')

Returns a DataFrame with the per-second time series of a metric for the loading phase, aggregated over all pods.

Uses get_df_loading() to retrieve the pod list and benchmark_logs_to_timeseries_df() to parse and aggregate the log files, restricted to one configuration and one experiment run. Aggregation follows the same strategy as for the benchmarking phase: percentile and maximum metrics are combined by max, minimum metrics by min, average metrics by mean, and 'current_ops_per_sec' and all other metrics by sum.

Parameters:
  • metric (str) – Metric to retrieve (default 'current_ops_per_sec').

  • configuration (str) – Configuration name (e.g. 'PostgreSQL-64-8-196608').

  • experiment_run (str or int) – Experiment run number (default '1').

Returns:

DataFrame indexed by second with one column for the aggregated metric plus an 'avg' column, or an empty DataFrame when no files are found.

Return type:

pandas.DataFrame

get_loading_logs_timeseries_df_single(metric='current_ops_per_sec', configuration='', experiment_run='1')

Returns a list of DataFrames with the per-second time series of a metric for the loading phase, one per pod.

Uses get_df_loading() to retrieve the pod list and benchmark_logs_to_timeseries_df() to parse the log files without aggregation, restricted to one configuration and one experiment run.

Parameters:
  • metric (str) – Metric to retrieve (default 'current_ops_per_sec').

  • configuration (str) – Configuration name (e.g. 'PostgreSQL-64-8-196608').

  • experiment_run (str or int) – Experiment run number (default '1').

Returns:

List of DataFrames, one per pod, each indexed by second with one metric column.

Return type:

list[pandas.DataFrame]

get_loading_per_connection()

Returns loading metrics for each individual connection, merged with connection metadata and enriched with the scale factor.

Combines the aggregated loading DataFrame (from get_df_loading()) with connection metadata (from get_connections_of_experiment()) on (code, configuration, experiment_run), then normalises the index. Rows for which no loading log was recorded (missing pod_count) are dropped.

Returns:

DataFrame with one row per loading run, indexed as {code}-{configuration}-{experiment_run}.

Return type:

pandas.DataFrame
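One plausible reading of the merge described above is a left merge of connection metadata with the loading results on (code, configuration, experiment_run). Rows without a pod_count are then dropped and the index is rebuilt. In this sketch the two input frames stand in for get_df_loading() and get_connections_of_experiment(), and their contents and the SF column name are hypothetical.

```python
import pandas as pd

keys = ["code", "configuration", "experiment_run"]

# Stand-in for get_df_loading(): only PostgreSQL has a loading log (hypothetical data).
df_loading = pd.DataFrame({
    "code": ["1234"], "configuration": ["PostgreSQL"], "experiment_run": [1],
    "pod_count": [8], "[OVERALL].RunTime(ms)": [120000.0],
})
# Stand-in for get_connections_of_experiment(), carrying the scale factor.
df_connections = pd.DataFrame({
    "code": ["1234", "1234"], "configuration": ["PostgreSQL", "MySQL"],
    "experiment_run": [1, 1], "SF": [10, 10],
})

merged = df_connections.merge(df_loading, on=keys, how="left")
merged = merged.dropna(subset=["pod_count"])  # no loading log recorded -> row dropped
merged.index = (merged["code"] + "-" + merged["configuration"]
                + "-" + merged["experiment_run"].astype(str))
```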

get_loading_per_pod()

Returns the raw loading DataFrame with one row per pod.

Returns:

DataFrame from get_df_loading() — one row per loading pod.

Return type:

pandas.DataFrame

get_loading_per_run()

Returns loading metrics aggregated per (code, configuration, experiment_run).

Overrides the base implementation to derive 'Throughput [SF/h]' from '[OVERALL].RunTime(ms)' rather than time_load, since YCSB loading results carry wall-clock run time in milliseconds rather than the bexhoma connection-level timing.

Returns:

DataFrame with one row per experiment run.

Return type:

pandas.DataFrame
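Assuming 'Throughput [SF/h]' means scale factors loaded per hour of wall-clock loading time, converting from '[OVERALL].RunTime(ms)' is plain arithmetic. The function name and the exact definition are assumptions for illustration, not the evaluator's code. Multiplying before dividing keeps the example result exact in floating point.

```python
MS_PER_HOUR = 3_600_000  # 60 min * 60 s * 1000 ms

def throughput_sf_per_hour(scale_factor, runtime_ms):
    # Assumed definition: scale factors loaded per hour of loading run time.
    return scale_factor * MS_PER_HOUR / runtime_ms

print(throughput_sf_per_hour(10, 120_000))  # 2 minutes of loading for SF 10 -> 300.0
```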

get_summary_benchmark_per_connection()

Returns benchmarking results with one row per pod, filtered to the key display columns.

Applies benchmarking_set_datatypes() and selects the columns used for the per-connection summary table (experiment run, terminals, target, client, child, time, errors, throughput, goodput, efficiency, and latency percentiles), then sorts by (experiment_run, client, child).

Returns:

DataFrame indexed as "DBMS" with one row per pod, or an empty DataFrame if there are no benchmarking results.

Return type:

pandas.DataFrame

get_summary_benchmark_per_phase()

Returns benchmarking results aggregated over parallel pods, one row per phase.

Applies benchmarking_set_datatypes(), aggregates via benchmarking_aggregate_by_parallel_pods(), and selects the columns used for the per-phase summary table (experiment run, terminals, target, pod count, time, errors, throughput, goodput, efficiency, and latency percentiles), sorted by (experiment_run, target, pod_count).

Returns:

DataFrame indexed as "DBMS" with one row per phase, or an empty DataFrame if there are no benchmarking results.

Return type:

pandas.DataFrame

get_summary_benchmark_per_phase_multitenant()

Returns YCSB benchmarking results aggregated per phase and tenant, one row per (phase, tenant_id).

Like get_summary_benchmark_per_phase() but groups by ['phase', 'tenant_id'] so each tenant appears as a separate row.

Returns:

DataFrame indexed as "DBMS" with one row per (phase, tenant), or an empty DataFrame if there are no benchmarking results.

Return type:

pandas.DataFrame

get_summary_loading_per_connection()

Returns loading metrics aggregated per experiment run.

Delegates to get_df_loading() (defined in base), which reduces the per-connection loading DataFrame to one row per (code, configuration, experiment_run) and adds a 'Throughput [SF/h]' column.

Returns:

DataFrame with one row per experiment run.

Return type:

pandas.DataFrame

get_summary_loading_per_run()

Returns loading metrics aggregated per experiment run.

Delegates to get_loading_per_run(), which this class overrides to reduce the per-connection loading DataFrame to one row per (code, configuration, experiment_run) and add a 'Throughput [SF/h]' column.

Returns:

DataFrame with one row per experiment run.

Return type:

pandas.DataFrame

loading_aggregate_by_parallel_pods(df, columns=['phase'])

Aggregates parallel-pod YCSB loading rows into one row per job.

The phase column stores BEXHOMA_CONNECTION, which is the job identifier (configuration-experiment_run-client-benchmark_run). The default columns=['phase'] therefore groups by job identifier, producing one row per job. To aggregate per phase, pass columns=['configuration', 'experiment_run', 'client'].

Parameters:
  • df (pandas.DataFrame) – Typed YCSB loading DataFrame.

  • columns (list[str]) – Grouping columns (default ['phase']).

Returns:

Aggregated DataFrame with one row per group.

Return type:

pandas.DataFrame
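Because the loading phase column holds a job identifier (configuration-experiment_run-client-benchmark_run), its components can be recovered by splitting from the right. Configuration names such as 'PostgreSQL-64-8-196608' contain hyphens themselves. The helper split_job_id is hypothetical. It shows why a phase identifier is the job identifier minus its last component.

```python
def split_job_id(job_id):
    # Split from the right: the configuration name may itself contain hyphens.
    configuration, experiment_run, client, benchmark_run = job_id.rsplit("-", 3)
    return {
        "configuration": configuration,
        "experiment_run": experiment_run,
        "client": client,
        "benchmark_run": benchmark_run,
        "phase": f"{configuration}-{experiment_run}-{client}",
    }

parts = split_job_id("PostgreSQL-64-8-196608-1-1-0")
```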

loading_logs_to_timeseries_df(list_logs, metric='current_ops_per_sec', aggregate=True, filetype='benchmarker')

Parses loader log files for the given pod IDs and assembles a time-series DataFrame.

Delegates to logs_to_timeseries_df() with filetype='loading'.

Parameters:
  • list_logs (list[str]) – Pod IDs used to locate matching log files.

  • metric (str) – Metric to extract (default 'current_ops_per_sec').

  • aggregate (bool) – Whether to aggregate all pod DataFrames into one.

Returns:

Aggregated DataFrame or list of per-pod DataFrames.

Return type:

pandas.DataFrame or list[pandas.DataFrame]

loading_set_datatypes(df)

Casts all YCSB loading result columns to their appropriate data types.

Parameters:

df (pandas.DataFrame) – DataFrame of raw YCSB loading results.

Returns:

DataFrame with columns cast to correct types.

Return type:

pandas.DataFrame

log_to_df(filename)

Parses a YCSB pod log file into a single-row DataFrame.

Extracts connection metadata, benchmark parameters, and per-operation metrics (throughput, latency percentiles) from the YCSB summary output.

Parameters:

filename (str) – Absolute path to the YCSB log file.

Returns:

Single-row DataFrame of YCSB results, or empty on parse failure.

Return type:

pandas.DataFrame

logs_to_timeseries_df(list_logs, metric='current_ops_per_sec', aggregate=True, filetype='benchmarker')

Parses YCSB log files for the given pod IDs and assembles a time-series DataFrame.

Each pod ID in list_logs is resolved to matching log files via a glob pattern that uses filetype to distinguish benchmarker from loading logs. When aggregate is True the per-second values from all pods are combined: percentile/max metrics use element-wise maximum, minimum metrics use element-wise minimum, and all others are summed. When aggregate is False a list of per-pod DataFrames is returned instead.

Parameters:
  • list_logs (list[str]) – Pod IDs used to locate matching log files.

  • metric (str) – Metric column to extract (default 'current_ops_per_sec').

  • aggregate (bool) – Whether to aggregate all pod DataFrames into one.

  • filetype (str) – Log file prefix: 'benchmarker' or 'loading'.

Returns:

Aggregated DataFrame indexed by 'sec' (with an 'avg' column appended) when aggregate is True, or a list of per-pod DataFrames.

Return type:

pandas.DataFrame or list[pandas.DataFrame]
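The element-wise combination can be sketched by placing the per-pod series side by side on the shared 'sec' index and reducing across columns. Matching the metric name against "percentile", "max" and "min" is an assumed heuristic, and combine_pods is a hypothetical helper. The 'avg' column is omitted because its definition is not given here. Pods that report different seconds are aligned by pandas, and the reductions skip the resulting gaps.

```python
import pandas as pd

def combine_pods(frames, metric):
    """Combine per-pod DataFrames (indexed by 'sec') into one column."""
    wide = pd.concat(frames, axis=1)  # one column per pod, outer-aligned on 'sec'
    name = metric.lower()
    if "percentile" in name or "max" in name:
        combined = wide.max(axis=1)
    elif "min" in name:
        combined = wide.min(axis=1)
    else:
        combined = wide.sum(axis=1)   # e.g. current_ops_per_sec
    return combined.to_frame(metric)

pod_a = pd.DataFrame({"current_ops_per_sec": [100.0, 120.0]}, index=pd.Index([1, 2], name="sec"))
pod_b = pd.DataFrame({"current_ops_per_sec": [80.0, 90.0, 95.0]}, index=pd.Index([1, 2, 3], name="sec"))
result = combine_pods([pod_a, pod_b], "current_ops_per_sec")
```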

parse_ycsb_log_file(file_path)

Scans the lines of a YCSB log file and extracts the performance information needed for time-series analysis. Each line starting with a timestamp is converted into a dict of measurements (operations, second of measurement, READ latency, …).

Parameters:

file_path – Full path of the log file.

Returns:

List of measurement dicts, one entry per timestamped line.
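A minimal sketch of this line-by-line scan. It assumes YCSB's periodic status format: a 'yyyy-MM-dd HH:mm:ss:SSS' timestamp, then "N sec: M operations; X current ops/sec;" and an optional "[READ: key=value, ...]" block. That format, the helper parse_status_lines and the output key names are assumptions, not the evaluator's implementation. An open file object can be passed directly, since it yields lines.

```python
import re

TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}:\d{3} ")
HEADER = re.compile(r"(\d+) sec: (\d+) operations; ([\d.]+) current ops/sec")
READ_BLOCK = re.compile(r"\[READ: ([^\]]*)\]")

def parse_status_lines(lines):
    measures = []
    for line in lines:
        if not TIMESTAMP.match(line):
            continue  # only timestamped status lines carry time-series measurements
        header = HEADER.search(line)
        if header is None:
            continue
        entry = {
            "sec": int(header.group(1)),
            "operations": int(header.group(2)),
            "current_ops_per_sec": float(header.group(3)),
        }
        read = READ_BLOCK.search(line)
        if read:
            for pair in read.group(1).split(","):
                key, value = pair.strip().split("=")
                entry[f"READ_{key}"] = float(value)  # e.g. READ_Count, READ_99
        measures.append(entry)
    return measures

status = ("2024-01-15 10:00:10:123 10 sec: 10000 operations; 1000.5 current ops/sec; "
          "[READ: Count=5000, Max=900, Min=100, Avg=250.5, 90=400, 99=800]")
measures = parse_status_lines([status, "[OVERALL], RunTime(ms), 10000"])
```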

record_tests(experiment, df_loading: DataFrame, df_reduced: DataFrame, workflow_actual: dict, workflow_planned: dict, **extra) None

Record YCSB pass/fail tests.

Tests overall throughput for the loading phase (when data is available) and the execution phase, workflow completeness, and absence of FAILED operation columns.

Parameters:
  • experiment – The owning experiment object.

  • df_loading – Per-run loading DataFrame; empty if loading was not active.

  • df_reduced – Per-phase execution DataFrame.

  • workflow_actual – Reconstructed actual workflow dict.

  • workflow_planned – Planned workflow dict from workload config.
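The four checks listed above could look roughly like the sketch below. It returns a dict of named pass/fail results instead of recording them on an experiment object, because that recording API is not described here. The throughput column name, the "> 0" criterion and the helper ycsb_checks are all hypothetical.

```python
import pandas as pd

THROUGHPUT = "[OVERALL].Throughput(ops/sec)"  # assumed column name

def ycsb_checks(df_loading, df_reduced, workflow_actual, workflow_planned):
    results = {}
    if not df_loading.empty:  # loading is only tested when its data is available
        results["loading throughput > 0"] = bool((df_loading[THROUGHPUT] > 0).all())
    results["execution throughput > 0"] = bool((df_reduced[THROUGHPUT] > 0).all())
    results["workflow complete"] = workflow_actual == workflow_planned
    results["no FAILED operations"] = not any("FAILED" in c for c in df_reduced.columns)
    return results

df_reduced = pd.DataFrame({THROUGHPUT: [1200.0], "[READ].Operations": [2400]})
workflow = {"PostgreSQL": [[1, 2]]}  # hypothetical workflow structure
checks = ycsb_checks(pd.DataFrame(), df_reduced, workflow, workflow)
```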

bexhoma.evaluators.base

alias of EvaluatorBase

bexhoma.evaluators.benchbase

alias of BenchbaseEvaluator

bexhoma.evaluators.dbmsbenchmarker

alias of DbmsBenchmarkerEvaluator

bexhoma.evaluators.logger

alias of LogEvaluator

bexhoma.evaluators.natural_sort(items)

Sorts a list in natural (human) order so that embedded digit runs are compared numerically rather than lexicographically. Works for lists of strings, integers, or any mix whose elements have a meaningful str() representation.

Parameters:

items (list) – List to sort.

Returns:

Sorted list.

Return type:

list
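A common way to get natural order is a sort key that splits str(item) into alternating text and digit runs, then compares the digit runs as integers. This is a sketch of that technique, not necessarily bexhoma's implementation. Because re.split with a capturing group always puts text at even positions and digits at odd positions, keys of different items compare element by element without mixing types.

```python
import re

def natural_sort(items):
    """Sort items so embedded digit runs compare numerically."""
    def key(item):
        # re.split with a capturing group alternates text and digits: 'file10' -> ['file', '10', ''].
        parts = re.split(r"(\d+)", str(item))
        return [int(part) if index % 2 else part for index, part in enumerate(parts)]
    return sorted(items, key=key)

print(natural_sort(["PostgreSQL-10", "PostgreSQL-2", "PostgreSQL-1"]))
```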

bexhoma.evaluators.tpcc

alias of TpccEvaluator

bexhoma.evaluators.ycsb

alias of YcsbEvaluator