bexhoma.evaluators.base module

Base evaluator class for bexhoma experiments.

Provides natural_sort() and EvaluatorBase, which loads an experiment result folder, parses workflow state and connection configuration, and exposes monitoring data. All other evaluators inherit from EvaluatorBase via LogEvaluator.

class bexhoma.evaluators.base.EvaluatorBase(code, path, include_loading=False, include_benchmarking=True, benchmark_run: int = 0)

Bases: object

Base evaluator for a single bexhoma experiment.

Loads the experiment result folder identified by code inside path, provides helpers for scanning log files and reconstructing the workflow, and exposes connection/loading metadata. All benchmark-specific evaluators inherit from this class (via LogEvaluator).

Parameters:

code – Experiment identifier — also the name of the result sub-folder.
path – Root path that contains the result folders.
include_loading – Whether loading-phase results are expected.
include_benchmarking – Whether benchmarking-phase results are expected.
benchmark_run – 1-based position in the benchmark sequence; 0 means unset.

add_connection_to_result(c, connection_id, result)

Appends a flattened connection entry to result keyed by connection_id.

Extracts scalar fields from the connection config dict c — including host-system, loading-parameter, benchmarking-parameter, SUT-parameter, and args sections — and stores them with prefixed keys (host_*, loading_parameters_*, benchmarking_parameters_*, sut_parameters_*, arg_*). Non-scalar values (lists and dicts) are skipped.

Parameters:

c (dict) – Single connection entry from connections.config.
connection_id (str) – Key to use when inserting into result.
result (dict) – Accumulator dict that maps connection IDs to metadata rows.
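
The flattening described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name flatten_connection and the section key names ("hostsystem", "loading_parameters", and so on) are assumptions, since the actual keys used in connections.config are not listed here. Only the output prefixes are taken from the description.

```python
def flatten_connection(c, connection_id, result):
    """Store scalar fields of selected sections of c under prefixed keys."""
    # Hypothetical section names mapped to the documented key prefixes;
    # the real section names in connections.config may differ.
    sections = {
        "hostsystem": "host_",
        "loading_parameters": "loading_parameters_",
        "benchmarking_parameters": "benchmarking_parameters_",
        "sut_parameters": "sut_parameters_",
        "args": "arg_",
    }
    row = {}
    for section, prefix in sections.items():
        for key, value in c.get(section, {}).items():
            # Non-scalar values (lists and dicts) are skipped.
            if isinstance(value, (list, dict)):
                continue
            row[prefix + key] = value
    result[connection_id] = row
    return result


connection = {
    "hostsystem": {"RAM": 1024, "disks": ["sda", "sdb"]},
    "args": {"scaling_factor": 1},
}
print(flatten_connection(connection, "MySQL-1", {}))
```

The list under "disks" is dropped, so each row holds only scalar values and can be turned into a DataFrame row directly.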

end_benchmarking(jobname)

Processes all benchmarker log files for a given job name.

Scans the result folder for files matching bexhoma-benchmarker-<jobname>*.dbmsbenchmarker.log and calls log_to_df() on each one.

Parameters: jobname (str) – Job name used to filter matching log files.
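
The file-name pattern above can be illustrated with the standard fnmatch module. This is a sketch under assumptions: the helper matching_benchmarker_logs and the sample file names are hypothetical, and the real method may locate files differently (for example by listing the result folder directly).

```python
import fnmatch


def matching_benchmarker_logs(filenames, jobname):
    """Return the file names that match the benchmarker log pattern for jobname."""
    pattern = f"bexhoma-benchmarker-{jobname}*.dbmsbenchmarker.log"
    return sorted(fnmatch.filter(filenames, pattern))


# Hypothetical file names, for illustration only.
files = [
    "bexhoma-benchmarker-mysql-1-1-1-abcde.dbmsbenchmarker.log",
    "bexhoma-loading-mysql-1-1-1-abcde.sensor.log",
    "bexhoma-benchmarker-postgres-1-1-1-fghij.dbmsbenchmarker.log",
]
print(matching_benchmarker_logs(files, "mysql-1-1-1"))
```

Because of the wildcard after the job name, a job name that is a prefix of another job name would match both sets of files in this sketch.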

end_loading(jobname)

Processes all loader sensor log files for a given job name.

Scans the result folder for files matching bexhoma-loading-<jobname>*.sensor.log and calls log_to_df() on each one.

Parameters: jobname (str) – Job name used to filter matching log files.

evaluate_results(pod_dashboard='')

Scans all log files of the experiment and collects information from them. In this base class, the scan only looks for errors.

get_connections_of_experiment()

Returns connection metadata for a single experiment.

Reads connections.config and builds a row per pod/client with the following key columns: phase (code-prefixed phase identifier, <code>-<configuration>-<experiment_run>-<client>), job (code-prefixed job identifier, <code>-<configuration>-<experiment_run>-<client>-<benchmark_run>), code, connection, configuration, experiment_run, client, type_tenants, num_tenants, vol_tenants, plus flattened host-system, loading-parameter, benchmarking-parameter, and SUT-parameter fields.

When a connection entry carries orig_name, the entry represents an individual pod; otherwise a synthetic row is generated for each parallel client.

Returns: DataFrame of connection metadata, one row per pod/client.
Return type: pandas.DataFrame

get_df_benchmarking()

Returns the DataFrame containing all benchmarking-phase results.

Returns: Empty DataFrame; overridden by subclasses.
Return type: pandas.DataFrame

get_df_loading()

Returns the DataFrame containing all loading-phase results.

Returns: Empty DataFrame; overridden by subclasses.
Return type: pandas.DataFrame

get_loading_per_connection()

Returns loading metrics for each individual connection (pod/client), enriched with the scale factor and a 'Throughput [SF/h]' derived column.

Returns: DataFrame with one row per connection.
Return type: pandas.DataFrame

get_loading_per_run()

Returns loading metrics aggregated per (code, configuration, experiment_run).

Takes the per-connection DataFrame from get_loading_per_connection() and reduces it to one row per experiment run by taking the max across connections, then recomputes 'Throughput [SF/h]' from the aggregated load time.

Returns: DataFrame with one row per experiment run.
Return type: pandas.DataFrame
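
The max-then-recompute aggregation can be sketched with pandas as below. The column names scale_factor and time_load (in seconds) and the throughput formula SF * 3600 / time_load are assumptions made for this illustration; the actual columns and formula used by the method are not specified in this section.

```python
import pandas as pd


def loading_per_run_sketch(df_connections):
    """Reduce per-connection loading rows to one row per experiment run."""
    keys = ["code", "configuration", "experiment_run"]
    # The slowest connection determines the load time of the whole run.
    df = df_connections.groupby(keys, as_index=False)[["scale_factor", "time_load"]].max()
    # Assumed formula: scale factor per hour, with time_load given in seconds.
    df["Throughput [SF/h]"] = df["scale_factor"] * 3600.0 / df["time_load"]
    return df


example = pd.DataFrame({
    "code": ["1234", "1234"],
    "configuration": ["MySQL-24-4-1024", "MySQL-24-4-1024"],
    "experiment_run": [1, 1],
    "scale_factor": [10, 10],
    "time_load": [100, 120],  # seconds, one value per loader pod
})
print(loading_per_run_sketch(example))
```

Recomputing the throughput after the aggregation matters: taking the max of per-connection throughputs would report the fastest pod rather than the run as a whole.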

get_loading_per_run_multitenant()

Returns loading metrics aggregated per (code, experiment_run, type_tenants, vol_tenants, num_tenants, tenant_id) for multi-tenant experiments.

For container tenancy the tenant_id key distinguishes individual tenant loading times.

For schema/database tenancy, reads per-pod sensor log files via _get_tenant_loading_from_logs() to expand the single shared connection row into one row per tenant. Each row carries the tenant’s own loading duration (time_ingest / time_load) and a matching tenant_id. If the sensor logs are absent, the result collapses to a single row with tenant_id = ''.

Returns: DataFrame with one row per tenant per experiment run.
Return type: pandas.DataFrame

get_summary_loading_per_run_multitenant()

Returns loading metrics per tenant per experiment run, with housekeeping columns removed.

Wraps get_loading_per_run_multitenant() and drops code and configuration so the result is ready to display in show_summary().

Returns: DataFrame with one row per (tenant_id, experiment_run) combination.
Return type: pandas.DataFrame

get_workload()

Returns the workload configuration of an experiment.

Reads the queries.config file from the experiment result folder and returns its contents as a Python dictionary.

Returns: Workload properties dictionary.
Return type: dict

log_to_df(filename)

Scans a pod log file for known errors and records them in self.workflow_errors.

Returns an empty DataFrame; subclasses override this method to also parse benchmark results out of the log.

Parameters: filename (str) – Absolute path to the log file.
Returns: Empty DataFrame (subclasses return populated DataFrames).
Return type: pandas.DataFrame

log_to_df_loading(filename: str) → DataFrame

Parses a loading pod log file and returns the result as a DataFrame.

Default implementation delegates to log_to_df(), which is correct for benchmarks whose loading and benchmarking log formats are identical (e.g. YCSB). Subclasses where the formats differ must override this method.

Parameters: filename (str) – Absolute path to the loading log file.
Returns: DataFrame of loading results, or empty DataFrame on failure.
Return type: pandas.DataFrame

reconstruct_workflow(df: DataFrame) → dict

Reconstructs the actual experiment workflow from connection metadata.

Reads benchmark_sequence from queries.config to map each benchmark_run index to its benchmarker type, then groups the DataFrame by (configuration, experiment_run, client, benchmark_run) to produce a structure that mirrors the planned workflow format:

{
    'MySQL-24-4-1024': [
        [  # experiment run 1
            [  # client round 1
                {'type': 'dbmsbenchmarker', 'pods': 4},
                {'type': 'tpch_refresh',    'pods': 1},
            ],
            [  # client round 2
                {'type': 'dbmsbenchmarker', 'pods': 8},
                {'type': 'tpch_refresh',    'pods': 1},
            ],
        ],
    ],
}

Parameters: df (pandas.DataFrame) – Connection metadata DataFrame returned by get_connections_of_experiment(), with at least the columns configuration, experiment_run, client, benchmark_run, and pods.
Returns: Workflow dict mapping configuration name to the nested structure.
Return type: dict
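
The nested grouping can be sketched with pandas as follows. Several details are assumptions for illustration: benchmark_sequence is passed in as a plain list of benchmarker type names indexed by the 1-based benchmark_run, and the pod count of a step is taken as the maximum of the pods column within its group. The real method reads the sequence from queries.config and may aggregate pods differently.

```python
import pandas as pd


def reconstruct_workflow_sketch(df, benchmark_sequence):
    """Build {configuration: [experiment runs [client rounds [steps]]]} from df."""
    workflow = {}
    for config, df_config in df.groupby("configuration", sort=True):
        runs = []
        for _, df_run in df_config.groupby("experiment_run", sort=True):
            rounds = []
            for _, df_client in df_run.groupby("client", sort=True):
                steps = []
                for bench_run, df_bench in df_client.groupby("benchmark_run", sort=True):
                    steps.append({
                        # benchmark_run is 1-based, the list is 0-based.
                        "type": benchmark_sequence[int(bench_run) - 1],
                        # Assumption: one pod count per step, taken as the max.
                        "pods": int(df_bench["pods"].max()),
                    })
                rounds.append(steps)
            runs.append(rounds)
        workflow[config] = runs
    return workflow


connections = pd.DataFrame({
    "configuration": ["MySQL-24-4-1024"] * 4,
    "experiment_run": [1, 1, 1, 1],
    "client": [1, 1, 2, 2],
    "benchmark_run": [1, 2, 1, 2],
    "pods": [4, 1, 8, 1],
})
print(reconstruct_workflow_sketch(connections, ["dbmsbenchmarker", "tpch_refresh"]))
```

With this input the sketch reproduces the nested structure shown in the example above.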

test_results()

Validates results locally and returns an exit code.

Returns: 0 on success; subclasses return 1 on failure.
Return type: int

test_results_column(df, test_column: str) → bool

Checks that a column in a DataFrame contains no zero or NaN values.

Parameters:

df (pandas.DataFrame) – DataFrame to check.
test_column (str) – Column name to inspect.

Returns:

True if the column is fully populated with non-zero values, False otherwise.

Return type:

bool
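
A check of this kind can be sketched with pandas as below. The function name column_is_populated is hypothetical, and the sketch does not try to reproduce how the real method handles a missing column or reports failures.

```python
import pandas as pd


def column_is_populated(df, test_column):
    """Return True if test_column holds neither zeros nor NaN values."""
    values = df[test_column]
    has_nan = values.isna().any()
    has_zero = (values == 0).any()
    return not bool(has_nan or has_zero)


good = pd.DataFrame({"time_load": [12.5, 13.0]})
bad = pd.DataFrame({"time_load": [12.5, 0.0]})
print(column_is_populated(good, "time_load"), column_is_populated(bad, "time_load"))
```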

transform_all_logs_benchmarking()

Iterates over all benchmarker log files and calls end_benchmarking() for each.

When self.benchmark_run > 0, only processes log files whose jobname ends with the matching benchmark index (the last hyphen-separated component), so that each evaluator in a multi-benchmark experiment only ingests its own logs.
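
The job-name filter can be sketched as a small predicate. The helper name belongs_to_benchmark_run and the sample job names are hypothetical; how the real method extracts the job name from a log file name is not covered here.

```python
def belongs_to_benchmark_run(jobname, benchmark_run):
    """Return True if jobname should be processed for the given benchmark_run."""
    if benchmark_run <= 0:
        # 0 means unset: no filtering, every job is processed.
        return True
    # Compare the last hyphen-separated component with the benchmark index.
    return jobname.rsplit("-", 1)[-1] == str(benchmark_run)


# Hypothetical job names, for illustration only.
jobs = ["1234-mysql-1-1-1", "1234-mysql-1-1-2", "1234-mysql-1-1-12"]
print([job for job in jobs if belongs_to_benchmark_run(job, 2)])
```

Comparing the whole last component, rather than testing with str.endswith, keeps index 2 from also matching index 12.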

transform_all_logs_loading()

Iterates over all loader sensor log files and calls end_loading() for each.

When self.benchmark_run > 0, only processes log files whose jobname ends with the matching benchmark index, so that each evaluator only ingests its own loading logs.

bexhoma.evaluators.base.natural_sort(items)

Sorts a list in natural (human) order so that embedded digit runs are compared numerically rather than lexicographically. Works for lists of strings, integers, or any mix whose elements have a meaningful str() representation.

Parameters: items (list) – List to sort.
Returns: Sorted list.
Return type: list
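
Natural ordering of this kind is commonly implemented by splitting each item into text and digit chunks. The following is a minimal sketch of that technique, not necessarily how natural_sort() itself is written; the name natural_sort_sketch is hypothetical, and details such as case handling may differ in the real function.

```python
import re


def natural_sort_sketch(items):
    """Sort items so that embedded digit runs compare as numbers."""
    def key(item):
        # re.split with a capturing group alternates text and digit chunks,
        # so chunks at odd positions are always digit runs.
        parts = re.split(r"(\d+)", str(item))
        return [int(p) if i % 2 else p for i, p in enumerate(parts)]
    return sorted(items, key=key)


print(natural_sort_sketch(["pod-10", "pod-2", "pod-1"]))
# → ['pod-1', 'pod-2', 'pod-10']
```

A plain sorted() call would place 'pod-10' before 'pod-2', because it compares the strings character by character.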