bexhoma.evaluators.base module
Base evaluator class for bexhoma experiments.
Provides natural_sort() and EvaluatorBase, which loads an experiment
result folder, parses workflow state and connection configuration, and
exposes monitoring data. All other evaluators inherit from EvaluatorBase
via LogEvaluator.
Authors: Patrick K. Erdelt
Copyright (C) 2020 Patrick K. Erdelt
SPDX-License-Identifier: AGPL-3.0-or-later
See LICENSE for details.
- class bexhoma.evaluators.base.EvaluatorBase(code, path, include_loading=False, include_benchmarking=True, benchmark_run: int = 0)
Bases: object
Base evaluator for a single bexhoma experiment.
Loads the experiment result folder identified by code inside path, provides helpers for scanning log files and reconstructing the workflow, and exposes connection/loading metadata. All benchmark-specific evaluators inherit from this class (via logger).
- Parameters:
code – Experiment identifier — also the name of the result sub-folder.
path – Root path that contains the result folders.
include_loading – Whether loading-phase results are expected.
include_benchmarking – Whether benchmarking-phase results are expected.
benchmark_run – 1-based position in the benchmark sequence; 0 means unset.
- add_connection_to_result(c, connection_id, result)
Appends a flattened connection entry to result, keyed by connection_id.
Extracts scalar fields from the connection config dict c, including the host-system, loading-parameter, benchmarking-parameter, SUT-parameter, and args sections, and stores them with prefixed keys (host_*, loading_parameters_*, benchmarking_parameters_*, sut_parameters_*, arg_*). Non-scalar values (lists and dicts) are skipped.
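The flattening described above can be sketched roughly as follows. This is a minimal illustration, not bexhoma's implementation: the function names, the section names used as input keys (hostsystem, loading_parameters, and so on), and the assumption that result is a dict are all hypothetical. Only the output prefixes are taken from the description.

```python
# Hypothetical section names in the connection config; only the output
# prefixes (host_, loading_parameters_, ...) come from the description above.
SECTION_PREFIXES = {
    "hostsystem": "host_",
    "loading_parameters": "loading_parameters_",
    "benchmarking_parameters": "benchmarking_parameters_",
    "sut_parameters": "sut_parameters_",
    "args": "arg_",
}


def flatten_connection(c):
    """Return a flat dict of the scalar fields found in the known sections."""
    flat = {}
    for section, prefix in SECTION_PREFIXES.items():
        values = c.get(section, {})
        if not isinstance(values, dict):
            continue
        for key, value in values.items():
            # Non-scalar values (lists and dicts) are skipped.
            if isinstance(value, (list, dict)):
                continue
            flat[prefix + key] = value
    return flat


def add_connection_sketch(c, connection_id, result):
    """Store the flattened entry in `result` (assumed to be a dict)."""
    result[connection_id] = flatten_connection(c)
    return result


# Illustrative input; the field names are made up.
result = {}
config = {"hostsystem": {"CPU": "AMD", "disks": ["a", "b"]}, "args": {"sf": 10}}
add_connection_sketch(config, "MySQL-1-1-1", result)
```

In this sketch the disks list is dropped because it is not a scalar, while CPU and sf survive with their prefixes.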
- end_benchmarking(jobname)
Processes all benchmarker log files for a given job name.
Scans the result folder for files matching bexhoma-benchmarker-<jobname>*.dbmsbenchmarker.log and calls log_to_df() on each one.
- Parameters:
jobname (str) – Job name used to filter matching log files.
- end_loading(jobname)
Processes all loader sensor log files for a given job name.
Scans the result folder for files matching bexhoma-loading-<jobname>*.sensor.log and calls log_to_df() on each one.
- Parameters:
jobname (str) – Job name used to filter matching log files.
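The two file patterns used by end_benchmarking() and end_loading() can be illustrated with shell-style wildcard matching. The helper matching_logs and the file names below are hypothetical; the sketch only shows how a pattern of the documented shape selects files, not how bexhoma scans the folder.

```python
import fnmatch


def matching_logs(filenames, prefix, jobname, suffix):
    """Return the file names that match <prefix><jobname>*<suffix>."""
    pattern = f"{prefix}{jobname}*{suffix}"
    # fnmatchcase avoids platform-dependent case normalization.
    return sorted(f for f in filenames if fnmatch.fnmatchcase(f, pattern))


# Made-up file names that follow the documented patterns.
files = [
    "bexhoma-benchmarker-mysql-1-1-1-abcde.dbmsbenchmarker.log",
    "bexhoma-benchmarker-postgres-1-1-1-fghij.dbmsbenchmarker.log",
    "bexhoma-loading-mysql-1-1-klmno.sensor.log",
]

benchmarker_logs = matching_logs(
    files, "bexhoma-benchmarker-", "mysql-1-1-1", ".dbmsbenchmarker.log"
)
loader_logs = matching_logs(files, "bexhoma-loading-", "mysql-1-1", ".sensor.log")
```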
- evaluate_results(pod_dashboard='')
Scans all log files and collects information from them. In this base class, it only scans for errors.
- get_connections_of_experiment()
Returns connection metadata for a single experiment.
Reads connections.config and builds a row per pod/client with the following key columns: phase (code-prefixed phase identifier, <code>-<configuration>-<experiment_run>-<client>), job (code-prefixed job identifier, <code>-<configuration>-<experiment_run>-<client>-<benchmark_run>), code, connection, configuration, experiment_run, client, type_tenants, num_tenants, vol_tenants, plus flattened host-system, loading-parameter, benchmarking-parameter, and SUT-parameter fields.
When a connection entry carries orig_name, the entry represents an individual pod; otherwise a synthetic row is generated for each parallel client.
- Returns:
DataFrame of connection metadata, one row per pod/client.
- Return type:
pandas.DataFrame
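The phase and job identifier formats given above can be written out as two small helpers. The function names and the example values are hypothetical; only the two format strings come from the description.

```python
def phase_id(code, configuration, experiment_run, client):
    """Build <code>-<configuration>-<experiment_run>-<client>."""
    return f"{code}-{configuration}-{experiment_run}-{client}"


def job_id(code, configuration, experiment_run, client, benchmark_run):
    """Build the phase identifier followed by -<benchmark_run>."""
    phase = phase_id(code, configuration, experiment_run, client)
    return f"{phase}-{benchmark_run}"


# Made-up experiment code; the configuration name itself contains hyphens.
phase = phase_id("1234567890", "MySQL-24-4-1024", 1, 2)
job = job_id("1234567890", "MySQL-24-4-1024", 1, 2, 1)
```

Because a configuration name may itself contain hyphens, the trailing numeric fields of such an identifier are easier to recover by splitting from the right.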
- get_df_benchmarking()
Returns the DataFrame containing all benchmarking-phase results.
- Returns:
Empty DataFrame; overridden by subclasses.
- Return type:
pandas.DataFrame
- get_df_loading()
Returns the DataFrame containing all loading-phase results.
- Returns:
Empty DataFrame; overridden by subclasses.
- Return type:
pandas.DataFrame
- get_loading_per_connection()
Returns loading metrics for each individual connection (pod/client), enriched with the scale factor and a derived 'Throughput [SF/h]' column.
- Returns:
DataFrame with one row per connection.
- Return type:
pandas.DataFrame
- get_loading_per_run()
Returns loading metrics aggregated per (code, configuration, experiment_run).
Takes the per-connection DataFrame from get_loading_per_connection() and reduces it to one row per experiment run by taking the max across connections, then recomputes 'Throughput [SF/h]' from the aggregated load time.
- Returns:
DataFrame with one row per experiment run.
- Return type:
pandas.DataFrame
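The aggregation step can be sketched with a pandas groupby. Several details here are assumptions rather than documented behaviour: the column name scale_factor, the unit of time_load (taken to be seconds), and the throughput formula (scale factor divided by load time in hours). The function name is hypothetical as well.

```python
import pandas as pd


def loading_per_run_sketch(df):
    """Reduce per-connection rows to one row per experiment run."""
    keys = ["code", "configuration", "experiment_run"]
    # Take the max across the connections of each run.
    per_run = df.groupby(keys, as_index=False)[["time_load", "scale_factor"]].max()
    # Assumed formula: SF per hour, with time_load measured in seconds.
    per_run["Throughput [SF/h]"] = (
        per_run["scale_factor"] * 3600.0 / per_run["time_load"]
    )
    return per_run


# Two connections of the same run; the slower one determines the load time.
per_connection = pd.DataFrame(
    {
        "code": ["1234567890", "1234567890"],
        "configuration": ["MySQL-24-4-1024", "MySQL-24-4-1024"],
        "experiment_run": [1, 1],
        "time_load": [100.0, 200.0],
        "scale_factor": [10, 10],
    }
)
per_run = loading_per_run_sketch(per_connection)
```

Taking the max reflects that a parallel load finishes only when its slowest connection finishes, which is why the throughput is recomputed instead of averaged.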
- get_loading_per_run_multitenant()
Returns loading metrics aggregated per (code, experiment_run, type_tenants, vol_tenants, num_tenants, tenant_id) for multi-tenant experiments.
For container tenancy, the tenant_id key distinguishes individual tenant loading times.
For schema/database tenancy, reads per-pod sensor log files via _get_tenant_loading_from_logs() to expand the single shared connection row into one row per tenant. Each row carries the tenant's own loading duration (time_ingest/time_load) and a matching tenant_id. If the sensor logs are absent, the result collapses to one row with tenant_id = '' (same as before).
- Returns:
DataFrame with one row per tenant per experiment run.
- Return type:
pandas.DataFrame
- get_summary_loading_per_run_multitenant()
Returns loading metrics per tenant per experiment run, with housekeeping columns removed.
Wraps get_loading_per_run_multitenant() and drops code and configuration so the result is ready to display in show_summary().
- Returns:
DataFrame with one row per (tenant_id, experiment_run) combination.
- Return type:
pandas.DataFrame
- get_workload()
Returns the workload configuration of an experiment.
Reads the queries.config file from the experiment result folder and returns its contents as a Python dictionary.
- Returns:
Workload properties dictionary.
- Return type:
dict
- log_to_df(filename)
Scans a pod log file for known errors and records them in self.workflow_errors.
Returns an empty DataFrame; subclasses override this method to also parse benchmark results out of the log.
- Parameters:
filename (str) – Absolute path to the log file.
- Returns:
Empty DataFrame (subclasses return populated DataFrames).
- Return type:
pandas.DataFrame
- log_to_df_loading(filename: str) → DataFrame
Parse a loading pod log file and return the result as a DataFrame.
The default implementation delegates to log_to_df(), which is correct for benchmarks whose loading and benchmarking log formats are identical (e.g. YCSB). Subclasses where the formats differ must override this method.
- Parameters:
filename (str) – Absolute path to the loading log file.
- Returns:
DataFrame of loading results, or empty DataFrame on failure.
- Return type:
pandas.DataFrame
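The delegation pattern described for log_to_df_loading() might look like the sketch below. All class names are hypothetical stand-ins, and the parsers ignore the file content and return a marker row so that the dispatch is visible; real evaluators would parse the log.

```python
import pandas as pd


class BaseSketch:
    """Stand-in for the base class: loading delegates to log_to_df()."""

    def log_to_df(self, filename):
        # The base class returns an empty DataFrame.
        return pd.DataFrame()

    def log_to_df_loading(self, filename):
        # Default: loading logs share the benchmarking log format.
        return self.log_to_df(filename)


class SameFormatSketch(BaseSketch):
    """Loading and benchmarking logs share one format: one override suffices."""

    def log_to_df(self, filename):
        return pd.DataFrame([{"file": filename, "parser": "shared"}])


class DifferentFormatSketch(BaseSketch):
    """Formats differ: the loading parser is overridden separately."""

    def log_to_df(self, filename):
        return pd.DataFrame([{"file": filename, "parser": "benchmarking"}])

    def log_to_df_loading(self, filename):
        return pd.DataFrame([{"file": filename, "parser": "loading"}])


shared = SameFormatSketch().log_to_df_loading("loader.log")
separate = DifferentFormatSketch().log_to_df_loading("loader.log")
```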
- reconstruct_workflow(df: DataFrame) → dict
Reconstructs the actual experiment workflow from connection metadata.
Reads benchmark_sequence from queries.config to map each benchmark_run index to its benchmarker type, then groups the DataFrame by (configuration, experiment_run, client, benchmark_run) to produce a structure that mirrors the planned workflow format:

    {
        'MySQL-24-4-1024': [
            [   # experiment run 1
                [   # client round 1
                    {'type': 'dbmsbenchmarker', 'pods': 4},
                    {'type': 'tpch_refresh', 'pods': 1},
                ],
                [   # client round 2
                    {'type': 'dbmsbenchmarker', 'pods': 8},
                    {'type': 'tpch_refresh', 'pods': 1},
                ],
            ],
        ],
    }

- Parameters:
df (pandas.DataFrame) – Connection metadata DataFrame returned by get_connections_of_experiment(), with at least the columns configuration, experiment_run, client, benchmark_run, and pods.
- Returns:
Workflow dict mapping configuration name to the nested structure.
- Return type:
dict
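The grouping behind reconstruct_workflow() can be approximated with nested groupby calls. This is a sketch under assumptions, not the actual method: the benchmark types are passed in as a plain dict (the format of benchmark_sequence in queries.config is not shown here), and the pods column is assumed to already hold the pod count, of which the maximum per group is taken.

```python
import pandas as pd


def reconstruct_workflow_sketch(df, benchmark_types):
    """Nest rows as configuration -> experiment runs -> client rounds -> steps."""
    workflow = {}
    for configuration, df_config in df.groupby("configuration"):
        runs = []
        for _, df_run in df_config.groupby("experiment_run"):
            rounds = []
            for _, df_client in df_run.groupby("client"):
                steps = []
                for benchmark_run, df_step in df_client.groupby("benchmark_run"):
                    steps.append(
                        {
                            # Hypothetical mapping: benchmark_run index -> type.
                            "type": benchmark_types[int(benchmark_run)],
                            # Assumption: `pods` already holds the pod count.
                            "pods": int(df_step["pods"].max()),
                        }
                    )
                rounds.append(steps)
            runs.append(rounds)
        workflow[configuration] = runs
    return workflow


# Rows that reproduce the example structure from the description.
connections = pd.DataFrame(
    {
        "configuration": ["MySQL-24-4-1024"] * 4,
        "experiment_run": [1, 1, 1, 1],
        "client": [1, 1, 2, 2],
        "benchmark_run": [1, 2, 1, 2],
        "pods": [4, 1, 8, 1],
    }
)
types = {1: "dbmsbenchmarker", 2: "tpch_refresh"}
workflow = reconstruct_workflow_sketch(connections, types)
```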
- test_results()
Validates results locally and returns an exit code.
- Returns:
0 on success; subclasses return 1 on failure.
- Return type:
int
- test_results_column(df, test_column: str) → bool
Check whether a column in a DataFrame contains any zero or NaN values.
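A zero-or-NaN check of this kind can be expressed in a few lines of pandas. The helper name has_zero_or_nan is hypothetical, and the description does not say whether the real method returns True for a clean column or for a problematic one, so the sketch picks one convention explicitly.

```python
import pandas as pd


def has_zero_or_nan(df, column):
    """Return True if the column contains at least one zero or NaN value."""
    values = df[column]
    return bool(values.isna().any() or (values == 0).any())


clean = pd.DataFrame({"throughput": [12.5, 3.0]})
with_zero = pd.DataFrame({"throughput": [12.5, 0.0]})
with_nan = pd.DataFrame({"throughput": [12.5, float("nan")]})
```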
- transform_all_logs_benchmarking()
Iterates over all benchmarker log files and calls end_benchmarking() for each.
When self.benchmark_run > 0, only processes log files whose jobname ends with the matching benchmark index (the last component when the name is split on -), so that each evaluator in a multi-benchmark experiment only ingests its own logs.
- transform_all_logs_loading()
Iterates over all loader sensor log files and calls end_loading() for each.
When self.benchmark_run > 0, only processes log files whose jobname ends with the matching benchmark index, so that each evaluator only ingests its own loading logs.
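The benchmark_run filter used by both transform methods can be illustrated as a predicate on the job name. The function name and the job names are hypothetical; the sketch only assumes that the benchmark index is the last hyphen-separated component and that 0 disables the filter.

```python
def belongs_to_benchmark_run(jobname, benchmark_run):
    """Return True if the job's logs should be processed by this evaluator."""
    if benchmark_run <= 0:
        # 0 means unset: no filtering, every job is accepted.
        return True
    # Compare the last hyphen-separated component with the benchmark index.
    return jobname.rsplit("-", 1)[-1] == str(benchmark_run)


# Made-up job names that differ only in their trailing benchmark index.
jobs = ["bexhoma-benchmarker-mysql-1-1-1", "bexhoma-benchmarker-mysql-1-1-2"]
own_jobs = [j for j in jobs if belongs_to_benchmark_run(j, 2)]
all_jobs = [j for j in jobs if belongs_to_benchmark_run(j, 0)]
```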
- bexhoma.evaluators.base.natural_sort(items)
Sorts a list in natural (human) order so that embedded digit runs are compared numerically rather than lexicographically. Works for lists of strings, integers, or any mix whose elements have a meaningful str() representation.
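A common way to implement natural ordering is to split each item into digit and non-digit runs and compare the digit runs as integers. The sketch below shows that general technique; it is not necessarily how natural_sort() is written, and the case-insensitive comparison of the text parts is an added assumption.

```python
import re


def natural_sort_sketch(items):
    """Sort items so that embedded digit runs compare numerically."""

    def key(item):
        # Splitting on a captured digit run alternates text and digits:
        # even positions are text, odd positions are digit runs.
        parts = re.split(r"(\d+)", str(item))
        return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

    return sorted(items, key=key)


pods = natural_sort_sketch(["pod-10", "pod-2", "pod-1"])
mixed = natural_sort_sketch([10, "2", 1])
```

A plain sorted() call would place "pod-10" before "pod-2" because it compares character by character; the key function avoids that by comparing 10 and 2 as numbers.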