bexhoma.evaluators package
Submodules
- bexhoma.evaluators.base module
EvaluatorBase: add_connection_to_result(), end_benchmarking(), end_loading(), evaluate_results(), get_connections_of_experiment(), get_df_benchmarking(), get_df_loading(), get_loading_per_connection(), get_loading_per_run(), get_loading_per_run_multitenant(), get_summary_loading_per_run_multitenant(), get_workload(), log_to_df(), log_to_df_loading(), reconstruct_workflow(), test_results(), test_results_column(), transform_all_logs_benchmarking(), transform_all_logs_loading()
natural_sort()
- bexhoma.evaluators.benchbase module
BenchbaseEvaluator: benchmark_logs_to_timeseries_df(), benchmarking_aggregate_by_parallel_pods(), benchmarking_set_datatypes(), get_benchmark_logs_timeseries_df_aggregated(), get_benchmark_logs_timeseries_df_single(), get_summary_benchmark_per_connection(), get_summary_benchmark_per_phase(), get_summary_benchmark_per_phase_multitenant(), get_summary_loading_per_run(), log_to_df(), parse_benchbase_log_file(), record_tests()
- bexhoma.evaluators.dbmsbenchmarker module
DbmsBenchmarkerEvaluator: benchmarking_aggregate_by_parallel_pods(), benchmarking_set_datatypes(), get_df_benchmarking(), get_df_loading(), get_query_latencies(), get_summary_benchmark_per_connection(), get_summary_benchmark_per_phase(), get_summary_benchmark_per_phase_multitenant(), get_summary_loading_per_run(), get_total_errors(), get_total_warnings(), load_inspector(), record_tests(), test_results()
map_index_to_queryname()
- bexhoma.evaluators.logger module
LogEvaluator: end_benchmarking(), end_loading(), evaluate_results(), get_connection_config(), get_df_benchmarking(), get_df_loading(), get_monitoring_metric(), get_monitoring_metrics(), plot(), record_tests(), test_results(), transform_monitoring_results()
- bexhoma.evaluators.tpcc module
TpccEvaluator: benchmarking_aggregate_by_parallel_pods(), benchmarking_set_datatypes(), get_summary_benchmark_per_connection(), get_summary_benchmark_per_phase(), get_summary_benchmark_per_phase_multitenant(), get_summary_loading_per_run(), log_to_df(), record_tests(), test_results()
- bexhoma.evaluators.ycsb module
YcsbEvaluator: benchmark_logs_to_timeseries_df(), benchmarking_aggregate_by_parallel_pods(), benchmarking_set_datatypes(), get_benchmark_logs_timeseries_df_aggregated(), get_benchmark_logs_timeseries_df_single(), get_df_loading(), get_loading_logs_timeseries_df_aggregated(), get_loading_logs_timeseries_df_single(), get_loading_per_connection(), get_loading_per_pod(), get_loading_per_run(), get_summary_benchmark_per_connection(), get_summary_benchmark_per_phase(), get_summary_benchmark_per_phase_multitenant(), get_summary_loading_per_connection(), get_summary_loading_per_run(), loading_aggregate_by_parallel_pods(), loading_logs_to_timeseries_df(), loading_set_datatypes(), log_to_df(), logs_to_timeseries_df(), parse_ycsb_log_file(), record_tests()
Module contents
Public API of the bexhoma.evaluators package.
Exports EvaluatorBase, LogEvaluator,
DbmsBenchmarkerEvaluator, BenchbaseEvaluator,
TpccEvaluator, and YcsbEvaluator evaluator classes,
plus the natural_sort() utility from base.
Authors: Patrick K. Erdelt
Copyright (C) 2020 Patrick K. Erdelt
SPDX-License-Identifier: AGPL-3.0-or-later
See LICENSE for details.
- class bexhoma.evaluators.BenchbaseEvaluator(code, path, include_loading=False, include_benchmarking=True, benchmark_run: int = 0)
Bases: LogEvaluator
Evaluator for a Benchbase experiment.
Parses per-pod log files to extract throughput, goodput, and latency distribution results produced by the Benchbase benchmarking tool. Also provides time-series access to per-second throughput metrics via get_benchmark_logs_timeseries_df_aggregated() and get_benchmark_logs_timeseries_df_single().
- Parameters:
code – Experiment identifier — also the name of the result sub-folder.
path – Root path that contains the result folders.
include_loading – Whether loading-phase results are expected.
include_benchmarking – Whether benchmarking-phase results are expected.
- benchmark_logs_to_timeseries_df(list_logs, metric='throughput', aggregate=True)
Parses Benchbase log files for the given pod IDs and assembles a time-series DataFrame.
Each pod ID in list_logs is resolved to matching log files via a glob pattern. When aggregate is True, the per-second metric values from all pods are combined into a single DataFrame: percentile and max metrics use the element-wise maximum, minimum metrics use the element-wise minimum, and all others are summed. When aggregate is False, a list of per-pod DataFrames is returned instead.
- Parameters:
- Returns:
Aggregated DataFrame indexed by 'second' (with an 'avg' column appended) when aggregate is True, or a list of per-pod DataFrames.
- Return type:
pandas.DataFrame or list[pandas.DataFrame]
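The combination rule described above can be sketched with pandas. This is only an illustration, not the library's implementation: the helper name combine_pod_series, the way a metric is classified by its name, and the meaning of the 'avg' column (here the mean of the combined series) are all assumptions.

```python
import pandas as pd

def combine_pod_series(per_pod, metric):
    """Combine per-pod series (indexed by second) into one DataFrame (hypothetical helper)."""
    # Align all pods on the shared index, one column per pod.
    wide = pd.concat(per_pod, axis=1)
    name = metric.lower()
    if "percentile" in name or name.startswith("max"):
        combined = wide.max(axis=1)   # element-wise maximum
    elif name.startswith("min"):
        combined = wide.min(axis=1)   # element-wise minimum
    else:
        combined = wide.sum(axis=1)   # e.g. throughput adds up across pods
    df = combined.to_frame(name=metric)
    df.index.name = "second"
    # Assumption: 'avg' is the mean of the combined series, repeated on every row.
    df["avg"] = df[metric].mean()
    return df

pod_a = pd.Series([100.0, 120.0, 110.0], index=[1, 2, 3])
pod_b = pd.Series([90.0, 80.0, 100.0], index=[1, 2, 3])
throughput = combine_pod_series([pod_a, pod_b], "throughput")
latency = combine_pod_series([pod_a, pod_b], "max_latency")
```

Summing is appropriate for rates that add up across pods, while a maximum keeps the worst observed value for tail metrics.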
- benchmarking_aggregate_by_parallel_pods(df, columns=['phase'])
Aggregates parallel-pod result rows into one row per group.
Groups the typed benchmarking DataFrame by columns and applies per-metric aggregation functions (sum for throughput, max for latency percentiles, etc.).
The phase column holds the phase identifier (configuration-experiment_run-client) and the job column holds the job identifier (configuration-experiment_run-client-benchmark_run).
The default columns=['phase'] groups by phase, producing one row per phase (all jobs within a phase merged). To keep one row per job, pass columns=['job'].
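A minimal sketch of this kind of grouping, assuming hypothetical column names (throughput, latency_p95) and made-up identifiers; the real method aggregates many more columns and may differ in detail.

```python
import pandas as pd

# Three pods: two ran in parallel in the same phase and job, one in another phase.
rows = pd.DataFrame({
    "phase": ["pg-1-1", "pg-1-1", "pg-1-2"],
    "job": ["pg-1-1-1", "pg-1-1-1", "pg-1-2-1"],
    "throughput": [500.0, 450.0, 800.0],
    "latency_p95": [12.0, 15.0, 9.0],
})

def aggregate_by(df, columns=("phase",)):
    """Merge parallel-pod rows into one row per group (illustrative only)."""
    return df.groupby(list(columns)).agg(
        throughput=("throughput", "sum"),     # rates add up across pods
        latency_p95=("latency_p95", "max"),   # keep the worst percentile
        pod_count=("throughput", "count"),    # number of merged pods
    ).reset_index()

per_phase = aggregate_by(rows)
per_job = aggregate_by(rows, ["job"])
```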
- benchmarking_set_datatypes(df)
Casts all benchmarking result columns to their appropriate data types.
Adds a tenant_id column (value -1) when the column is absent so that DataFrames loaded from older pickles remain compatible.
- Parameters:
df (pandas.DataFrame) – DataFrame of raw benchmarking results.
- Returns:
DataFrame with columns cast to correct types.
- Return type:
pandas.DataFrame
- get_benchmark_logs_timeseries_df_aggregated(metric='throughput', configuration='', client='1', experiment_run='1')
Returns a DataFrame of time series of a metric for the benchmarking phase, aggregated over all pods per second.
Retrieves pod IDs from get_df_benchmarking() filtered by the given configuration, client, and experiment_run, then delegates to benchmark_logs_to_timeseries_df() with aggregate=True.
- Parameters:
- Returns:
DataFrame indexed by 'second' with the metric and an 'avg' column.
- Return type:
pandas.DataFrame
- get_benchmark_logs_timeseries_df_single(metric='throughput', configuration='', client='1', experiment_run='1')
Returns a list of DataFrames of time series of a metric for the benchmarking phase, one per pod.
Retrieves pod IDs from get_df_benchmarking() filtered by the given configuration, client, and experiment_run, then delegates to benchmark_logs_to_timeseries_df() with aggregate=False.
- Parameters:
- Returns:
List of DataFrames, one per pod, each indexed by 'second'.
- Return type:
list[pandas.DataFrame]
- get_summary_benchmark_per_connection()
Returns benchmarking results with one row per pod, filtered to the key display columns.
Applies benchmarking_set_datatypes() and selects the columns used for the per-connection summary table (experiment run, terminals, target, client, child, time, errors, throughput, goodput, efficiency, and latency percentiles), then sorts by (experiment_run, client, child).
- Returns:
DataFrame indexed as "DBMS" with one row per pod, or None if there are no benchmarking results.
- Return type:
pandas.DataFrame or None
- get_summary_benchmark_per_phase()
Returns benchmarking results aggregated over parallel pods, one row per phase.
Applies benchmarking_set_datatypes(), aggregates via benchmarking_aggregate_by_parallel_pods(), and selects the columns used for the per-phase summary table (phase, experiment run, terminals, target, pod count, time, errors, throughput, goodput, efficiency, and latency percentiles), sorted by (experiment_run, target, pod_count).
- Returns:
DataFrame indexed as "DBMS" with one row per phase, or an empty DataFrame if there are no benchmarking results.
- Return type:
pandas.DataFrame
- get_summary_benchmark_per_phase_multitenant()
Returns benchmarking results aggregated per phase and tenant, one row per (phase, tenant_id).
Like get_summary_benchmark_per_phase() but groups by ['phase', 'tenant_id'] so each tenant appears as a separate row.
- Returns:
DataFrame indexed as "DBMS" with one row per (phase, tenant), or an empty DataFrame if there are no benchmarking results.
- Return type:
pandas.DataFrame
- get_summary_loading_per_run()
Returns loading metrics aggregated per experiment run.
Delegates to get_loading_per_run() (defined in base), which reduces the per-connection loading DataFrame to one row per (code, configuration, experiment_run) and adds a 'Throughput [SF/h]' column.
- Returns:
DataFrame with one row per experiment run.
- Return type:
pandas.DataFrame
- log_to_df(filename)
Parses a Benchbase pod log file into a single-row DataFrame.
Extracts connection metadata (including tenant_id from the BEXHOMA_TENANT_ID stdout line), benchmark parameters, and the JSON result block embedded between ####BEXHOMA#### markers. Returns an empty DataFrame when the log is incomplete (e.g. the start time has already passed).
- Parameters:
filename (str) – Absolute path to the log file.
- Returns:
Single-row DataFrame of benchmarking results, or empty on failure.
- Return type:
pandas.DataFrame
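Extracting a JSON block between two ####BEXHOMA#### markers might look like the sketch below. The helper name extract_result_block, the sample log text, and the JSON key are hypothetical; the actual parser also reads connection metadata and builds a DataFrame.

```python
import json
import re

MARKER = "####BEXHOMA####"

def extract_result_block(log_text):
    """Return the JSON object found between the first pair of markers, or None."""
    pattern = re.escape(MARKER) + r"(.*?)" + re.escape(MARKER)
    match = re.search(pattern, log_text, flags=re.DOTALL)
    if match is None:
        return None  # incomplete log: markers missing
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None  # markers present but payload is not valid JSON

# Hypothetical log content, for illustration only.
log_text = 'starting\n####BEXHOMA####\n{"Throughput (requests/second)": 123.4}\n####BEXHOMA####\ndone'
result = extract_result_block(log_text)
```

Returning None for a missing or malformed block mirrors the documented behaviour of yielding an empty result for incomplete logs.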
- parse_benchbase_log_file(file_path)
Parses a Benchbase log file into a list of per-second throughput records.
Each [INFO] log line that contains a Throughput: entry is converted into a dict with keys second (elapsed time) and throughput.
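As a rough sketch of this kind of line scan: the log line layout below is invented for illustration, and deriving second from the running count of matched lines is an assumption (the real parser may compute elapsed time differently).

```python
import re

# Matches the numeric value that follows "Throughput:".
THROUGHPUT_RE = re.compile(r"Throughput:\s*(\d+(?:\.\d+)?)")

def parse_throughput_lines(lines):
    """Turn INFO lines that carry a throughput value into per-second records."""
    records = []
    for line in lines:
        if not line.startswith("[INFO"):
            continue  # only INFO lines are considered
        match = THROUGHPUT_RE.search(line)
        if match:
            # Assumption: one matching line per elapsed second.
            records.append({"second": len(records) + 1,
                            "throughput": float(match.group(1))})
    return records

# Hypothetical log lines.
sample = [
    "[INFO ] 2024-01-01 12:00:01 - Throughput: 100.5 txn/sec",
    "[WARN ] 2024-01-01 12:00:01 - Throughput: 999 txn/sec",
    "[INFO ] 2024-01-01 12:00:02 - Throughput: 98 txn/sec",
]
records = parse_throughput_lines(sample)
```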
- record_tests(experiment, df_loading: DataFrame, df_reduced: DataFrame, workflow_actual: dict, workflow_planned: dict, **extra) None
Record Benchbase pass/fail tests: throughput and workflow completeness.
- Parameters:
experiment – The owning experiment object.
df_loading – Per-run loading DataFrame (unused here).
df_reduced – Per-phase execution DataFrame.
workflow_actual – Reconstructed actual workflow dict.
workflow_planned – Planned workflow dict from workload config.
- class bexhoma.evaluators.DbmsBenchmarkerEvaluator(code, path, include_loading=False, include_benchmarking=True, benchmark_run: int = 0)
Bases: LogEvaluator
Evaluator for a DBMSBenchmarker experiment.
Wraps a dbmsbenchmarker.inspector.inspector instance and exposes loading times, per-query latency statistics, throughput metrics, warning and error counts, and aggregation over parallel pods.
- Parameters:
code – Experiment identifier — also the name of the result sub-folder.
path – Root path that contains the result folders.
include_loading – Unused; loading is always enabled for this evaluator.
include_benchmarking – Unused; benchmarking is always enabled.
- benchmarking_aggregate_by_parallel_pods(df, columns=['phase'])
Aggregates parallel-pod DBMSBenchmarker result rows into one row per group.
Groups by columns and applies geo-mean for timing/power metrics and max/sum for count metrics. Recomputes Throughput@Size from the aggregated values.
The phase column holds the phase identifier (configuration-experiment_run-client) and the job column holds the job identifier (configuration-experiment_run-client-benchmark_run).
The default columns=['phase'] groups by phase, producing one row per phase. To keep one row per job, pass columns=['job'].
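For readers unfamiliar with the geometric mean used for timing metrics, the following sketch shows how per-pod timings could be merged per phase. The data and the helper geo_mean are illustrative assumptions; the recomputation of Throughput@Size is not shown.

```python
import math
from collections import defaultdict

def geo_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical (phase, timing in seconds) pairs, one per parallel pod.
pod_timings = [("pg-1-1", 2.0), ("pg-1-1", 8.0), ("pg-1-2", 3.0)]

by_phase = defaultdict(list)
for phase, timing in pod_timings:
    by_phase[phase].append(timing)

aggregated = {phase: geo_mean(timings) for phase, timings in by_phase.items()}
```

For the first phase the geometric mean of 2 and 8 is 4, whereas the arithmetic mean would be 5; the geometric mean is less dominated by a single slow pod.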
- benchmarking_set_datatypes(df)
Returns the DataFrame, adding a tenant_id column (value -1) when the column is absent so that DataFrames loaded from older pickles remain compatible with aggregation code that expects the column.
DBMSBenchmarker results are otherwise already typed by the inspector; no other conversion is needed.
- Parameters:
df (pandas.DataFrame) – DataFrame of results.
- Returns:
DataFrame with tenant_id guaranteed to be present.
- Return type:
pandas.DataFrame
- get_df_benchmarking()
Returns the DataFrame containing all benchmarking-phase results.
Combines per-query latency statistics, geo-mean execution times, and per-connection timing data from the DBMSBenchmarker inspector into a single DataFrame. Includes tenant_id read from the BEXHOMA_TENANT_ID loading parameter (-1 when absent).
- Returns:
DataFrame with one row per connection/pod, or empty DataFrame on failure.
- Return type:
pandas.DataFrame
- get_df_loading()
Returns the DataFrame containing all loading-phase timing results.
Reads loading time fields (timeGenerate, timeIngesting, timeSchema, timeIndex, timeLoad) from the inspector’s connection data.
- Returns:
DataFrame with one row per DBMS connection indexed as "DBMS".
- Return type:
pandas.DataFrame
- get_query_latencies(query_titles=False)
Returns the mean execution latency per query and DBMS.
- Parameters:
query_titles (bool) – When True, replaces query index labels with human-readable titles from queries.config.
- Returns:
DataFrame of mean latencies (ms) with queries as columns and DBMS as rows.
- Return type:
pandas.DataFrame
- get_summary_benchmark_per_connection()
Returns benchmarking results with one row per pod, filtered to the key display columns.
Applies benchmarking_set_datatypes() and selects the columns used for the per-connection summary table (experiment run, terminals, target, client, child, time, errors, throughput, goodput, efficiency, and latency percentiles), then sorts by (experiment_run, client, child).
- Returns:
DataFrame indexed as "DBMS" with one row per pod, or None if there are no benchmarking results.
- Return type:
pandas.DataFrame or None
- get_summary_benchmark_per_phase()
Returns benchmarking results aggregated over parallel pods, one row per phase.
Applies benchmarking_set_datatypes(), aggregates via benchmarking_aggregate_by_parallel_pods(), and selects the columns used for the per-phase summary table (experiment run, terminals, target, pod count, time, errors, throughput, goodput, efficiency, and latency percentiles), sorted by (experiment_run, target, pod_count).
- Returns:
DataFrame indexed as "DBMS" with one row per phase, or an empty DataFrame if there are no benchmarking results.
- Return type:
pandas.DataFrame
- get_summary_benchmark_per_phase_multitenant()
Returns benchmarking results aggregated per phase and tenant, one row per (phase, tenant_id).
Like get_summary_benchmark_per_phase() but groups by ['phase', 'tenant_id'] so each tenant appears as a separate row.
- Returns:
DataFrame indexed as "DBMS" with one row per (phase, tenant), or an empty DataFrame if there are no benchmarking results.
- Return type:
pandas.DataFrame
- get_summary_loading_per_run()
Returns loading metrics aggregated per experiment run.
Delegates to get_loading_per_run() (defined in base), which reduces the per-connection loading DataFrame to one row per (code, configuration, experiment_run) and adds a 'Throughput [SF/h]' column.
- Returns:
DataFrame with one row per experiment run.
- Return type:
pandas.DataFrame
- get_total_errors(query_titles=False)
Returns the per-query error counts for this experiment.
- Parameters:
query_titles (bool) – When True, replaces query index labels with human-readable titles from queries.config.
- Returns:
DataFrame of error counts with queries as columns and DBMS as rows.
- Return type:
pandas.DataFrame
- get_total_warnings(query_titles=False)
Returns the per-query warning counts for this experiment.
- Parameters:
query_titles (bool) – When True, replaces query index labels with human-readable titles from queries.config.
- Returns:
DataFrame of warning counts with queries as columns and DBMS as rows.
- Return type:
pandas.DataFrame
- load_inspector()
Loads the DBMSBenchmarker inspector for this experiment.
Creates an inspector.inspector rooted at self.path_base, loads the experiment identified by self.code, and stores the result in self.evaluation. Sets self.evaluation to None if loading fails so callers can detect the uninitialized state.
- record_tests(experiment, df_loading: DataFrame, df_reduced: DataFrame, workflow_actual: dict, workflow_planned: dict, **extra) None
Record DBMSBenchmarker pass/fail tests.
Tests query metric columns (Geo Times, Power@Size, Throughput@Size), SQL error and warning counts supplied by _show_extra_sections, and workflow completeness.
- Parameters:
experiment – The owning experiment object.
df_loading – Per-run loading DataFrame (unused here).
df_reduced – Per-phase execution DataFrame.
workflow_actual – Reconstructed actual workflow dict.
workflow_planned – Planned workflow dict from workload config.
extra – Must contain num_errors and num_warnings from _show_extra_sections().
- test_results()
Validates results by loading and reconstructing the workflow.
- Returns:
0 on success, 1 if an exception is raised.
- Return type:
- class bexhoma.evaluators.EvaluatorBase(code, path, include_loading=False, include_benchmarking=True, benchmark_run: int = 0)
Bases: object
Base evaluator for a single bexhoma experiment.
Loads the experiment result folder identified by code inside path, provides helpers for scanning log files and reconstructing the workflow, and exposes connection/loading metadata. All benchmark-specific evaluators inherit from this class (via logger).
- Parameters:
code – Experiment identifier — also the name of the result sub-folder.
path – Root path that contains the result folders.
include_loading – Whether loading-phase results are expected.
include_benchmarking – Whether benchmarking-phase results are expected.
benchmark_run – 1-based position in the benchmark sequence; 0 means unset.
- add_connection_to_result(c, connection_id, result)
Appends a flattened connection entry to result keyed by connection_id.
Extracts scalar fields from the connection config dict c, including the host-system, loading-parameter, benchmarking-parameter, SUT-parameter, and args sections, and stores them with prefixed keys (host_*, loading_parameters_*, benchmarking_parameters_*, sut_parameters_*, arg_*). Non-scalar values (lists and dicts) are skipped.
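The flattening can be pictured with the sketch below. The section keys inside the config dict (hostsystem, args) and the helper flatten_connection are hypothetical; only the output prefixes follow the description above.

```python
def flatten_connection(c, sections):
    """Copy scalar values from selected sections of c into one flat dict."""
    flat = {}
    for section_key, prefix in sections.items():
        for key, value in c.get(section_key, {}).items():
            if isinstance(value, (list, dict)):
                continue  # non-scalar values are skipped
            flat[prefix + key] = value
    return flat

# Hypothetical mapping from config section name to output key prefix.
SECTIONS = {"hostsystem": "host_", "args": "arg_"}

c = {
    "hostsystem": {"CPU": "x86", "RAM": 64, "disks": ["a", "b"]},
    "args": {"mode": "run", "nested": {"ignored": True}},
}

result = {}
result["pg-1"] = flatten_connection(c, SECTIONS)
```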
- end_benchmarking(jobname)
Processes all benchmarker log files for a given job name.
Scans the result folder for files matching bexhoma-benchmarker-<jobname>*.dbmsbenchmarker.log and calls log_to_df() on each one.
- Parameters:
jobname (str) – Job name used to filter matching log files.
- end_loading(jobname)
Processes all loader sensor log files for a given job name.
Scans the result folder for files matching bexhoma-loading-<jobname>*.sensor.log and calls log_to_df() on each one.
- Parameters:
jobname (str) – Job name used to filter matching log files.
- evaluate_results(pod_dashboard='')
Scans all log files and collects information from them. In this base class it only scans for errors.
- get_connections_of_experiment()
Returns connection metadata for a single experiment.
Reads connections.config and builds a row per pod/client with the following key columns: phase (code-prefixed phase identifier, <code>-<configuration>-<experiment_run>-<client>), job (code-prefixed job identifier, <code>-<configuration>-<experiment_run>-<client>-<benchmark_run>), code, connection, configuration, experiment_run, client, type_tenants, num_tenants, vol_tenants, plus flattened host-system, loading-parameter, benchmarking-parameter, and SUT-parameter fields.
When a connection entry carries orig_name, the entry represents an individual pod; otherwise a synthetic row is generated for each parallel client.
- Returns:
DataFrame of connection metadata, one row per pod/client.
- Return type:
pandas.DataFrame
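The two identifier layouts described for the phase and job columns can be illustrated as follows. The helper names and sample values are hypothetical; only the ordering of the components is taken from the description.

```python
def phase_id(code, configuration, experiment_run, client):
    """Build <code>-<configuration>-<experiment_run>-<client>."""
    return f"{code}-{configuration}-{experiment_run}-{client}"

def job_id(code, configuration, experiment_run, client, benchmark_run):
    """A job identifier is the phase identifier plus the benchmark run."""
    return f"{phase_id(code, configuration, experiment_run, client)}-{benchmark_run}"

# Hypothetical experiment code and configuration name.
phase = phase_id("1234567890", "PostgreSQL-1", 1, 2)
job = job_id("1234567890", "PostgreSQL-1", 1, 2, 3)
```

Because every job identifier starts with its phase identifier, grouping by phase merges all jobs that belong to the same client round.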
- get_df_benchmarking()
Returns the DataFrame containing all benchmarking-phase results.
- Returns:
Empty DataFrame; overridden by subclasses.
- Return type:
pandas.DataFrame
- get_df_loading()
Returns the DataFrame containing all loading-phase results.
- Returns:
Empty DataFrame; overridden by subclasses.
- Return type:
pandas.DataFrame
- get_loading_per_connection()
Returns loading metrics for each individual connection (pod/client), enriched with the scale factor and a 'Throughput [SF/h]' derived column.
- Returns:
DataFrame with one row per connection.
- Return type:
pandas.DataFrame
- get_loading_per_run()
Returns loading metrics aggregated per (code, configuration, experiment_run).
Takes the per-connection DataFrame from get_loading_per_connection() and reduces it to one row per experiment run by taking the max across connections, then recomputes 'Throughput [SF/h]' from the aggregated load time.
- Returns:
DataFrame with one row per experiment run.
- Return type:
pandas.DataFrame
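A minimal pandas sketch of this reduction, under several assumptions that are not confirmed by the description: the load time lives in a column named time_load and is measured in seconds, the scale factor is in a column named SF, and the throughput is computed as SF * 3600 / time_load.

```python
import pandas as pd

# Hypothetical per-connection loading data (two pods in run 1, one pod in run 2).
per_connection = pd.DataFrame({
    "code": ["1234", "1234", "1234"],
    "configuration": ["pg", "pg", "pg"],
    "experiment_run": [1, 1, 2],
    "time_load": [1800.0, 3600.0, 900.0],  # assumed to be seconds
    "SF": [10, 10, 10],
})

keys = ["code", "configuration", "experiment_run"]

# One row per experiment run: the slowest connection determines the load time.
per_run = per_connection.groupby(keys, as_index=False).max()

# Recompute the derived column from the aggregated load time (assumed formula).
per_run["Throughput [SF/h]"] = per_run["SF"] * 3600.0 / per_run["time_load"]
```

Taking the maximum reflects that a run has finished loading only when its slowest connection has finished.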
- get_loading_per_run_multitenant()
Returns loading metrics aggregated per (code, experiment_run, type_tenants, vol_tenants, num_tenants, tenant_id) for multi-tenant experiments.
For container tenancy the tenant_id key distinguishes individual tenant loading times.
For schema/database tenancy, reads per-pod sensor log files via _get_tenant_loading_from_logs() to expand the single shared connection row into one row per tenant. Each row carries the tenant’s own loading duration (time_ingest/time_load) and a matching tenant_id. If the sensor logs are absent the result collapses to one row with tenant_id = '' (same as before).
- Returns:
DataFrame with one row per tenant per experiment run.
- Return type:
pandas.DataFrame
- get_summary_loading_per_run_multitenant()
Returns loading metrics per tenant per experiment run, with housekeeping columns removed.
Wraps get_loading_per_run_multitenant() and drops code and configuration so the result is ready to display in show_summary().
- Returns:
DataFrame with one row per (tenant_id, experiment_run) combination.
- Return type:
pandas.DataFrame
- get_workload()
Returns the workload configuration of an experiment.
Reads the queries.config file from the experiment result folder and returns its contents as a Python dictionary.
- Returns:
Workload properties dictionary.
- Return type:
- log_to_df(filename)
Scans a pod log file for known errors and records them in self.workflow_errors.
Returns an empty DataFrame; subclasses override this method to also parse benchmark results out of the log.
- Parameters:
filename (str) – Absolute path to the log file.
- Returns:
Empty DataFrame (subclasses return populated DataFrames).
- Return type:
pandas.DataFrame
- log_to_df_loading(filename: str) DataFrame
Parse a loading pod log file and return the result as a DataFrame.
Default implementation delegates to log_to_df(), which is correct for benchmarks whose loading and benchmarking log formats are identical (e.g. YCSB). Subclasses where the formats differ must override this method.
- Parameters:
filename (str) – Absolute path to the loading log file.
- Returns:
DataFrame of loading results, or empty DataFrame on failure.
- Return type:
pandas.DataFrame
- reconstruct_workflow(df: DataFrame) dict
Reconstructs the actual experiment workflow from connection metadata.
Reads benchmark_sequence from queries.config to map each benchmark_run index to its benchmarker type, then groups the DataFrame by (configuration, experiment_run, client, benchmark_run) to produce a structure that mirrors the planned workflow format:
    {
        'MySQL-24-4-1024': [
            [  # experiment run 1
                [  # client round 1
                    {'type': 'dbmsbenchmarker', 'pods': 4},
                    {'type': 'tpch_refresh', 'pods': 1},
                ],
                [  # client round 2
                    {'type': 'dbmsbenchmarker', 'pods': 8},
                    {'type': 'tpch_refresh', 'pods': 1},
                ],
            ],
        ],
    }
- Parameters:
df (pandas.DataFrame) – Connection metadata DataFrame returned by get_connections_of_experiment(), with at least the columns configuration, experiment_run, client, benchmark_run, and pods.
- Returns:
Workflow dict mapping configuration name to the nested structure.
- Return type:
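The nesting of the workflow structure can be reproduced with plain dictionaries, as in this sketch. It is not the library's code: rows are given as dicts instead of a DataFrame, the benchmark sequence is passed as a hypothetical index-to-type mapping, and taking the largest pods value per group is an assumption.

```python
from collections import defaultdict

def reconstruct(rows, sequence):
    """Nest rows as configuration -> experiment runs -> client rounds -> steps."""
    grouped = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
    for r in rows:
        steps = grouped[r["configuration"]][r["experiment_run"]][r["client"]]
        # Assumption: keep the largest 'pods' value seen for each benchmark run.
        steps[r["benchmark_run"]] = max(steps.get(r["benchmark_run"], 0), r["pods"])
    workflow = {}
    for configuration, runs in grouped.items():
        workflow[configuration] = [
            [
                [{"type": sequence[b], "pods": pods} for b, pods in sorted(steps.items())]
                for _, steps in sorted(clients.items())
            ]
            for _, clients in sorted(runs.items())
        ]
    return workflow

# Hypothetical mapping from benchmark_run index to benchmarker type.
sequence = {1: "dbmsbenchmarker", 2: "tpch_refresh"}
rows = [
    {"configuration": "MySQL-24-4-1024", "experiment_run": 1, "client": 1, "benchmark_run": 1, "pods": 4},
    {"configuration": "MySQL-24-4-1024", "experiment_run": 1, "client": 1, "benchmark_run": 2, "pods": 1},
    {"configuration": "MySQL-24-4-1024", "experiment_run": 1, "client": 2, "benchmark_run": 1, "pods": 8},
    {"configuration": "MySQL-24-4-1024", "experiment_run": 1, "client": 2, "benchmark_run": 2, "pods": 1},
]
workflow = reconstruct(rows, sequence)
```

Producing the same shape as the planned workflow means the two dicts can be compared directly to check completeness.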
- test_results()
Validates results locally and returns an exit code.
- Returns:
0 on success; subclasses return 1 on failure.
- Return type:
- test_results_column(df, test_column: str) bool
Check whether a column in a DataFrame contains any zero or NaN values.
- transform_all_logs_benchmarking()
Iterates over all benchmarker log files and calls end_benchmarking() for each.
When self.benchmark_run > 0, only processes log files whose jobname ends with the matching benchmark index (the last '-'-separated component), so that each evaluator in a multi-benchmark experiment only ingests its own logs.
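The suffix filter can be sketched as below. The helper belongs_to_benchmark and the sample job names are hypothetical, and how a jobname is derived from a log file name is not shown.

```python
def belongs_to_benchmark(jobname, benchmark_run):
    """Return True if the job should be processed for the given benchmark run."""
    if benchmark_run <= 0:
        return True  # 0 means unset: process every job
    # Compare the last '-'-separated component with the benchmark index.
    return jobname.rsplit("-", 1)[-1] == str(benchmark_run)

# Hypothetical job names ending in the benchmark run index.
jobnames = ["pg-1-1-1", "pg-1-1-2", "pg-1-2-1"]
selected = [j for j in jobnames if belongs_to_benchmark(j, 2)]
unfiltered = [j for j in jobnames if belongs_to_benchmark(j, 0)]
```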
- transform_all_logs_loading()
Iterates over all loader sensor log files and calls end_loading() for each.
When self.benchmark_run > 0, only processes log files whose jobname ends with the matching benchmark index, so that each evaluator only ingests its own loading logs.
- class bexhoma.evaluators.LogEvaluator(code, path, include_loading=False, include_benchmarking=True, benchmark_run: int = 0)
Bases: EvaluatorBase
Evaluator base that reads benchmark log files into DataFrames.
Extends base by implementing end_benchmarking() and end_loading() to parse pod log files, pickle the resulting DataFrames, and collect them into a single combined pickle per phase. All benchmark-specific evaluators (benchbase, ycsb, tpcc, dbmsbenchmarker) inherit from this class.
- end_benchmarking(jobname)
Parses all benchmarker log files for a job and caches results as pickle files.
Scans the result folder for files matching bexhoma-benchmarker-<jobname>*.dbmsbenchmarker.log, calls log_to_df() on each, and writes non-empty results to a <filename>.df.pickle sidecar file.
- Parameters:
jobname (str) – Job name used to filter matching log files.
- end_loading(jobname)
Parses all loader sensor log files for a job and caches results as pickle files.
Scans the result folder for files matching bexhoma-loading-<jobname>*.sensor.log, calls log_to_df() on each, prints a message when errors are detected, and writes non-empty results to a <filename>.df.pickle sidecar file.
- Parameters:
jobname (str) – Job name used to filter matching log files.
- evaluate_results(pod_dashboard='')
Parses all pod log files and persists the results as pickled DataFrames.
Calls transform_all_logs_benchmarking() and _collect_dfs() for the benchmarking phase when include_benchmarking is set, and analogously for the loading phase. When benchmark_run > 0, each phase writes a per-benchmark pickle file rather than the shared *.all.df.pickle.
- get_connection_config()
Returns the parsed connections.config as a list of connection dicts, sorted by connection name.
- get_df_benchmarking()
Returns the DataFrame containing all benchmarking-phase results.
Reads from the combined pickle file, triggering evaluate_results() to generate it on first access if it does not yet exist.
Ensures a pods column is present: when the pickle was written before this column was added (older runs), it is derived from pod_count.
- Returns:
DataFrame of benchmarking results, or empty DataFrame when unavailable.
- Return type:
pandas.DataFrame
- get_df_loading()
Returns the DataFrame containing all loading-phase results.
Reads from the combined pickle file if it exists.
- Returns:
DataFrame of loading results, or empty DataFrame when unavailable.
- Return type:
pandas.DataFrame
- get_monitoring_metric(metric, component='loading')
Returns a wide-format DataFrame of a single monitoring metric for a component.
Reads the pre-combined CSV produced by transform_monitoring_results() and returns it transposed so that rows are timestamps and columns are connections.
- get_monitoring_metrics()
Returns the list of metric keys defined in the first connection’s monitoring block.
- plot(df, column, x, y, plot_by=None, kind='line', dict_colors=None, figsize=(12, 8))
Plots one or more line (or other) charts from a DataFrame.
When plot_by is None, a single chart is produced with one line per value in column. When plot_by is given, a grid of sub-plots is created, one per group defined by plot_by, with lines split by column within each sub-plot.
- Parameters:
df (pandas.DataFrame) – DataFrame containing the data to plot.
column (str) – Column whose unique values define individual lines.
x (str) – Column to use as the x-axis.
y (str) – Column to use as the y-axis.
plot_by (str or None) – Optional column whose values define separate sub-plots.
kind (str) – Plot kind passed to DataFrame.plot (e.g. 'line', 'bar').
dict_colors (dict or None) – Optional colour mapping for the kind keyword.
figsize (tuple) – Figure size as (width, height) in inches.
- Returns:
Matplotlib axes object (single axes when plot_by is None).
- record_tests(experiment, df_loading: DataFrame, df_reduced: DataFrame, workflow_actual: dict, workflow_planned: dict, **extra) None
Record pass/fail test results for this benchmark.
Default: tests only that the workflow matches the plan. Override in benchmark-specific evaluator subclasses to test metric columns.
- Parameters:
experiment – The owning experiment object.
df_loading – Per-run loading DataFrame.
df_reduced – Per-phase execution DataFrame.
workflow_actual – Reconstructed actual workflow dict.
workflow_planned – Planned workflow dict from workload config.
- test_results()
Validates results by loading and reconstructing the workflow.
- Returns:
0 on success, 1 if an exception is raised.
- Return type:
- transform_monitoring_results(component='loading')
Combines per-connection monitoring CSV files into a single wide-format CSV.
For example, per-connection files like:
query_datagenerator_metric_total_cpu_util_MonetDB-NIL-1-1.csv
query_datagenerator_metric_total_cpu_util_MonetDB-NIL-1-2.csv
are merged into:
query_datagenerator_metric_total_cpu_util.csv
- Parameters:
component (str) – Component label used in the metric filename prefix (e.g. 'loading', 'stream').
- class bexhoma.evaluators.TpccEvaluator(code, path, include_loading=False, include_benchmarking=True, benchmark_run: int = 0)
Bases: LogEvaluator
Evaluator for a HammerDB TPC-C experiment.
Parses per-pod log files to extract NOPM, TPM, and optional latency statistics (CALLS, MIN, AVG, MAX, TOTAL, P99, P95, P50, SD, RATIO) and assembles them into DataFrames. Aggregation over parallel pods follows the same pattern as the other logger-based evaluators.
- Parameters:
code – Experiment identifier — also the name of the result sub-folder.
path – Root path that contains the result folders.
include_loading – Whether loading-phase results are expected.
include_benchmarking – Whether benchmarking-phase results are expected.
- benchmarking_aggregate_by_parallel_pods(df, columns=['phase'])
Aggregates parallel-pod TPC-C result rows into one row per group.
Groups by columns and applies per-metric aggregation (NOPM/TPM averaged across pods, max for latency percentiles, etc.). Also recomputes efficiency for runs where vusers equal 10× the scale factor.
The phase column holds the phase identifier (configuration-experiment_run-client) and the job column holds the job identifier (configuration-experiment_run-client-benchmark_run).
The default columns=['phase'] groups by phase, producing one row per phase. To keep one row per job, pass columns=['job'].
- benchmarking_set_datatypes(df)
Casts all TPC-C benchmarking result columns to their appropriate data types.
Handles two variants: with latency statistics (CALLS present) and without.
- Parameters:
df (pandas.DataFrame) – DataFrame of raw TPC-C benchmarking results.
- Returns:
DataFrame with columns cast to correct types.
- Return type:
pandas.DataFrame
- get_summary_benchmark_per_connection()
Returns benchmarking results with one row per pod, filtered to the key display columns.
Applies benchmarking_set_datatypes() and selects the columns used for the per-connection summary table (experiment run, terminals, target, client, child, time, errors, throughput, goodput, efficiency, and latency percentiles), then sorts by (experiment_run, client, child).
- Returns:
DataFrame indexed as "DBMS" with one row per pod, or None if there are no benchmarking results.
- Return type:
pandas.DataFrame or None
- get_summary_benchmark_per_phase()
Returns benchmarking results aggregated over parallel pods, one row per phase.
Applies benchmarking_set_datatypes(), aggregates via benchmarking_aggregate_by_parallel_pods(), and selects the columns used for the per-phase summary table (experiment run, terminals, target, pod count, time, errors, throughput, goodput, efficiency, and latency percentiles), sorted by (experiment_run, target, pod_count).
- Returns:
DataFrame indexed as "DBMS" with one row per phase, or an empty DataFrame if there are no benchmarking results.
- Return type:
pandas.DataFrame
- get_summary_benchmark_per_phase_multitenant()
Returns TPC-C benchmarking results aggregated per phase and tenant, one row per (phase, tenant_id).
Like get_summary_benchmark_per_phase() but groups by ['phase', 'tenant_id'] so each tenant appears as a separate row.
- Returns:
DataFrame indexed as "DBMS" with one row per (phase, tenant), or an empty DataFrame if there are no benchmarking results.
- Return type:
pandas.DataFrame
- get_summary_loading_per_run()
Returns loading metrics aggregated per experiment run.
Delegates to get_loading_per_run() (defined in base), which reduces the per-connection loading DataFrame to one row per (code, configuration, experiment_run) and adds a 'Throughput [SF/h]' column.
- Returns:
DataFrame with one row per experiment run.
- Return type:
pandas.DataFrame
- log_to_df(filename)
Parses a HammerDB TPC-C pod log file into a DataFrame.
Extracts NOPM, TPM, vuser counts, and, when HammerDB time-profile output is present, latency statistics (CALLS, MIN, AVG, MAX, TOTAL, P99, P95, P50, SD, RATIO) for the NEWORD procedure.
- Parameters:
filename (str) – Absolute path to the HammerDB log file.
- Returns:
DataFrame with one row per TPC-C result iteration, or empty on parse failure.
- Return type:
pandas.DataFrame
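As a rough illustration of the NOPM/TPM extraction, the sketch below scans log text for HammerDB result lines. It assumes the result line has the shape `TEST RESULT : System achieved <NOPM> NOPM from <TPM> <DBMS> TPM`; the function name `parse_results` and the output column `dbms` are hypothetical, and vuser counts and time-profile statistics are not handled.

```python
import re

import pandas as pd

# Assumed shape of a HammerDB result line (not taken from the bexhoma source).
RESULT_RE = re.compile(
    r"TEST RESULT : System achieved (\d+) NOPM from (\d+) (\S+) TPM"
)


def parse_results(text):
    """Return one row per result line found in text, or an empty frame."""
    rows = [
        {"NOPM": int(m.group(1)), "TPM": int(m.group(2)), "dbms": m.group(3)}
        for m in RESULT_RE.finditer(text)
    ]
    return pd.DataFrame(rows, columns=["NOPM", "TPM", "dbms"])


sample = (
    "Vuser 1:Rampup complete\n"
    "Vuser 1:TEST RESULT : System achieved 4562 NOPM from 10522 PostgreSQL TPM\n"
)
results = parse_results(sample)
```

Returning a frame with fixed columns even when nothing matches mirrors the documented behaviour of yielding an empty DataFrame on parse failure.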
- record_tests(experiment, df_loading: DataFrame, df_reduced: DataFrame, workflow_actual: dict, workflow_planned: dict, **extra) None
Record TPC-C pass/fail tests: NOPM throughput and workflow completeness.
- Parameters:
experiment – The owning experiment object.
df_loading – Per-run loading DataFrame (unused here).
df_reduced – Per-phase execution DataFrame.
workflow_actual – Reconstructed actual workflow dict.
workflow_planned – Planned workflow dict from workload config.
- test_results()
Validates results by reading all pickle files and delegating to the parent check.
- Returns:
0 on success, 1 if an exception is raised.
- Return type:
- class bexhoma.evaluators.YcsbEvaluator(code, path, include_loading=False, include_benchmarking=True, benchmark_run: int = 0)
Bases:
LogEvaluator
Evaluator for a YCSB experiment.
Parses per-pod log files to extract operation counts, throughput, and per-operation latency statistics produced by the Yahoo Cloud Serving Benchmark (YCSB) tool. Provides time-series access to per-second throughput for both the benchmarking and loading phases via get_benchmark_logs_timeseries_df_aggregated(), get_loading_logs_timeseries_df_aggregated(), and their *_single variants.
- Parameters:
code – Experiment identifier — also the name of the result sub-folder.
path – Root path that contains the result folders.
include_loading – Ignored; loading is always enabled for this evaluator.
include_benchmarking – Ignored; benchmarking is always enabled.
- benchmark_logs_to_timeseries_df(list_logs, metric='current_ops_per_sec', aggregate=True, filetype='benchmarker')
Parses benchmarker log files for the given pod IDs and assembles a time-series DataFrame.
Delegates to logs_to_timeseries_df() with filetype='benchmarker'.
- Parameters:
- Returns:
Aggregated DataFrame or list of per-pod DataFrames.
- Return type:
pandas.DataFrame or list[pandas.DataFrame]
- benchmarking_aggregate_by_parallel_pods(df, columns=['phase'])
Aggregates parallel-pod YCSB benchmarking rows into one row per group.
Groups by columns and sums counts/throughput, takes mean for average latencies, and max for percentile/max latencies.
The phase column holds the phase identifier (configuration-experiment_run-client) and the job column holds the job identifier (configuration-experiment_run-client-benchmark_run).
The default columns=['phase'] groups by phase, producing one row per phase. To keep one row per job, pass columns=['job'].
- benchmarking_set_datatypes(df)
Casts all YCSB benchmarking result columns to their appropriate data types.
Only casts operation-specific columns when they are present in the DataFrame.
- Parameters:
df (pandas.DataFrame) – DataFrame of raw YCSB benchmarking results.
- Returns:
DataFrame with columns cast to correct types, or original df on error.
- Return type:
pandas.DataFrame
- get_benchmark_logs_timeseries_df_aggregated(metric='current_ops_per_sec', configuration='', client='1', experiment_run='1')
Returns a DataFrame of per-second benchmarking time-series, aggregated across pods.
Retrieves pod IDs from get_df_benchmarking() and delegates to benchmark_logs_to_timeseries_df() with aggregate=True.
- Parameters:
- Returns:
DataFrame indexed by second with one metric column and an 'avg' column.
- Return type:
pandas.DataFrame
- get_benchmark_logs_timeseries_df_single(metric='current_ops_per_sec', configuration='', client='1', experiment_run='1')
Returns a list of per-pod benchmarking time-series DataFrames (one per pod).
- Parameters:
- Returns:
List of DataFrames, one per pod, each indexed by second.
- Return type:
list[pandas.DataFrame]
- get_df_loading()
Returns the DataFrame containing all loading-phase results.
- Returns:
DataFrame of loading results, or empty DataFrame when unavailable.
- Return type:
pandas.DataFrame
- get_loading_logs_timeseries_df_aggregated(metric='current_ops_per_sec', configuration='', experiment_run='1')
Returns a DataFrame of time series of a metric for the loading phase, aggregated over all pods per second.
Uses get_df_loading() to retrieve the pod list and benchmark_logs_to_timeseries_df() to parse and aggregate the log files. Restricts to a configuration and an experiment run. Aggregation follows the same strategy as for the benchmarking phase: percentiles and maximum by max, minimum by min, average by average, 'current_ops_per_sec' and all others by sum.
- Parameters:
- Returns:
DataFrame indexed by second with one column for the aggregated metric plus an 'avg' column, or an empty DataFrame when no files are found.
- Return type:
pandas.DataFrame
- get_loading_logs_timeseries_df_single(metric='current_ops_per_sec', configuration='', experiment_run='1')
Returns a list of DataFrames of time series of a metric for the loading phase, one per pod.
Uses get_df_loading() to retrieve the pod list and benchmark_logs_to_timeseries_df() to parse the log files without aggregation. Restricts to a configuration and an experiment run.
- Parameters:
- Returns:
List of DataFrames, one per pod, each indexed by second with one metric column.
- Return type:
list[pandas.DataFrame]
- get_loading_per_connection()
Returns loading metrics for each individual connection, merged with connection metadata and enriched with the scale factor.
Combines the aggregated loading DataFrame (from get_df_loading()) with connection metadata (from get_connections_of_experiment()) on (code, configuration, experiment_run), then normalises the index. Rows for which no loading log was recorded (missing pod_count) are dropped.
- Returns:
DataFrame with one row per loading run, indexed as {code}-{configuration}-{experiment_run}.
- Return type:
pandas.DataFrame
- get_loading_per_pod()
Returns the raw loading DataFrame with one row per pod.
- Returns:
DataFrame from get_df_loading(), one row per loading pod.
- Return type:
pandas.DataFrame
- get_loading_per_run()
Returns loading metrics aggregated per (code, configuration, experiment_run).
Overrides the base implementation to derive 'Throughput [SF/h]' from '[OVERALL].RunTime(ms)' rather than time_load, since YCSB loading results carry wall-clock run time in milliseconds rather than the bexhoma connection-level timing.
- Returns:
DataFrame with one row per experiment run.
- Return type:
pandas.DataFrame
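The unit conversion behind 'Throughput [SF/h]' can be sketched as follows. The exact formula used by bexhoma is not stated here, so this assumes throughput is the scale factor divided by the run time in hours; the helper name and the 'SF' column name are hypothetical, and only the '[OVERALL].RunTime(ms)' and 'Throughput [SF/h]' labels come from the description above.

```python
import pandas as pd

MS_PER_HOUR = 3_600_000  # 1000 ms * 60 s * 60 min


def add_throughput_sf_per_hour(df, scale_factor_col="SF",
                               runtime_col="[OVERALL].RunTime(ms)"):
    """Append 'Throughput [SF/h]' assuming throughput = SF / runtime in hours."""
    out = df.copy()
    out["Throughput [SF/h]"] = out[scale_factor_col] * MS_PER_HOUR / out[runtime_col]
    return out


# Hypothetical loading rows: SF 1 loaded in 30 minutes, SF 10 loaded in 2 hours.
loading = pd.DataFrame({
    "SF": [1, 10],
    "[OVERALL].RunTime(ms)": [1_800_000, 7_200_000],
})
with_throughput = add_throughput_sf_per_hour(loading)
```

How the run time of several parallel loading pods is combined into a single value per run is not modelled in this sketch.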
- get_summary_benchmark_per_connection()
Returns benchmarking results with one row per pod, filtered to the key display columns.
Applies benchmarking_set_datatypes() and selects the columns used for the per-connection summary table (experiment run, terminals, target, client, child, time, errors, throughput, goodput, efficiency, and latency percentiles), then sorts by (experiment_run, client, child).
- Returns:
DataFrame indexed as "DBMS" with one row per pod, or an empty DataFrame if there are no benchmarking results.
- Return type:
pandas.DataFrame
- get_summary_benchmark_per_phase()
Returns benchmarking results aggregated over parallel pods, one row per phase.
Applies benchmarking_set_datatypes(), aggregates via benchmarking_aggregate_by_parallel_pods(), and selects the columns used for the per-phase summary table (experiment run, terminals, target, pod count, time, errors, throughput, goodput, efficiency, and latency percentiles), sorted by (experiment_run, target, pod_count).
- Returns:
DataFrame indexed as "DBMS" with one row per phase, or an empty DataFrame if there are no benchmarking results.
- Return type:
pandas.DataFrame
- get_summary_benchmark_per_phase_multitenant()
Returns YCSB benchmarking results aggregated per phase and tenant, one row per (phase, tenant_id).
Like get_summary_benchmark_per_phase() but groups by ['phase', 'tenant_id'] so each tenant appears as a separate row.
- Returns:
DataFrame indexed as "DBMS" with one row per (phase, tenant), or an empty DataFrame if there are no benchmarking results.
- Return type:
pandas.DataFrame
- get_summary_loading_per_connection()
Returns loading metrics aggregated per experiment run.
Delegates to get_df_loading() (defined in base), which reduces the per-connection loading DataFrame to one row per (code, configuration, experiment_run) and adds a 'Throughput [SF/h]' column.
- Returns:
DataFrame with one row per experiment run.
- Return type:
pandas.DataFrame
- get_summary_loading_per_run()
Returns loading metrics aggregated per experiment run.
Delegates to get_df_loading() (defined in base), which reduces the per-connection loading DataFrame to one row per (code, configuration, experiment_run) and adds a 'Throughput [SF/h]' column.
- Returns:
DataFrame with one row per experiment run.
- Return type:
pandas.DataFrame
- loading_aggregate_by_parallel_pods(df, columns=['phase'])
Aggregates parallel-pod YCSB loading rows into one row per job.
The phase column stores BEXHOMA_CONNECTION, which is the job identifier (configuration-experiment_run-client-benchmark_run). The default columns=['phase'] therefore groups by job identifier, producing one row per job. To aggregate per phase, pass columns=['configuration', 'experiment_run', 'client'].
- loading_logs_to_timeseries_df(list_logs, metric='current_ops_per_sec', aggregate=True, filetype='benchmarker')
Parses loader log files for the given pod IDs and assembles a time-series DataFrame.
Delegates to logs_to_timeseries_df() with filetype='loading'.
- Parameters:
- Returns:
Aggregated DataFrame or list of per-pod DataFrames.
- Return type:
pandas.DataFrame or list[pandas.DataFrame]
- loading_set_datatypes(df)
Casts all YCSB loading result columns to their appropriate data types.
- Parameters:
df (pandas.DataFrame) – DataFrame of raw YCSB loading results.
- Returns:
DataFrame with columns cast to correct types.
- Return type:
pandas.DataFrame
- log_to_df(filename)
Parses a YCSB pod log file into a single-row DataFrame.
Extracts connection metadata, benchmark parameters, and per-operation metrics (throughput, latency percentiles) from the YCSB summary output.
- Parameters:
filename (str) – Absolute path to the YCSB log file.
- Returns:
Single-row DataFrame of YCSB results, or empty on parse failure.
- Return type:
pandas.DataFrame
- logs_to_timeseries_df(list_logs, metric='current_ops_per_sec', aggregate=True, filetype='benchmarker')
Parses YCSB log files for the given pod IDs and assembles a time-series DataFrame.
Each pod ID in list_logs is resolved to matching log files via a glob pattern that uses filetype to distinguish benchmarker from loading logs. When aggregate is True the per-second values from all pods are combined: percentile/max metrics use element-wise maximum, minimum metrics use element-wise minimum, and all others are summed. When aggregate is False a list of per-pod DataFrames is returned instead.
- Parameters:
- Returns:
Aggregated DataFrame indexed by 'sec' (with an 'avg' column appended) when aggregate is True, or a list of per-pod DataFrames.
- Return type:
pandas.DataFrame or list[pandas.DataFrame]
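The element-wise combination across pods can be sketched with pandas as below. This is an illustration under assumptions rather than the library code: the helper name `aggregate_pods`, the metric name 'READ_max', and the rule of choosing the aggregation from substrings of the metric name are hypothetical, and the appended 'avg' column is not modelled.

```python
import pandas as pd


def aggregate_pods(per_pod, metric):
    """Combine one metric from several per-pod frames indexed by second."""
    # One column per pod, aligned on the shared 'sec' index.
    wide = pd.concat([df[metric] for df in per_pod], axis=1)
    name = metric.lower()
    if "max" in name or "percentile" in name:
        combined = wide.max(axis=1)   # worst case across pods
    elif "min" in name:
        combined = wide.min(axis=1)   # best case across pods
    else:
        combined = wide.sum(axis=1)   # e.g. throughput adds up across pods
    combined.index.name = "sec"
    return combined.to_frame(metric)


sec = pd.Index([10, 20], name="sec")
pod_a = pd.DataFrame({"current_ops_per_sec": [100.0, 200.0], "READ_max": [5.0, 9.0]}, index=sec)
pod_b = pd.DataFrame({"current_ops_per_sec": [50.0, 70.0], "READ_max": [7.0, 3.0]}, index=sec)

ops = aggregate_pods([pod_a, pod_b], "current_ops_per_sec")
read_max = aggregate_pods([pod_a, pod_b], "READ_max")
```

Summing throughput while taking the maximum of maxima keeps both columns meaningful for the pod group as a whole: the group serves the combined load, and its worst latency is the worst of any pod.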
- parse_ycsb_log_file(file_path)
Scans the lines of a YCSB log file and extracts the performance information needed for time-series analysis. Each line starting with a timestamp is converted into a dict of measurements (operations, second of measurement, READ latency, …).
- Parameters:
file_path – Full path of log file
- Returns:
List of dicts of measures, one entry per line
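A line-by-line scan of this kind might look like the sketch below. It assumes YCSB status lines of the form `<timestamp> <n> sec: <ops> operations; <rate> current ops/sec; ... [READ: Count=..., Max=..., ...]`; the function name `parse_status_lines` and the dict keys (such as 'READ_Avg') are hypothetical and need not match the keys produced by parse_ycsb_log_file().

```python
import re

# Assumed shape of a YCSB status line; the ops/sec part is optional because
# the very first status line may not report a rate yet.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}:\d{3}) (\d+) sec: (\d+) operations"
    r"(?:; ([\d.]+) current ops/sec)?"
)
GROUP_RE = re.compile(r"\[(\w+): ([^\]]*)\]")


def parse_status_lines(lines):
    """Return one dict per status line; lines without a timestamp are skipped."""
    records = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        rec = {"sec": int(m.group(2)), "operations": int(m.group(3))}
        if m.group(4) is not None:
            rec["current_ops_per_sec"] = float(m.group(4))
        # Each bracketed group holds the statistics of one operation type.
        for op, body in GROUP_RE.findall(line):
            for pair in body.split(", "):
                key, _, value = pair.partition("=")
                if value:
                    rec[f"{op}_{key}"] = float(value)
        records.append(rec)
    return records


sample = [
    "Loading workload...",
    "2024-01-01 12:00:10:123 10 sec: 1234 operations; 123.4 current ops/sec; "
    "est completion in 2 minutes [READ: Count=1234, Max=12345, Min=123, "
    "Avg=456.7, 90=800, 99=1200, 99.9=3000, 99.99=5000]",
]
records = parse_status_lines(sample)
```

A list of such dicts converts directly into a DataFrame with `pandas.DataFrame(records)`, which is one way the per-second time series described above could be assembled.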
- record_tests(experiment, df_loading: DataFrame, df_reduced: DataFrame, workflow_actual: dict, workflow_planned: dict, **extra) None
Record YCSB pass/fail tests.
Tests overall throughput for the loading phase (when data is available) and the execution phase, workflow completeness, and absence of FAILED operation columns.
- Parameters:
experiment – The owning experiment object.
df_loading – Per-run loading DataFrame; empty if loading was not active.
df_reduced – Per-phase execution DataFrame.
workflow_actual – Reconstructed actual workflow dict.
workflow_planned – Planned workflow dict from workload config.
- bexhoma.evaluators.base
alias of
EvaluatorBase
- bexhoma.evaluators.benchbase
alias of
BenchbaseEvaluator
- bexhoma.evaluators.dbmsbenchmarker
alias of
DbmsBenchmarkerEvaluator
- bexhoma.evaluators.logger
alias of
LogEvaluator
- bexhoma.evaluators.natural_sort(items)
Sorts a list in natural (human) order so that embedded digit runs are compared numerically rather than lexicographically. Works for lists of strings, integers, or any mix whose elements have a meaningful str() representation.
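Natural ordering is commonly implemented by splitting each item into digit and non-digit runs and comparing the digit runs as integers. The sketch below shows that general technique under the name `natural_sort_sketch`; it is not the source of bexhoma's natural_sort() and may differ from it in details such as case handling.

```python
import re


def natural_sort_sketch(items):
    """Sort items so that 'pod-2' comes before 'pod-10'."""
    def key(item):
        # re.split with a capturing group alternates text and digit runs,
        # always starting with a (possibly empty) text run.
        parts = re.split(r"(\d+)", str(item))
        # Odd positions are digit runs: compare them as integers.
        return [int(p) if i % 2 else p for i, p in enumerate(parts)]
    return sorted(items, key=key)


pods = natural_sort_sketch(["pod-10", "pod-2", "pod-1"])
mixed = natural_sort_sketch([10, "9", 2])
```

Because text and digit runs always sit at the same positions in every key, the comparison never mixes strings with integers, which is what allows lists of strings and integers to be sorted together.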
- bexhoma.evaluators.tpcc
alias of
TpccEvaluator
- bexhoma.evaluators.ycsb
alias of
YcsbEvaluator