bexhoma.evaluators.logger module
Logger evaluator: reads benchmark log files into DataFrames.
Provides LogEvaluator, which extends EvaluatorBase by parsing
bexhoma benchmarker log files produced by Kubernetes pods and
transforming them into structured pandas DataFrames. All
benchmark-specific evaluators inherit from LogEvaluator.
Authors: Patrick K. Erdelt
Copyright (C) 2020 Patrick K. Erdelt
SPDX-License-Identifier: AGPL-3.0-or-later
See LICENSE for details.
- class bexhoma.evaluators.logger.LogEvaluator(code, path, include_loading=False, include_benchmarking=True, benchmark_run: int = 0)
Bases: EvaluatorBase
Evaluator base that reads benchmark log files into DataFrames.
Extends base by implementing end_benchmarking() and end_loading() to parse pod log files, pickle the resulting DataFrames, and collect them into a single combined pickle per phase. All benchmark-specific evaluators (benchbase, ycsb, tpcc, dbmsbenchmarker) inherit from this class.
- end_benchmarking(jobname)
Parses all benchmarker log files for a job and caches results as pickle files.
Scans the result folder for files matching bexhoma-benchmarker-<jobname>*.dbmsbenchmarker.log, calls log_to_df() on each, and writes non-empty results to a <filename>.df.pickle side-car file.
- Parameters:
jobname (str) – Job name used to filter matching log files.
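The scan-parse-cache loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not bexhoma's implementation: parse_log() is a hypothetical stand-in for log_to_df(), cache_logs() is an illustrative name, and the side-car name is assumed to be the full log filename with .df.pickle appended.

```python
import glob
import os
import tempfile

import pandas as pd


def parse_log(path):
    """Hypothetical stand-in for log_to_df(): one row built from 'key: value' lines."""
    row = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            key, sep, value = line.partition(":")
            if sep:
                row[key.strip()] = value.strip()
    return pd.DataFrame([row]) if row else pd.DataFrame()


def cache_logs(folder, jobname):
    """Parse each matching log and pickle non-empty results next to it."""
    pattern = os.path.join(folder, f"bexhoma-benchmarker-{jobname}*.dbmsbenchmarker.log")
    written = []
    for log_path in sorted(glob.glob(pattern)):
        df = parse_log(log_path)
        if df.empty:
            continue  # empty result: no side-car file is written
        df.to_pickle(log_path + ".df.pickle")  # assumed side-car naming
        written.append(os.path.basename(log_path) + ".df.pickle")
    return written


# Usage with throwaway files: one good log, one empty log, one log of another job.
with tempfile.TemporaryDirectory() as folder:
    logs = {
        "bexhoma-benchmarker-job1-0.dbmsbenchmarker.log": "throughput: 42\n",
        "bexhoma-benchmarker-job1-1.dbmsbenchmarker.log": "",  # parses to nothing
        "bexhoma-benchmarker-job2-0.dbmsbenchmarker.log": "throughput: 7\n",
    }
    for name, text in logs.items():
        with open(os.path.join(folder, name), "w", encoding="utf-8") as handle:
            handle.write(text)
    written = cache_logs(folder, "job1")
    cached = pd.read_pickle(os.path.join(folder, written[0]))
```

Only the non-empty log of the requested job produces a side-car file; the empty log and the log of the other job are skipped.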
- end_loading(jobname)
Parses all loader sensor log files for a job and caches results as pickle files.
Scans the result folder for files matching bexhoma-loading-<jobname>*.sensor.log, calls log_to_df() on each, prints a message when errors are detected, and writes non-empty results to a <filename>.df.pickle side-car file.
- Parameters:
jobname (str) – Job name used to filter matching log files.
- evaluate_results(pod_dashboard='')
Parses all pod log files and persists the results as pickled DataFrames.
Calls transform_all_logs_benchmarking() and _collect_dfs() for the benchmarking phase when include_benchmarking is set, and analogously for the loading phase. When benchmark_run > 0, each phase writes a per-benchmark pickle file rather than the shared *.all.df.pickle.
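The collection step might look roughly like the sketch below, which concatenates per-file pickles into one combined pickle. The names collect_dfs() and target_name(), and the exact file names, are assumptions made for illustration. The source only states that a shared *.all.df.pickle is replaced by a per-benchmark file when benchmark_run > 0.

```python
import glob
import os
import tempfile

import pandas as pd


def target_name(phase, benchmark_run=0):
    """Hypothetical naming: shared file for run 0, per-run file otherwise."""
    if benchmark_run > 0:
        return f"{phase}.{benchmark_run}.df.pickle"
    return f"{phase}.all.df.pickle"


def collect_dfs(folder, pattern, target):
    """Concatenate all pickled DataFrames matching pattern into one pickle."""
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    frames = [pd.read_pickle(path) for path in paths]
    if not frames:
        return pd.DataFrame()  # nothing to combine, nothing written
    combined = pd.concat(frames, ignore_index=True)
    combined.to_pickle(os.path.join(folder, target))
    return combined


# Usage: two per-pod pickles are merged into the shared file.
with tempfile.TemporaryDirectory() as folder:
    for i in range(2):
        pd.DataFrame({"pod": [i], "throughput": [100.0 + i]}).to_pickle(
            os.path.join(folder, f"bexhoma-benchmarker-job1-{i}.log.df.pickle")
        )
    combined = collect_dfs(
        folder, "bexhoma-benchmarker-job1-*.df.pickle", target_name("benchmarking")
    )
    shared_exists = os.path.isfile(os.path.join(folder, "benchmarking.all.df.pickle"))
```

The target name is chosen so that it does not match the input pattern, which keeps a second run from picking up its own combined output.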
- get_connection_config()
Returns the parsed connections.config as a list of connection dicts, sorted by connection name.
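As a small illustration of the sorting step, the sketch below parses a list of connection dicts and orders it by name. It assumes the file content is a Python literal and that each connection carries a 'name' key. Both are assumptions for this example, and the connection names are made up.

```python
import ast

# Illustrative file content; real files contain many more keys per connection.
config_text = """[
    {'name': 'PostgreSQL-1-2', 'active': True},
    {'name': 'MonetDB-1-1', 'active': True},
]"""

# literal_eval only accepts literals, so no code from the file is executed.
connections = sorted(ast.literal_eval(config_text), key=lambda c: c["name"])
names = [c["name"] for c in connections]
```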
- get_df_benchmarking()
Returns the DataFrame containing all benchmarking-phase results.
Reads from the combined pickle file, triggering evaluate_results() to generate it on first access if it does not yet exist.
Ensures a pods column is present: when the pickle was written before this column was added (older runs), it is derived from pod_count.
- Returns:
DataFrame of benchmarking results, or empty DataFrame when unavailable.
- Return type:
pandas.DataFrame
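The lazy-load behaviour with the backward-compatible pods column could be sketched as follows. load_benchmarking_df() and its regenerate callback are hypothetical names, and copying pod_count into pods is only one plausible reading of "derived from pod_count".

```python
import os
import tempfile

import pandas as pd


def load_benchmarking_df(pickle_path, regenerate=None):
    """Load the combined pickle, generating it first when it is missing."""
    if not os.path.isfile(pickle_path) and regenerate is not None:
        regenerate()  # stands in for evaluate_results()
    if not os.path.isfile(pickle_path):
        return pd.DataFrame()  # still unavailable
    df = pd.read_pickle(pickle_path)
    if "pods" not in df.columns and "pod_count" in df.columns:
        df["pods"] = df["pod_count"]  # assumed derivation for older pickles
    return df


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "benchmarking.all.df.pickle")
    missing = load_benchmarking_df(path)  # no file, no generator

    def regenerate():
        # Writes an "old-style" result that lacks the pods column.
        pd.DataFrame({"pod_count": [1, 2]}).to_pickle(path)

    df = load_benchmarking_df(path, regenerate)
```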
- get_df_loading()
Returns the DataFrame containing all loading-phase results.
Reads from the combined pickle file if it exists.
- Returns:
DataFrame of loading results, or empty DataFrame when unavailable.
- Return type:
pandas.DataFrame
- get_monitoring_metric(metric, component='loading')
Returns a wide-format DataFrame of a single monitoring metric for a component.
Reads the pre-combined CSV produced by transform_monitoring_results() and returns it transposed so that rows are timestamps and columns are connections.
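The transposition can be illustrated with a tiny in-memory CSV. The layout assumed here (one row per connection, one column per timestamp) is inferred from the description above, and the numbers are made up.

```python
import io

import pandas as pd

# Assumed stored layout: connections as rows, timestamps as columns.
csv_text = """connection,0,1,2
MonetDB-NIL-1-1,10.0,12.5,11.0
MonetDB-NIL-1-2,9.0,13.0,10.5
"""

stored = pd.read_csv(io.StringIO(csv_text), index_col=0)
metric = stored.T  # rows become timestamps, columns become connections
```

After the transpose, each column is one connection's time series, so metric.plot() would draw one line per connection.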
- get_monitoring_metrics()
Returns the list of metric keys defined in the first connection’s monitoring block.
- plot(df, column, x, y, plot_by=None, kind='line', dict_colors=None, figsize=(12, 8))
Plots one or more line (or other) charts from a DataFrame.
When plot_by is None, a single chart is produced with one line per value in column. When plot_by is given, a grid of sub-plots is created, one per group defined by plot_by, with lines split by column within each sub-plot.
- Parameters:
df (pandas.DataFrame) – DataFrame containing the data to plot.
column (str) – Column whose unique values define individual lines.
x (str) – Column to use as the x-axis.
y (str) – Column to use as the y-axis.
plot_by (str or None) – Optional column whose values define separate sub-plots.
kind (str) – Plot kind passed to DataFrame.plot (e.g. 'line', 'bar').
dict_colors (dict or None) – Optional colour mapping for the kind keyword.
figsize (tuple) – Figure size as (width, height) in inches.
- Returns:
Matplotlib axes object (single axes when plot_by is None).
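One way to obtain "one line per value in column" is to pivot the data before plotting, as in the sketch below. plot_lines() is a hypothetical helper, not the method documented above: it omits dict_colors, lays the sub-plots out in a single row, and requires each (x, column) pair to be unique within a group.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_lines(df, column, x, y, plot_by=None, kind="line", figsize=(12, 8)):
    """Draw one series per value of `column`, optionally one sub-plot per group."""
    if plot_by is None:
        fig, ax = plt.subplots(figsize=figsize)
        df.pivot(index=x, columns=column, values=y).plot(kind=kind, ax=ax)
        ax.set_ylabel(y)
        return ax
    groups = list(df.groupby(plot_by))
    fig, axes = plt.subplots(1, len(groups), figsize=figsize, squeeze=False)
    for ax, (value, part) in zip(axes[0], groups):
        wide = part.pivot(index=x, columns=column, values=y)
        wide.plot(kind=kind, ax=ax, title=f"{plot_by}={value}")
    return axes[0]


# Made-up data: two connections, two pod counts, two phases.
df = pd.DataFrame({
    "connection": ["A", "A", "B", "B"] * 2,
    "pods": [1, 2, 1, 2] * 2,
    "phase": ["load"] * 4 + ["run"] * 4,
    "throughput": [100.0, 180.0, 90.0, 170.0, 50.0, 95.0, 45.0, 88.0],
})

single = plot_lines(df[df["phase"] == "run"], column="connection", x="pods", y="throughput")
grid = plot_lines(df, column="connection", x="pods", y="throughput", plot_by="phase")
```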
- record_tests(experiment, df_loading: DataFrame, df_reduced: DataFrame, workflow_actual: dict, workflow_planned: dict, **extra) → None
Record pass/fail test results for this benchmark.
Default: tests only that the workflow matches the plan. Override in benchmark-specific evaluator subclasses to test metric columns.
- Parameters:
experiment – The owning experiment object.
df_loading – Per-run loading DataFrame.
df_reduced – Per-phase execution DataFrame.
workflow_actual – Reconstructed actual workflow dict.
workflow_planned – Planned workflow dict from workload config.
- test_results()
Validates results by loading and reconstructing the workflow.
- Returns:
0 on success, 1 if an exception is raised.
- Return type:
int
- transform_monitoring_results(component='loading')
Combines per-connection monitoring CSV files into a single wide-format CSV.
For example, per-connection files like:
query_datagenerator_metric_total_cpu_util_MonetDB-NIL-1-1.csv
query_datagenerator_metric_total_cpu_util_MonetDB-NIL-1-2.csv
are merged into:
query_datagenerator_metric_total_cpu_util.csv
- Parameters:
component (str) – Component label used in the metric filename prefix (e.g. 'loading', 'stream').
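The merge of per-connection files into one wide-format CSV might be sketched like this. combine_metric_files() is a hypothetical helper, and the per-connection file layout (an index column followed by a single value column) is an assumption made for this example. The combined file is written with one row per connection.

```python
import glob
import os
import tempfile

import pandas as pd


def combine_metric_files(folder, prefix):
    """Merge <prefix>_<connection>.csv files into <prefix>.csv."""
    rows = {}
    for path in sorted(glob.glob(os.path.join(folder, prefix + "_*.csv"))):
        # The connection name is whatever follows "<prefix>_" in the file name.
        connection = os.path.basename(path)[len(prefix) + 1:-len(".csv")]
        rows[connection] = pd.read_csv(path, index_col=0).iloc[:, 0]
    combined = pd.DataFrame(rows).T  # one row per connection
    combined.to_csv(os.path.join(folder, prefix + ".csv"))
    return combined


prefix = "query_datagenerator_metric_total_cpu_util"
with tempfile.TemporaryDirectory() as folder:
    samples = {
        "MonetDB-NIL-1-1": [10.0, 12.5, 11.0],  # made-up values
        "MonetDB-NIL-1-2": [9.0, 13.0, 10.5],
    }
    for connection, values in samples.items():
        pd.DataFrame({"value": values}).to_csv(
            os.path.join(folder, f"{prefix}_{connection}.csv")
        )
    combined = combine_metric_files(folder, prefix)
    combined_exists = os.path.isfile(os.path.join(folder, prefix + ".csv"))
```

Because the combined file has no underscore after the prefix, it is not matched by the input pattern on a later run. A metric whose name merely starts with the same prefix would be matched, so a real implementation needs a stricter filter.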