bexhoma.evaluators.logger module

Logger evaluator: reads benchmark log files into DataFrames.

Provides LogEvaluator, which extends EvaluatorBase by parsing bexhoma benchmarker log files produced by Kubernetes pods and transforming them into structured pandas DataFrames. All benchmark-specific evaluators inherit from LogEvaluator.

class bexhoma.evaluators.logger.LogEvaluator(code, path, include_loading=False, include_benchmarking=True, benchmark_run: int = 0)

Bases: EvaluatorBase

Evaluator base that reads benchmark log files into DataFrames.

Extends EvaluatorBase by implementing end_benchmarking() and end_loading() to parse pod log files, pickle the resulting DataFrames, and collect them into a single combined pickle per phase. All benchmark-specific evaluators (benchbase, ycsb, tpcc, dbmsbenchmarker) inherit from this class.

end_benchmarking(jobname)

Parses all benchmarker log files for a job and caches results as pickle files.

Scans the result folder for files matching bexhoma-benchmarker-<jobname>*.dbmsbenchmarker.log, calls log_to_df() on each, and writes non-empty results to a <filename>.df.pickle side-car file.

Parameters: jobname (str) – Job name used to filter matching log files.
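
The scan-and-cache behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the helper names find_benchmarker_logs and cache_result, the sample file names, and the choice to append .df.pickle to the full log file name are assumptions, and a plain list stands in for the DataFrame returned by log_to_df().

```python
import pickle
import tempfile
from pathlib import Path


def find_benchmarker_logs(result_dir, jobname):
    """Return log files matching bexhoma-benchmarker-<jobname>*.dbmsbenchmarker.log."""
    pattern = f"bexhoma-benchmarker-{jobname}*.dbmsbenchmarker.log"
    return sorted(Path(result_dir).glob(pattern))


def cache_result(log_path, parsed):
    """Write a non-empty parse result to <filename>.df.pickle; skip empty results."""
    if not parsed:
        return None
    # Assumption: the side-car name is the log file name plus ".df.pickle".
    sidecar = Path(str(log_path) + ".df.pickle")
    with open(sidecar, "wb") as handle:
        pickle.dump(parsed, handle)
    return sidecar


with tempfile.TemporaryDirectory() as result_dir:
    # Hypothetical file names, for illustration only.
    for name in (
        "bexhoma-benchmarker-job-a-1.dbmsbenchmarker.log",
        "bexhoma-benchmarker-job-a-2.dbmsbenchmarker.log",
        "bexhoma-loading-job-a-1.sensor.log",
    ):
        Path(result_dir, name).write_text("placeholder\n")

    logs = find_benchmarker_logs(result_dir, "job-a")
    matched_names = [path.name for path in logs]

    # A list of dicts stands in for the DataFrame returned by log_to_df().
    written = cache_result(logs[0], [{"connection": "demo"}])
    written_name = written.name
    skipped = cache_result(logs[1], [])  # empty result: nothing is written
```

The loader sensor log is not matched because the pattern is anchored on the bexhoma-benchmarker- prefix and the .dbmsbenchmarker.log suffix.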

end_loading(jobname)

Parses all loader sensor log files for a job and caches results as pickle files.

Scans the result folder for files matching bexhoma-loading-<jobname>*.sensor.log, calls log_to_df() on each, prints a message when errors are detected, and writes non-empty results to a <filename>.df.pickle side-car file.

Parameters: jobname (str) – Job name used to filter matching log files.

evaluate_results(pod_dashboard='')

Parses all pod log files and persists the results as pickled DataFrames.

Calls transform_all_logs_benchmarking() and _collect_dfs() for the benchmarking phase when include_benchmarking is set, and analogously for the loading phase. When benchmark_run > 0, each phase writes a per-benchmark pickle file rather than the shared *.all.df.pickle.

get_connection_config()

Returns the parsed connections.config as a list of connection dicts, sorted by connection name.

Returns: List of connection configuration dicts.
Return type: list[dict]
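
As a rough sketch of "parsed and sorted by connection name", the snippet below assumes that connections.config holds a Python literal list of dicts and that each dict carries a "name" key. Both points are assumptions made for illustration, as are the sample entries and the helper name parse_connections.

```python
import ast

# Hypothetical file content; the 'name' key and the entries are assumptions.
config_text = """
[
    {'name': 'PostgreSQL-1-2', 'active': True},
    {'name': 'MonetDB-1-1', 'active': True},
]
"""


def parse_connections(text):
    """Parse a literal list of connection dicts and sort it by connection name."""
    connections = ast.literal_eval(text.strip())
    return sorted(connections, key=lambda connection: connection["name"])


connections = parse_connections(config_text)
names = [connection["name"] for connection in connections]
```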

get_df_benchmarking()

Returns the DataFrame containing all benchmarking-phase results.

Reads from the combined pickle file, triggering evaluate_results() to generate it on first access if it does not yet exist.

Ensures a pods column is present: when the pickle was written before this column was added (older runs), it is derived from pod_count.

Returns: DataFrame of benchmarking results, or empty DataFrame when unavailable.
Return type: pandas.DataFrame
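
The "generate on first access" behaviour is a common lazy-caching pattern. The sketch below shows the general idea under stated assumptions: load_or_build and the file name demo.all.df.pickle are hypothetical, the build callback stands in for evaluate_results(), and an empty list stands in for an empty DataFrame.

```python
import os
import pickle
import tempfile


def load_or_build(pickle_path, build):
    """Return the cached object, calling build() first if the file is missing."""
    if not os.path.isfile(pickle_path):
        build()
    if not os.path.isfile(pickle_path):
        return []  # stands in for an empty DataFrame
    with open(pickle_path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.all.df.pickle")  # hypothetical file name
    calls = []

    def build():
        # Stand-in for evaluate_results(): records the call and writes the pickle.
        calls.append(1)
        with open(path, "wb") as handle:
            pickle.dump([{"pod_count": 2}], handle)

    first = load_or_build(path, build)   # triggers build()
    second = load_or_build(path, build)  # served from the existing pickle
    missing = load_or_build(os.path.join(tmp, "absent.pickle"), lambda: None)
```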

get_df_loading()

Returns the DataFrame containing all loading-phase results.

Reads from the combined pickle file if it exists.

Returns: DataFrame of loading results, or empty DataFrame when unavailable.
Return type: pandas.DataFrame

get_monitoring_metric(metric, component='loading')

Returns a wide-format DataFrame of a single monitoring metric for a component.

Reads the pre-combined CSV produced by transform_monitoring_results() and returns it transposed so that rows are timestamps and columns are connections.

Parameters:

metric (str) – Metric key (e.g. 'cpu_throttled_seconds_total').
component (str) – Component label used in the metric filename prefix.

Returns:

Wide-format DataFrame, or empty DataFrame if the file does not exist.

Return type:

pandas.DataFrame
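
To illustrate the transposition, the sketch below assumes the combined CSV stores one row per connection and one column per timestamp; that layout, the sample values, and the helper name transpose_rows are assumptions. Lists of strings stand in for the DataFrame.

```python
import csv
import io

# Hypothetical combined CSV: one row per connection, one column per timestamp.
combined_csv = """connection,0,1,2
MonetDB-NIL-1-1,10,20,30
MonetDB-NIL-1-2,11,21,31
"""


def transpose_rows(rows):
    """Swap rows and columns of a rectangular table given as a list of lists."""
    return [list(column) for column in zip(*rows)]


rows = list(csv.reader(io.StringIO(combined_csv)))
wide = transpose_rows(rows)  # rows are now timestamps, columns are connections
```

After the transposition, the first row holds the connection names and every following row holds the values for one timestamp.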

get_monitoring_metrics()

Returns the list of metric keys defined in the first connection’s monitoring block.

Returns: List of metric key strings, or empty list when no metrics are configured.
Return type: list[str]
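
A minimal sketch of reading the metric keys from the first connection follows. The nesting monitoring → metrics inside a connection dict, the sample metric names, and the helper name monitoring_metric_keys are assumptions made for illustration.

```python
def monitoring_metric_keys(connections):
    """Return the metric keys of the first connection, or [] if none are configured."""
    if not connections:
        return []
    # Assumption: metrics live under connection['monitoring']['metrics'].
    metrics = connections[0].get("monitoring", {}).get("metrics", {})
    return list(metrics)


# Hypothetical connection dicts.
connections = [
    {
        "name": "MonetDB-NIL-1-1",
        "monitoring": {"metrics": {"total_cpu_util": {}, "total_cpu_memory": {}}},
    },
    {"name": "MonetDB-NIL-1-2"},
]

keys = monitoring_metric_keys(connections)
no_keys = monitoring_metric_keys([{"name": "unmonitored"}])
```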

plot(df, column, x, y, plot_by=None, kind='line', dict_colors=None, figsize=(12, 8))

Plots one or more charts (line charts by default) from a DataFrame.

When plot_by is None, a single chart is produced with one line per value in column. When plot_by is given, a grid of sub-plots is created (one per group defined by plot_by), with lines split by column within each sub-plot.

Parameters:

df (pandas.DataFrame) – DataFrame containing the data to plot.
column (str) – Column whose unique values define individual lines.
x (str) – Column to use as the x-axis.
y (str) – Column to use as the y-axis.
plot_by (str or None) – Optional column whose values define separate sub-plots.
kind (str) – Plot kind passed to DataFrame.plot (e.g. 'line', 'bar').
dict_colors (dict or None) – Optional colour mapping for the kind keyword.
figsize (tuple) – Figure size as (width, height) in inches.

Returns:

Matplotlib axes object (single axes when plot_by is None).
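
The grouping logic behind column and plot_by can be sketched without any plotting library. The helper split_series and the column names below are hypothetical; the sketch only shows how records would be divided into sub-plots and lines, not how the method draws them.

```python
from collections import defaultdict


def split_series(records, column, x, y, plot_by=None):
    """Group records into {sub-plot key: {line key: [(x, y), ...]}}."""
    panels = defaultdict(lambda: defaultdict(list))
    for record in records:
        # Without plot_by, everything lands in a single panel keyed by None.
        panel_key = record[plot_by] if plot_by is not None else None
        panels[panel_key][record[column]].append((record[x], record[y]))
    return {key: dict(lines) for key, lines in panels.items()}


# Hypothetical records standing in for DataFrame rows.
records = [
    {"connection": "A", "pods": 1, "threads": 1, "throughput": 100},
    {"connection": "A", "pods": 1, "threads": 2, "throughput": 180},
    {"connection": "B", "pods": 1, "threads": 1, "throughput": 90},
    {"connection": "A", "pods": 2, "threads": 1, "throughput": 150},
]

single = split_series(records, "connection", "threads", "throughput")
grid = split_series(records, "connection", "threads", "throughput", plot_by="pods")
```

Here single contains one panel with a line per connection, while grid contains one panel per distinct pods value.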

record_tests(experiment, df_loading: DataFrame, df_reduced: DataFrame, workflow_actual: dict, workflow_planned: dict, **extra) → None

Records pass/fail test results for this benchmark.

By default, only tests that the actual workflow matches the planned workflow. Override in benchmark-specific evaluator subclasses to test metric columns.

Parameters:

experiment – The owning experiment object.
df_loading – Per-run loading DataFrame.
df_reduced – Per-phase execution DataFrame.
workflow_actual – Reconstructed actual workflow dict.
workflow_planned – Planned workflow dict from workload config.

test_results()

Validates results by loading and reconstructing the workflow.

Returns: 0 on success, 1 if an exception is raised.
Return type: int

transform_monitoring_results(component='loading')

Combines per-connection monitoring CSV files into a single wide-format CSV.

For example, per-connection files like:

query_datagenerator_metric_total_cpu_util_MonetDB-NIL-1-1.csv
query_datagenerator_metric_total_cpu_util_MonetDB-NIL-1-2.csv

are merged into:

query_datagenerator_metric_total_cpu_util.csv

Parameters: component (str) – Component label used in the metric filename prefix (e.g. 'loading', 'stream').
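
The merge illustrated above can be approximated with the csv module. This is a sketch under assumptions, not the method's actual code: merge_metric_files is a hypothetical helper, each per-connection file is assumed to contain headerless timestamp,value rows with integer timestamps, and the combined file is assumed to hold one row per connection.

```python
import csv
import tempfile
from pathlib import Path


def merge_metric_files(folder, prefix):
    """Merge <prefix>_<connection>.csv files into <prefix>.csv, one row per connection."""
    folder = Path(folder)
    merged = {}
    for path in sorted(folder.glob(f"{prefix}_*.csv")):
        # The connection name is whatever follows "<prefix>_" in the file stem.
        connection = path.stem[len(prefix) + 1:]
        with open(path, newline="") as handle:
            merged[connection] = {row[0]: row[1] for row in csv.reader(handle)}

    # Union of all timestamps, so that gaps become empty cells.
    timestamps = sorted({t for series in merged.values() for t in series}, key=int)
    target = folder / f"{prefix}.csv"
    with open(target, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["connection"] + timestamps)
        for connection, series in merged.items():
            writer.writerow([connection] + [series.get(t, "") for t in timestamps])
    return target


with tempfile.TemporaryDirectory() as folder:
    prefix = "query_datagenerator_metric_total_cpu_util"
    # Hypothetical sample values.
    Path(folder, f"{prefix}_MonetDB-NIL-1-1.csv").write_text("0,10\n1,20\n")
    Path(folder, f"{prefix}_MonetDB-NIL-1-2.csv").write_text("0,11\n1,21\n")

    target = merge_metric_files(folder, prefix)
    merged_name = target.name
    with open(target, newline="") as handle:
        merged_rows = list(csv.reader(handle))
```

The glob pattern requires an underscore after the prefix, so the combined file itself is not picked up when the merge is run again.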