Concept: Collectors

Overview

The collectors module aggregates results from one or more bexhoma experiments into a single, analysis-ready view. Each collector wraps a benchmark-specific evaluator — the lower-level class that parses raw log files and result folders for a single experiment code.

collectors.benchbase(path, codes)        ← collector: spans N experiments
    └── get_evaluator(code)              ← evaluator: single experiment
            └── log parsing, DataFrames

Supported benchmark types:

Collector class	Benchmark tool	Evaluator class
`collectors.benchbase`	Benchbase	`evaluators.benchbase`
`collectors.tpcc`	HammerDB TPC-C	`evaluators.tpcc`
`collectors.ycsb`	YCSB	`evaluators.ycsb`
`collectors.dbmsbenchmarker`	DBMSBenchmarker	`evaluators.dbmsbenchmarker`
`collectors.mixed`	Multiple tools in one experiment	per-tool evaluator, selected automatically

Concepts

Naming

The following terms identify the components of a bexhoma experiment result. Within a single experiment these identifiers carry no code prefix; the code prefix is added only in collectors, where results from multiple experiments are merged.

Term	Description	Example
configuration	Name of the SUT instance	`PostgreSQL-1` (first PostgreSQL instance)
experiment_run	Repeat counter for the whole experiment	`2` (second run)
client	1-based index of the benchmark phase within a run	`3` (third phase)
phase	Benchmark phase identifier: `<configuration>-<experiment_run>-<client>`	`PostgreSQL-1-2-3`
benchmark_run	1-based index of a parallel benchmark job within a phase	`4` (fourth job)
job	Benchmark job identifier: `<phase>-<benchmark_run>`	`PostgreSQL-1-2-3-4`
pod	1-based index of a driver pod within a job	`5` (fifth pod)
connection	Driver-pod identifier: `<job>-<pod>`	`PostgreSQL-1-2-3-4-5`

In collectors, all job and connection identifiers are prefixed with the experiment code:

Collector-context identifier	Pattern	Example
code-prefixed phase	`<code>-<phase>`	`1775855486-PostgreSQL-1-2-3`
code-prefixed job	`<code>-<job>`	`1775855486-PostgreSQL-1-2-3-4`
code-prefixed connection	`<code>-<connection>`	`1775855486-PostgreSQL-1-2-3-4-5`

Most experiments run a single benchmark job per phase (benchmark_run = 1), in which case the job identifier PostgreSQL-1-2-3-1 is the only job for phase PostgreSQL-1-2-3.

Aggregation

Parallel pods within a job are aggregated into one row per job. Metric type determines the aggregation function:

Metric type	Aggregation
`counter`	`max − min` (delta over interval)
`ratio`	`max`
others	`mean`

Multi-tenancy adds a further dimension: a single service provider may map to several configurations (container instances). Dedicated _multitenant methods aggregate across tenants before returning results.

Quick Reference

Configuration & Metadata

Method	Returns	Index
`get_connections()`	Connection metadata for all experiments	connection name
`get_metrics_metadata()`	Hardware metric definitions (title, type, active)	metric key
`get_monitored_components(code)`	Components monitored in the workload config	component key
`get_workload(code)`	Raw workload properties dict	—
`get_evaluator(code)`	Benchmark-specific evaluator for one experiment	—

Monitoring — Aggregated

Method	Returns	Index
`get_monitoring_aggregated_per_phase(component)`	One scalar per metric per phase (all jobs merged)	phase name
`get_monitoring_aggregated_per_job(component)`	One scalar per metric per job (jobs kept separate)	job name
`get_monitoring_aggregated_per_phase_multitenant(component)`	Per-job aggregation, additionally grouped across tenants	`code_run_client_typetenant_numtenant`

Monitoring — Time Series

Method	Format	Index
`get_monitoring_timeseries_single(code, metric, component)`	Wide — columns are pod instances	timestamp
`get_monitoring_timeseries_per_phase(code, metric, component)`	Wide transposed — rows are pod instances	pod/instance
`get_monitoring_timeseries_all(metric, component)`	Long — all codes stacked	enumerated
`get_monitoring_timeseries_all_multitenant(metric, component)`	Long — all codes, tenant-annotated	enumerated

Performance — Benchmarking

Method	Returns	Index
`get_performance_per_connection()`	Unaggregated per-pod results, all codes	connection name
`get_performance_aggregated_per_phase()`	Aggregated per phase (all jobs in a phase merged)	phase name
`get_performance_aggregated_per_job()`	Aggregated per job (jobs in a phase kept separate)	job name
`get_performance_aggregated_per_phase_multitenant()`	Per-job aggregation with tenant metadata columns	job name

Performance — Loading

Method	Returns	Index
`get_loading_per_pod()`	Per-pod loading metrics, all codes	connection name
`get_loading_per_connection()`	Per-connection loading metrics, all codes	connection name
`get_loading_per_run()`	Loading metrics aggregated per experiment run	connection name
`get_loading_per_run_multitenant()`	Loading metrics per run with tenant metadata	connection name

Utility

Method	Description
`add_metadata(df)`	Joins connection metadata onto any monitoring DataFrame
`collectors.get_non_constant(df)`	Drops constant columns — highlights parameters that vary

Class `collector`

Constructor

collect = collectors.benchbase(path, codes)

path — filesystem path containing the experiment result directories.
codes — list of experiment codes to aggregate, e.g. ["1776751747", "1776754749"].

The constructor reads connections.config from the first code to build the metrics metadata table.

An optional benchmark_run parameter (default 0) filters all result methods to a single benchmark session; 0 includes all sessions.

For experiments that ran multiple benchmark tools in sequence, use collectors.mixed instead — see the dedicated section below.

Configuration & Metadata

get_connections()
Returns a DataFrame of connection metadata for all experiments.
Columns include: phase (code-prefixed phase identifier: <code>-<configuration>-<experiment_run>-<client>), job (code-prefixed job identifier: <code>-<configuration>-<experiment_run>-<client>-<benchmark_run>), code, connection, configuration, experiment_run, benchmark_run, client, type_tenants, num_tenants, vol_tenants.

get_metrics_metadata()
Returns a DataFrame listing all hardware metrics defined in the experiment, with columns title, active, type, metric.

get_monitored_components(code='')
Returns a DataFrame of monitored component roles (e.g. loading, stream, database) with a description column.

get_evaluator(code='')
Returns the benchmark-specific evaluator for the given experiment code. Used internally but also useful for ad-hoc access to single-experiment data.

Monitoring — Aggregated

get_monitoring_aggregated_per_job(component='stream')
Returns one row per benchmark job across all codes. Each metric column is reduced to a scalar using the metric-type aggregation rule (counter → sum of deltas, ratio → max, other → mean). Index is the code-prefixed job identifier.

get_monitoring_aggregated_per_phase(component='stream')
Returns one row per phase across all codes. Calls get_monitoring_aggregated_per_job() and further reduces by grouping on (code, configuration, experiment_run, client), collapsing all parallel jobs within a phase. Aggregation rules: ratio → max, counter → sum, other → mean. Index is the code-prefixed phase identifier.

get_monitoring_aggregated_per_phase_multitenant(component='stream')
Extends get_monitoring_aggregated_per_job() by grouping across tenants. Ratio metrics are reduced with max; counter metrics with sum (except Total I/O Wait Time, which uses max). Index is the underscore-joined group key.

Monitoring — Time Series

get_monitoring_timeseries_single(code, metric='pg_locks_count', component='stream')
Returns a wide-format DataFrame for one metric in one experiment. Rows are timestamps; columns are monitored pod instances.

get_monitoring_timeseries_per_phase(code, metric='pg_locks_count', component='stream')
Same data as above, transposed: rows are pod instances, columns are timestamps.

get_monitoring_timeseries_all(metric='pg_locks_count', component='stream')
Collects long-format time-series across all codes. Columns include timestamp, value, code, phase, experiment_run, client, type_tenants, num_tenants, vol_tenants, metric, component. Values are summed across parallel pods.

get_monitoring_timeseries_all_multitenant(metric='pg_locks_count', component='stream')
Like get_monitoring_timeseries_all but annotates each row with tenant metadata from the workload config. For non-container tenancy the tenant column is set to "0".

Performance — Benchmarking

get_performance_per_connection()
Unaggregated per-pod results concatenated across all codes. Job, connection, and configuration columns are prefixed with the experiment code.

get_performance_aggregated_per_job()
Aggregates parallel pods within each job, then concatenates all codes. Groups by the job identifier (configuration-experiment_run-client-benchmark_run), producing one row per benchmark job. Index is the code-prefixed job identifier.

get_performance_aggregated_per_phase()
Aggregates parallel pods and all parallel jobs within each phase, then concatenates all codes. Groups by the phase identifier (configuration-experiment_run-client), collapsing all benchmark_run values. Index is the code-prefixed phase identifier.

get_performance_aggregated_per_phase_multitenant()
Per-job aggregation with tenant metadata. Adds type_tenants, num_tenants, and vol_tenants columns from the workload configuration. Groups by (code, experiment_run, client, benchmark_run, type_tenants, num_tenants), producing one row per job within each tenant group.

Performance — Loading

get_loading_per_pod()
Raw loading metrics for each individual pod across all codes.

get_loading_per_connection()
Loading metrics aggregated at connection level across all codes.

get_loading_per_run()
Loading metrics further aggregated per experiment run (one row per code-configuration-experiment_run).

get_loading_per_run_multitenant()
Same as above but fetches the multi-tenant variant from the evaluator, which includes tenant metadata columns.

Utility

add_metadata(df)
Enriches a DataFrame with connection metadata by attempting four join strategies in order:

Index × job column — when df’s index intersects df_connections['job']. Used for per-job monitoring DataFrames (from get_monitoring_aggregated_per_job()) whose index is the code-prefixed job identifier. Each job has a distinct identifier, so parallel benchmark jobs are correctly distinguished.
Shared index — when df’s index intersects df_connections’s index.
phase column join — when both DataFrames have a phase column. Used for merged performance + monitoring DataFrames where results are already aggregated to phase granularity.
Multi-tenant key join on (code, experiment_run, client, type_tenants, num_tenants) — used after get_monitoring_aggregated_per_phase_multitenant() has already aggregated away the benchmark_run dimension via its groupby, so omitting benchmark_run from the key is intentional.
Loading key join on (code, configuration, experiment_run).

collectors.get_non_constant(df)
Standalone function. Drops all columns whose values are identical across every row — useful after get_connections() to surface the parameters that actually vary across experiment runs.

Benchmark-Specific Subclasses

Each subclass extends base by providing the correct evaluator. Most data retrieval is inherited; subclasses add benchmark-specific parsing.

`collectors.benchbase`

For Benchbase experiments.

Evaluator provides access to throughput and latency time-series parsed directly from Benchbase log files:

evaluate = collect.get_evaluator(code)
evaluate.get_benchmark_logs_timeseries_df_aggregated(metric, configuration, client, experiment_run)
evaluate.get_benchmark_logs_timeseries_df_single(metric, configuration, client, experiment_run)

metric is typically "throughput" or "latency".

The collector adds two time-series methods that span all experiment codes:

get_benchmark_timeseries_per_phase(metric='throughput')
Wide-format DataFrame: index is second, each column represents one benchmark job identified by (code, configuration, experiment_run, client, benchmark_run), labelled {code}-{configuration}-{experiment_run}-{client}-{benchmark_run}. Jobs with different time ranges produce NaN for missing seconds.

get_benchmark_timeseries_all(metric='throughput')
Long-format DataFrame with columns second, code, configuration, experiment_run, client, benchmark_run, metric, value, plus connection metadata columns (e.g. type_tenants, num_tenants, vol_tenants). One row per second per job.

`collectors.tpcc`

For HammerDB TPC-C experiments.

Evaluator parses TPM (Transactions Per Minute) and NOPM (New Orders Per Minute) from HammerDB log output.

`collectors.ycsb`

For YCSB experiments.

Loading and benchmarking are both instrumented. The evaluator additionally exposes:

evaluate = collect.get_evaluator(code)

# Loading — summary
evaluate.get_loading_per_connection()
evaluate.get_loading_per_pod()

# Benchmarking — time series (per configuration, client, experiment_run)
evaluate.get_benchmark_logs_timeseries_df_aggregated(metric, configuration, client, experiment_run)
evaluate.get_benchmark_logs_timeseries_df_single(metric, configuration, client, experiment_run)

# Loading — time series (per configuration, experiment_run; no client dimension)
evaluate.get_loading_logs_timeseries_df_aggregated(metric, configuration, experiment_run)
evaluate.get_loading_logs_timeseries_df_single(metric, configuration, experiment_run)

metric defaults to "current_ops_per_sec" for all time-series methods. Both loading variants reuse the same log parser as the benchmarking variants; the only difference is that the pod list is drawn from get_df_loading() instead of get_df_benchmarking(), and there is no client filter.

The collector adds four time-series methods that span all experiment codes:

get_benchmark_timeseries_per_phase(metric='current_ops_per_sec')
Wide-format DataFrame: index is second, each column represents one benchmark job identified by (code, configuration, experiment_run, client, benchmark_run), labelled {code}-{configuration}-{experiment_run}-{client}-{benchmark_run}.

get_benchmark_timeseries_all(metric='current_ops_per_sec')
Long-format DataFrame with columns second, code, configuration, experiment_run, client, benchmark_run, metric, value, plus connection metadata columns. One row per second per job.

get_loading_timeseries_per_phase(metric='current_ops_per_sec')
Wide-format DataFrame for the loading phase: index is second, each column represents one (code, configuration, experiment_run) loading job, labelled {code}-{configuration}-{experiment_run}. No client dimension.

get_loading_timeseries_all(metric='current_ops_per_sec')
Long-format DataFrame for the loading phase with columns second, code, configuration, experiment_run, metric, value, plus connection metadata columns (excluding client). One row per second per loading job.

`collectors.dbmsbenchmarker`

For DBMSBenchmarker experiments. Adds query-level analytics:

get_query_latencies(query_titles=False)
Per-query latency statistics (aggregated with geometric mean across parallel pods) for all codes.

get_total_errors(query_titles=False)
Per-query failed-execution counts for all codes.

get_total_warnings(query_titles=False)
Per-query result-mismatch counts for all codes.

Pass query_titles=True to replace the numeric query index with the human-readable title from the query configuration.

`collectors.mixed`

For experiments that ran more than one benchmark tool in a single experiment (e.g. Benchbase followed by YCSB, or two different tools in the same client round).

How it works

When an experiment finishes, store_workflow_results() writes a benchmark_sequence key into queries.config:

# Example queries.config entry
'benchmark_sequence': [
    {'index': 1, 'type': 'benchbase'},
    {'index': 2, 'type': 'ycsb'},
]

collectors.mixed reads this key and dispatches to the correct typed sub-collector for each benchmark-run index, so each tool’s log files are parsed by the right evaluator.

The constraint is that all configurations in an experiment must share the same benchmark sequence (which is enforced by the experiment framework — every configuration receives a deep copy of the same experiment_dict_template).

Constructor

collect = collectors.mixed(path, codes)

Same signature as the typed collectors. Reads benchmark_sequence from queries.config of the first code.

Methods

get_typed_collector(benchmark_run: int)
Returns a fully functional typed sub-collector (e.g. collectors.benchbase) scoped to the given benchmark-run index. All standard methods (get_performance_aggregated_per_phase, get_monitoring_aggregated_per_phase, etc.) are available on the returned object.

get_performance_per_benchmark()
Calls get_performance_aggregated_per_phase() on each typed sub-collector and returns a dict keyed by benchmark type name:

results = collect.get_performance_per_benchmark()
# {'benchbase': df_benchbase, 'ycsb': df_ycsb}

When two entries share the same tool type (parallel runs of the same tool), their DataFrames are concatenated under the shared key, with benchmark_run as a distinguishing column.

Supported type names

`type` in `benchmark_sequence`	Mapped collector
`'benchbase'`	`collectors.benchbase`
`'ycsb'`	`collectors.ycsb`
`'tpcc'` / `'hammerdb'`	`collectors.tpcc`
`'dbmsbenchmarker'` / `'tpch'` / `'tpcds'`	`collectors.dbmsbenchmarker`

Minimal Example

from bexhoma import collectors

path  = "/results"
codes = ["1776751747", "1776754749"]

collect = collectors.benchbase(path, codes)

# What varied across these two experiments?
df_conn = collect.get_connections()
collectors.get_non_constant(df_conn)

# Benchmarking throughput, one row per job (benchmark_run kept separate)
collect.get_performance_aggregated_per_job()

# Benchmarking throughput, one row per phase (all benchmark_runs merged)
collect.get_performance_aggregated_per_phase()

# Hardware monitoring summary, one row per job
collect.get_monitoring_aggregated_per_job("stream")

# Hardware monitoring summary, one row per phase
collect.get_monitoring_aggregated_per_phase("stream")

# CPU time-series across all codes (long format)
collect.get_monitoring_timeseries_all(metric="total_cpu_memory")

# Enrich a custom DataFrame with experiment metadata
df = collect.get_monitoring_aggregated_per_phase("stream")
collect.add_metadata(df)

References

[1] Benchmarking Multi-Tenant Architectures in PostgreSQL
Erdelt, P.K., and Rabl T. (2026). In: Proceedings 29th International Conference on Extending Database Technology, EDBT 2026. OpenProceedings.org.

Concept: Collectors

Overview

Concepts

Naming

Aggregation

Quick Reference

Configuration & Metadata

Monitoring — Aggregated

Monitoring — Time Series

Performance — Benchmarking

Performance — Loading

Utility

Class collector

Constructor

Configuration & Metadata

Monitoring — Aggregated

Monitoring — Time Series

Performance — Benchmarking

Performance — Loading

Utility

Benchmark-Specific Subclasses

collectors.benchbase

collectors.tpcc

collectors.ycsb

collectors.dbmsbenchmarker

collectors.mixed

How it works

Constructor

Methods

Supported type names

Minimal Example

References

Class `collector`

`collectors.benchbase`

`collectors.tpcc`

`collectors.ycsb`

`collectors.dbmsbenchmarker`

`collectors.mixed`