bexhoma.experiments module

Date:

2022-10-01

Version:

0.6.0

Authors:

Patrick K. Erdelt

Classes for managing an experiment. This is plugged into a cluster object. It collects some configuration objects. Two examples are included, dealing with TPC-H and TPC-DS tests. Another example concerns a TSBS experiment. Each experiment should also have its own folder containing:

  • a query file

  • a subfolder for each dbms that may run this experiment, including schema files

Copyright (C) 2020 Patrick K. Erdelt

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.

class bexhoma.experiments.DictToObject(dictionary)

Bases: object

https://coderwall.com/p/idfiea/python-dict-to-object
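The linked snippet turns a dict into an object whose keys become attributes. A minimal sketch of the pattern (illustrative; the class name and the recursion detail are assumptions based on the linked snippet, not necessarily the verbatim implementation):

    # Sketch of the dict-to-object pattern; name and details are illustrative
    class DictToObjectSketch:
        def __init__(self, dictionary):
            for key, value in dictionary.items():
                # nested dicts become nested objects, so keys are reachable as attributes
                if isinstance(value, dict):
                    value = DictToObjectSketch(value)
                setattr(self, key, value)

    config = DictToObjectSketch({'name': 'TPC-H', 'resources': {'cpu': 4}})
    print(config.name)           # TPC-H
    print(config.resources.cpu)  # 4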

class bexhoma.experiments.benchbase(cluster, code=None, SF='1', num_experiment_to_apply=1, timeout=7200)

Bases: default

Class for defining a Benchbase experiment. This sets

  • the folder of the experiment, including the query file and schema information per dbms

  • name and information about the experiment

  • additional parameters, here SF (the scaling factor), i.e., the number of rows divided by 10,000 (see the sketch below)
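A usage sketch ('cluster' is a bexhoma cluster object, see bexhoma.clusters):

    from bexhoma import experiments

    # SF='16' corresponds to 160,000 rows, since SF is the number of rows
    # divided by 10,000
    experiment = experiments.benchbase(cluster=cluster, SF='16', timeout=7200)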

evaluate_results(pod_dashboard='')

Build a DataFrame locally that contains all benchmarking results. This is specific to Benchbase.

get_parts_of_name(name)
log_to_df(filename)
test_results()

Run test script locally. Extract exit code.

Returns:

exit code of test script

class bexhoma.experiments.default(cluster, code=None, num_experiment_to_apply=1, timeout=7200, detached=True)

Bases: object

Class for defining an experiment. Settings are set generally. This class should be subclassed to define specific experiments (see the sketch below).
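A sketch of such a subclass (the config folder path and workload name are hypothetical):

    from bexhoma import experiments

    class myworkload(experiments.default):
        def __init__(self, cluster, code=None, num_experiment_to_apply=1, timeout=7200):
            super().__init__(cluster, code, num_experiment_to_apply, timeout)
            # folder layout and workload name are placeholders
            self.set_experiments_configfolder('experiments/myworkload')
            self.set_workload(name='My workload')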

add_benchmark_list(list_clients)

Adds a list of (numbers of) benchmarker instances that are to benchmark the current SUT. Example: [1,2,1] means we sequentially have 1, then 2, and then 1 benchmarker instances. This is applied to all dbms configurations of the experiment.

Parameters:

list_clients – List of (number of) benchmarker instances

add_configuration(configuration)

Adds a configuration object to the list of configurations of this experiment. When a new configuration object is instantiated, an experiment object has to be provided. This method is then called automatically.

Parameters:

configuration – Configuration object

benchmark_list(list_clients)

DEPRECATED? Not used anymore. Runs a given list of benchmarkers against all running SUTs of the experiment.

Parameters:

list_clients – List of (number of) benchmarker instances

delay(sec, silent=False)

Waits for a given number of seconds and reports on the waiting via output. Synonym for wait().

Parameters:
  • sec – Number of seconds to wait

  • silent – True means we do not output anything about this waiting

end_benchmarking(jobname, config=None)

Ends a benchmarker job. This is for storing or cleaning measures.

Parameters:
  • jobname – Name of the job to clean

  • config – Configuration object

end_loading(jobname)

Ends a loading job. This is for storing or cleaning measures.

Parameters:

jobname – Name of the job to clean

evaluate_results(pod_dashboard='')

Let the dashboard pod build the evaluations. This is specific to dbmsbenchmarker.

  1. All local logs are copied to the pod.

  2. The benchmarker in the dashboard pod is updated (dev channel).

  3. All results of all DBMS are joined (merge.py of the benchmarker) in the dashboard pod.

  4. The evaluation cube is built (python benchmark.py read -e yes) in the dashboard pod.

extract_job_timing(jobname, container)
get_job_timing_benchmarking(jobname)
get_job_timing_loading(jobname)
get_workflow_list()

Returns the benchmarking workflow as a dict of lists of lists. Keys are connection names. Values are lists of lists; each inner list is added, for example, by add_benchmark_list(). Inner lists are repeated according to self.num_experiment_to_apply. Example: {'PostgreSQL-24-1-16384': [[1, 2]], 'MySQL-24-1-16384': [[1, 2]], 'PostgreSQL-24-1-32768': [[1, 2]], 'MySQL-24-1-32768': [[1, 2]]}

Returns:

Dict of benchmarking workflow
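A usage sketch reproducing the example above (four configurations attached, num_experiment_to_apply=1):

    # 'experiment' is an experiment object with four configurations attached
    experiment.add_benchmark_list([1, 2])
    workflow = experiment.get_workflow_list()
    # {'PostgreSQL-24-1-16384': [[1, 2]], 'MySQL-24-1-16384': [[1, 2]],
    #  'PostgreSQL-24-1-32768': [[1, 2]], 'MySQL-24-1-32768': [[1, 2]]}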

patch_benchmarking(patch)

Patches YAML of benchmarking components. Can be set by the experiment before creation of a configuration.

Parameters:

patch – String in YAML format, overwrites basic YAML file content

patch_loading(patch)

Patches YAML of loading components. Can be overwritten by configuration.

Parameters:

patch – String in YAML format, overwrites basic YAML file content
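A sketch of a patch string; the container name and nesting below are assumptions about the job template, not a verbatim bexhoma manifest:

    # Hypothetical patch: raise the CPU request of the loading job's container
    experiment.patch_loading("""
    spec:
      template:
        spec:
          containers:
          - name: datagenerator
            resources:
              requests:
                cpu: "2"
    """)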

set_additional_labels(**kwargs)

Sets additional labels that will be attached to K8s objects (and ignored otherwise). This is for the SUT component. Can be overwritten by configuration.

Parameters:

kwargs – Dict of labels, example ‘SF’ => 100

set_benchmarking_parameters(**kwargs)

Sets ENV for benchmarking components. Can be overwritten by configuration.

Parameters:

kwargs – Dict of meta data, example ‘PARALLEL’ => ‘64’

set_connectionmanagement(**kwargs)

Sets connection management data for the experiment. This is for the benchmarker component (dbmsbenchmarker). Can be overwritten by configuration.

Parameters:

kwargs – Dict of meta data, example ‘timeout’ => 60
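A usage sketch; besides ‘timeout’ (documented above), the keys are assumed dbmsbenchmarker connection-management parameters:

    experiment.set_connectionmanagement(
        numProcesses=1,          # parallel client processes (assumed key)
        runsPerConnection=0,     # 0 = reuse one connection for all runs (assumed key)
        timeout=600,             # seconds until a query is aborted
        singleConnection=False)  # assumed key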

set_ddl_parameters(**kwargs)

Sets DDL parameters for the experiment. These substitute placeholders in the DDL scripts. Can be overwritten by configuration.

Parameters:

kwargs – Dict of meta data, example ‘index’ => ‘btree’
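A sketch of the substitution; the format-style placeholder syntax is an assumption:

    # If a DDL script contains e.g. "CREATE INDEX ... USING {index} (...)",
    # this fills the placeholder with 'btree'
    experiment.set_ddl_parameters(index='btree')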

set_eval_parameters(**kwargs)

Sets some arbitrary parameters that are supposed to be handed over to the benchmarker component. Can be overwritten by configuration.

Parameters:

kwargs – Dict of meta data, example ‘type’ => ‘noindex’

set_experiment(instance=None, volume=None, docker=None, script=None, indexing=None)

Reads experiment details from the cluster config.

Parameters:
  • instance

  • volume

  • docker

  • script

  • indexing

set_experiments_configfolder(experiments_configfolder)

Sets the configuration folder for the experiment. Bexhoma expects subfolders for experiment types, for example tpch. In there, bexhoma looks for query.config files (for dbmsbenchmarker) and subfolders containing the schema per dbms.

Parameters:

experiments_configfolder – Relative path to an experiment folder

set_loading(parallel, num_pods=None)

Sets job parameters for loading components: the number of parallel pods and optionally (if different) the total number of pods. By default, the total number of pods is set to the number of parallel pods. Can be overwritten by configuration.

Parameters:
  • parallel – Number of parallel pods

  • num_pods – Optionally (if different) total number of pods
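A usage sketch showing the parallel/total distinction:

    # 16 loading pods in total, at most 8 running in parallel
    experiment.set_loading(parallel=8, num_pods=16)
    # the total defaults to the number of parallel pods:
    experiment.set_loading(parallel=8)  # 8 pods, all running in parallel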

set_loading_parameters(**kwargs)

Sets ENV for loading components. Can be overwritten by configuration.

Parameters:

kwargs – Dict of meta data, example ‘PARALLEL’ => ‘64’

set_maintaining(parallel, num_pods=None)

Sets job parameters for maintaining components: the number of parallel pods and optionally (if different) the total number of pods. By default, the total number of pods is set to the number of parallel pods. Can be overwritten by configuration.

Parameters:
  • parallel – Number of parallel pods

  • num_pods – Optionally (if different) total number of pods

set_maintaining_parameters(**kwargs)

Sets ENV for maintaining components. Can be overwritten by configuration.

Parameters:

kwargs – Dict of meta data, example ‘PARALLEL’ => ‘64’

set_nodes(**kwargs)
set_queryfile(queryfile)

Sets the name of a query file of the experiment. This is for the benchmarker component (dbmsbenchmarker).

Parameters:

queryfile – Name of the query file

set_querymanagement(**kwargs)

Sets query management data for the experiment. This is for the benchmarker component (dbmsbenchmarker).

Parameters:

kwargs – Dict of meta data, example ‘numRun’ => 3

set_querymanagement_monitoring(numRun=256, delay=10, datatransfer=False)

Sets some parameters that are supposed to be suitable for a monitoring test:

  • high number of runs

  • optional delay

  • optional data transfer

  • monitoring active

Parameters:
  • numRun – Number of runs per query (this is for the benchmarker component)

  • delay – Number of seconds to wait between queries (this is for the benchmarker component)

  • datatransfer – Whether data should be retrieved and compared

set_querymanagement_quicktest(numRun=1, datatransfer=False)

Sets some parameters that are supposed to be suitable for a quick functional test:

  • small number of runs

  • no delay

  • optional data transfer

  • no monitoring

Parameters:
  • numRun – Number of runs per query (this is for the benchmarker component)

  • datatransfer – Whether data should be retrieved and compared

set_resources(**kwargs)

Sets resources for the experiment. This is for the SUT component. Can be overwritten by experiment and configuration.

Parameters:

kwargs – Dict of meta data, example ‘requests’ => {‘cpu’ => 4}
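A usage sketch; ‘requests’ follows the example above, while ‘limits’ and ‘nodeSelector’ are assumed parallel keys in Kubernetes style:

    experiment.set_resources(
        requests={'cpu': 4, 'memory': '16Gi'},
        limits={'cpu': 8, 'memory': '16Gi'},  # assumed key
        nodeSelector={'disk': 'ssd'})         # assumed key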

set_storage(**kwargs)

Sets parameters for the storage that might be attached to components. This is in particular for the database of the DBMS under test. Example:

storageClassName = 'ssd', storageSize = '100Gi', keep = False

Can be overwritten by configuration.

Parameters:

kwargs – Dict of meta data, example ‘storageSize’ => ‘100Gi’
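The example above as a call; ‘keep’ presumably controls whether the volume outlives the experiment (assumption):

    # persistent volume for the database under test
    experiment.set_storage(storageClassName='ssd', storageSize='100Gi', keep=False)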

set_workload(**kwargs)

Sets meta data about the experiment, for example name and description.

Parameters:

kwargs – Dict of meta data, example ‘name’ => ‘TPC-H’

show_summary()
start_loading()

Tells all dbms configurations of this experiment to start loading data.

start_monitoring()

Start monitoring for all dbms configurations of this experiment.

start_sut()

Start all dbms configurations of this experiment.

stop_benchmarker(configuration='')

Stop all benchmarker jobs of this experiment. If a dbms configuration is given, use it. Otherwise tell the cluster to stop all benchmarker jobs belonging to this experiment code.

stop_loading()

Stop all loading jobs of this experiment. If a list of dbms configurations is set, use it. Otherwise tell the cluster to stop all loading jobs belonging to this experiment code.

stop_maintaining()

Stop all maintaining jobs of this experiment. If a list of dbms configurations is set, use it. Otherwise tell the cluster to stop all maintaining jobs belonging to this experiment code.

stop_monitoring()

Stop all monitoring deployments of this experiment. If a list of dbms configurations is set, use it. Otherwise tell the cluster to stop all monitoring deployments belonging to this experiment code.

stop_sut()

Stop all SUT deployments of this experiment. If a list of dbms configurations is set, use it. Otherwise tell the cluster to stop all SUT deployments belonging to this experiment code.

test_results()

Run test script in dashboard pod. Extract exit code.

Returns:

exit code of test script

wait(sec, silent=False)

Waits for a given number of seconds and reports on the waiting via output.

Parameters:
  • sec – Number of seconds to wait

  • silent – True means we do not output anything about this waiting

work_benchmark_list(intervals=30, stop=True)

Runs the typical workflow (see the end-to-end sketch below):

  1. start the SUT

  2. start monitoring

  3. start loading (first scripts (schema creation or loading via pull), then optionally parallel loading pods)

  4. optionally start maintaining pods

  5. at the same time as 4., run benchmarker jobs corresponding to the list given via add_benchmark_list()

Parameters:
  • intervals – Seconds to wait before checking change of status

  • stop – Tells whether the SUT should be removed when all benchmarking has finished. Set to False to keep loaded SUTs for inspection.
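An end-to-end sketch. The cluster constructor and the configuration classes live in other modules (bexhoma.clusters, bexhoma.configurations), so their exact parameters here are assumptions; the experiment calls match the signatures documented in this module:

    from bexhoma import clusters, experiments

    # cluster setup (see bexhoma.clusters; constructor details are assumptions)
    cluster = clusters.kubernetes(clusterconfig='cluster.config', context='my-context')

    experiment = experiments.tpch(cluster=cluster, SF='100', timeout=7200)
    experiment.set_workload(name='TPC-H')
    experiment.set_querymanagement_quicktest(numRun=1)
    # configuration objects (bexhoma.configurations) attach themselves to the
    # experiment on instantiation via add_configuration()

    # sequentially 1, then 2, then 1 benchmarker instances per configuration
    experiment.add_benchmark_list([1, 2, 1])

    # start SUT, monitoring and loading, then run the benchmarker jobs;
    # stop=False keeps the loaded SUTs for inspection afterwards
    experiment.work_benchmark_list(intervals=30, stop=False)
    experiment.evaluate_results()
    experiment.stop_benchmarker()
    experiment.stop_sut()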

zip()

Zip the result folder in the dashboard pod.

class bexhoma.experiments.example(cluster, code=None, queryfile='queries.config', num_experiment_to_apply=1, timeout=7200, script=None)

Bases: default

Class for defining a custom example experiment. This sets

  • the folder of the experiment, including the query file and schema information per dbms

  • name and information about the experiment

class bexhoma.experiments.iot(cluster, code=None, queryfile='queries-iot.config', SF='1', num_experiment_to_apply=1, timeout=7200)

Bases: default

Class for defining a TSBS experiment. This sets

  • the folder of the experiment, including the query file and schema information per dbms

  • name and information about the experiment

  • additional parameters, here SF (the scaling factor)

set_queries_full()
set_queries_profiling()
set_querymanagement_maintaining(numRun=128, delay=5, datatransfer=False)
class bexhoma.experiments.tpcc(cluster, code=None, SF='1', num_experiment_to_apply=1, timeout=7200)

Bases: default

Class for defining a TPC-C experiment (in the HammerDB version). This sets

  • the folder of the experiment, including the query file and schema information per dbms

  • name and information about the experiment

  • additional parameters, here SF (the scaling factor), i.e., the number of warehouses

evaluate_results(pod_dashboard='')

Build a DataFrame locally that contains all benchmarking results. This is specific to HammerDB.

test_results()

Run test script locally. Extract exit code.

Returns:

exit code of test script

class bexhoma.experiments.tpcds(cluster, code=None, queryfile='queries-tpcds.config', SF='100', num_experiment_to_apply=1, timeout=7200)

Bases: default

Class for defining a TPC-DS experiment. This sets

  • the folder of the experiment, including the query file and schema information per dbms

  • name and information about the experiment

  • additional parameters, here SF (the scaling factor)

set_queries_full()
set_queries_profiling()
class bexhoma.experiments.tpch(cluster, code=None, queryfile='queries-tpch.config', SF='100', num_experiment_to_apply=1, timeout=7200, script=None)

Bases: default

Class for defining a TPC-H experiment. This sets

  • the folder of the experiment, including the query file and schema information per dbms

  • name and information about the experiment

  • additional parameters, here SF (the scaling factor)

set_queries_full()
set_queries_profiling()
show_summary()
class bexhoma.experiments.tsbs(cluster, code=None, queryfile='queries-tsbs.config', SF='1', num_experiment_to_apply=1, timeout=7200)

Bases: default

Class for defining a TSBS experiment. This sets

  • the folder of the experiment, including the query file and schema information per dbms

  • name and information about the experiment

  • additional parameters, here SF (the scaling factor)

set_queries_full()
set_queries_profiling()
set_querymanagement_maintaining(numRun=128, delay=5, datatransfer=False)
class bexhoma.experiments.ycsb(cluster, code=None, SF='1', num_experiment_to_apply=1, timeout=7200)

Bases: default

Class for defining a YCSB experiment. This sets

  • the folder of the experiment, including the query file and schema information per dbms

  • name and information about the experiment

  • additional parameters, here SF (the scaling factor), i.e., the number of rows divided by 10,000

evaluate_results(pod_dashboard='')

Build a DataFrame locally that contains all benchmarking results. This is specific to YCSB.

show_summary()
test_results()

Run test script locally. Extract exit code.

Returns:

exit code of test script