# How to use mgBench

Running workloads that include your custom queries and dataset is the best way to evaluate system performance for your use case. Each workload has unique requirements imposed by the use case, and since your use-case queries and dataset are what will run in production, it is best to benchmark with them. We have cleaned up the mgBench architecture so that it is easier to add custom workloads and queries and evaluate performance on the supported systems.

This tutorial covers the following topics:

- How to add your custom workload
- How to run benchmarks on your custom workload
- How to configure a benchmark run
- How to compare results
- Customizing the workload generator

## How to add your custom workload

If you want to run your custom workload on the supported systems (currently Memgraph and Neo4j), you can start by writing a simple Python class. The idea is to specify a class that contains your dataset generation queries, your index generation queries, and the queries used for running the benchmark.

Here are the five steps you need to follow to specify your workload:

  1. Inherit the workload class
  2. Define a workload name
  3. Implement the dataset generator method
  4. Implement the index generator method
  5. Define the queries you want to benchmark

Here is a simplified version of the demo.py example:

```python
import random
from workloads.base import Workload


class Demo(Workload):

    NAME = "demo"

    def indexes_generator(self):
        indexes = [
            ("CREATE INDEX ON :NodeA(id);", {}),
            ("CREATE INDEX ON :NodeB(id);", {}),
        ]
        return indexes

    def dataset_generator(self):
        queries = []
        for i in range(0, 100):
            queries.append(("CREATE (:NodeA {id: $id});", {"id": i}))
            queries.append(("CREATE (:NodeB {id: $id});", {"id": i}))
        for i in range(0, 300):
            a = random.randint(0, 99)
            b = random.randint(0, 99)
            queries.append(
                (("MATCH(a:NodeA {id: $A_id}),(b:NodeB{id: $B_id}) CREATE (a)-[:EDGE]->(b)"), {"A_id": a, "B_id": b})
            )

        return queries

    def benchmark__test__get_nodes(self):
        return ("MATCH (n) RETURN n;", {})

    def benchmark__test__get_node_by_id(self):
        return ("MATCH (n:NodeA{id: $id}) RETURN n;", {"id": random.randint(0, 99)})
```


Let's break this script down into its important elements:

### 1. Inherit the workload class

The Demo class has the parent class Workload. Each custom workload should inherit from the base Workload class.

```python
from workloads.base import Workload

class Demo(Workload):
```

### 2. Define the workload name

The class should specify the NAME property. It describes which workload class you want to execute. When calling benchmark.py, this property is used to differentiate between workloads.

```python
NAME = "demo"
```

### 3. Implement the dataset generator method

The class should implement the dataset_generator() method. The method generates the dataset and returns it as a list of tuples. Each tuple contains a Cypher query string and a dictionary of optional arguments, so the structure is [(str, dict), (str, dict), ...]. Here is what an example list returned by the method could look like:

```python
queries = [
    ("CREATE (:NodeA {id: 23});", {}),
    ("CREATE (:NodeB {id: $id, foo: $property});", {"id": 123, "property": "foo"}),
    ...
]
```

As you can see, you can pass just a Cypher query as a plain string, with an empty parameter dictionary:

("CREATE (:NodeA {id: 23});", {}),

Or you can specify parameters inside the dictionary. Each variable prefixed with the $ sign in the query string will be replaced by the value stored under the matching key in the dictionary. In this case, $id is replaced by 123 and $property by foo. The dictionary key names and the variable names need to match.

("CREATE (:NodeB {id: $id, foo: $property});", {"id" : 123, "property": "foo" })

Back in the demo.py example, the dataset_generator() method specifies the queries that generate the dataset. The first for loop prepares the queries for creating 100 nodes with the label NodeA and 100 nodes with the label NodeB, each with an id between 0 and 99. The second for loop generates the queries that connect nodes randomly: a total of 300 edges, each between a random NodeA node and a random NodeB node.

```python
def dataset_generator(self):
    queries = []
    for i in range(0, 100):
        queries.append(("CREATE (:NodeA {id: $id});", {"id": i}))
        queries.append(("CREATE (:NodeB {id: $id});", {"id": i}))
    for i in range(0, 300):
        a = random.randint(0, 99)
        b = random.randint(0, 99)
        queries.append((("MATCH(a:NodeA {id: $A_id}),(b:NodeB{id: $B_id}) CREATE (a)-[:EDGE]->(b)"), {"A_id": a, "B_id": b}))

    return queries
```
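
Before plugging a generator into mgBench, it can help to sanity-check the shape of what it returns. The helper below is not part of mgBench; it is a hypothetical, stand-alone check of the [(str, dict), ...] structure described above:

```python
# Hypothetical helper, not part of mgBench: verify the [(str, dict), ...] structure.
def check_structure(queries):
    assert isinstance(queries, list), "a generator must return a list"
    for query, params in queries:
        assert isinstance(query, str), f"query must be a string: {query!r}"
        assert isinstance(params, dict), f"parameters must be a dict: {params!r}"
    return len(queries)

# For the demo generator above, this should report 100 + 100 + 300 = 500 queries.
```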

### 4. Implement the index generator method

The class should also implement the indexes_generator() method. It is implemented the same way as the dataset_generator() method, except that instead of dataset queries, indexes_generator() returns the list of index-creation queries that will be used. The return structure is again a list of tuples containing a query string and a dictionary of parameters. Here is an example:

```python
def indexes_generator(self):
    indexes = [
        ("CREATE INDEX ON :NodeA(id);", {}),
        ("CREATE INDEX ON :NodeB(id);", {}),
    ]
    return indexes
```

### 5. Define the queries you want to benchmark

Now that your dataset will be imported via the dataset generator queries, you can specify which queries you wish to benchmark on that dataset. Here are the two queries the demo.py workload defines. They are written as Python methods that return a single tuple with the query and its parameter dictionary, just as in the dataset generator method.

```python
def benchmark__test__get_nodes(self):
    return ("MATCH (n) RETURN n;", {})

def benchmark__test__get_node_by_id(self):
    return ("MATCH (n:NodeA{id: $id}) RETURN n;", {"id": random.randint(0, 99)})
```

The important detail here is that each method you wish to include in the benchmark needs a name that starts with benchmark__, otherwise it will be ignored. The complete method name has the structure benchmark__group__name. The group can be used to execute specific tests, but more on that later.
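
As a quick illustration of the naming convention (this is only a sketch of the idea, not mgBench's actual discovery code), splitting such a method name on the double underscore yields the group and the query name that the selectors described below refer to:

```python
# Sketch of the benchmark__group__name convention; mgBench's own discovery
# logic may differ in its details.
method_name = "benchmark__test__get_node_by_id"
prefix, group, query_name = method_name.split("__")
assert prefix == "benchmark"
print(group, query_name)  # -> test get_node_by_id
```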

That is all you need to do to set up the workload. The next step is running it. If you wish to refine the workload generator further, take a look at the Customizing the workload generator section below.

## How to run benchmarks on your custom workload

When running benchmarks, the duration, query arguments, number of workers, and database condition all play an important role in the results. mgBench provides several options for configuring how a benchmark is executed. Let's start with the most straightforward run of the demo workload from the example above.

The main script that manages benchmark execution is benchmark.py.

To start the benchmark, run the following command with your own paths and options:

```bash
python3 benchmark.py vendor-docker --vendor-name (memgraph-docker||neo4j-docker) benchmarks demo/*/*/* --export-results result.json --no-authorization
```

To run this on Memgraph, the command looks like this:

```bash
python3 benchmark.py vendor-docker --vendor-name memgraph-docker benchmarks demo/*/*/* --export-results results.json --no-authorization
```

## How to configure a benchmark run

After running the command above, you should see logs from the benchmark.py process managing the benchmark execution. The script takes a lot of arguments; some used in the run above are self-explanatory, but let's break down the most important ones:

- NAME/VARIANT/GROUP/QUERY - The argument demo/*/*/* says to execute the workload named demo and all of its variants, groups, and queries. This flag gives you direct control over which workload you wish to execute. The NAME here is the name of the workload defined in the Workload class. VARIANT is an additional workload configuration, which will be explained a bit later. GROUP is defined in the query method name, and QUERY is the name of the query you wish to execute. If you want to execute a specific query from demo.py, it would look like this: demo/*/test/get_nodes. This runs the demo workload on all variants, in the test query group, with the get_nodes query.

- --single-threaded-runtime-sec - This flag addresses the question of how many executions of each specific query should be used as a sample for the benchmark. Each query can take a different time to execute, so fixing a single count could yield some queries finishing in 1 second and others running for a minute. Instead, this flag defines a duration in seconds that is used to approximate how many queries to execute. The default value is 10 seconds, which means benchmark.py will generate a predetermined number of each query to approximate a single-threaded runtime of 10 seconds. Increasing the value yields a longer-running test. Each specific query gets a different count of how many instances of it will be generated, which can be inspected after the test. For example, with 10 seconds of single-threaded runtime, the demo workload's get_node_by_id got 64230 generated queries, while get_nodes got 5061, because of the queries' different time complexities.

- --num-workers-for-benchmark - This flag defines how many concurrent clients will connect to and query the database. With it, you can simulate different database users connecting to the database and executing queries. Each client is independent and executes queries as fast as possible; all clients share the total pool of queries generated according to --single-threaded-runtime-sec. This means the total number of queries to execute is split between the specified number of workers.

- --warm-up - The warm-up flag can take three different values: cold, hot, and vulcanic. Cold is the default and performs no warm-up; hot executes some predefined queries before the benchmark, while vulcanic runs the whole workload once before taking measurements. A combined example command using these flags is shown below.
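
Putting these flags together, a run of only the get_nodes query from the demo workload's test group, with a longer single-threaded runtime, four concurrent clients, and a hot warm-up, could look roughly like this. The values are examples rather than recommendations, and the exact flag placement may differ, so consult the script's help output if it complains:

```bash
python3 benchmark.py vendor-docker --vendor-name memgraph-docker \
    benchmarks demo/*/test/get_nodes \
    --single-threaded-runtime-sec 30 \
    --num-workers-for-benchmark 4 \
    --warm-up hot \
    --export-results results.json --no-authorization
```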

## How to compare results

Once the benchmark has been run, the results are saved in the file specified by the --export-results argument. You can compare result files against other vendors' results with the compare_results.py script:

```bash
python compare_results.py --compare path_to/run_1.json path_to/run_2.json --output run_1_vs_run_2.html --different-vendors
```

The output is an HTML file with a visual representation of the performance differences between the two compared vendors. The first summary JSON file passed is used as the reference point. Open the HTML file in any browser at hand.

## Customizing the workload generator

### How to run the same workload on different vendors

The base Workload class carries benchmark context information that contains all the benchmark arguments used in the run; some of them are mentioned above. The key argument here is --vendor-name, which defines which database is being benchmarked.

When creating your workload, you can access this property from the parent class via self.benchmark_context.vendor_name. For example, if you want vendor-specific index creation, indexes_generator() could look like this:

```python
def indexes_generator(self):
    indexes = []
    if "neo4j" in self.benchmark_context.vendor_name:
        indexes.extend(
            [
                ("CREATE INDEX FOR (n:NodeA) ON (n.id);", {}),
                ("CREATE INDEX FOR (n:NodeB) ON (n.id);", {}),
            ]
        )
    else:
        indexes.extend(
            [
                ("CREATE INDEX ON :NodeA(id);", {}),
                ("CREATE INDEX ON :NodeB(id);", {}),
            ]
        )
    return indexes
```

The same applies to dataset_generator(): during dataset generation, you can use vendor-specific queries for different vendors.
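
For example, a sketch of a dataset_generator() that branches on the vendor could look like the following. The UNWIND-based batching used for the Neo4j branch is purely illustrative (both forms are valid Cypher on either system); the point is the branching pattern, not the specific queries:

```python
def dataset_generator(self):
    queries = []
    ids = list(range(0, 100))
    if "neo4j" in self.benchmark_context.vendor_name:
        # Illustrative vendor-specific variant: create the nodes in one batched query.
        queries.append(("UNWIND $ids AS id CREATE (:NodeA {id: id});", {"ids": ids}))
        queries.append(("UNWIND $ids AS id CREATE (:NodeB {id: id});", {"ids": ids}))
    else:
        for i in ids:
            queries.append(("CREATE (:NodeA {id: $id});", {"id": i}))
            queries.append(("CREATE (:NodeB {id: $id});", {"id": i}))
    return queries
```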