Remove docs/user_technical
Summary: The new source of truth is because content writers and community members will write most of the content. Reviewers: ipaljak, teon.banek Reviewed By: ipaljak Subscribers: pullbot Differential Revision:
@ -1,41 +0,0 @@
# Technical Documentation
## About Memgraph
Memgraph is an ACID compliant high performance transactional in-memory graph
database management system featuring highly concurrent
data structures, multi-version concurrency control and asynchronous IO.
[//]: # (When adding a new documentation file, please add it to the list)
## Contents
* [About Memgraph](#about-memgraph)
* [Quick Start](
* [Tutorials](tutorials/
* [Analysing TED Talks](tutorials/
* [Graphing the Premier League](tutorials/
* [Exploring the European Road Network](tutorials/
* [How to](how_to_guides/
* [Import Data?](how_to_guides/
* [Query Memgraph Programmatically?](how_to_guides/
* [Ingest Data Using Kafka?](how_to_guides/
* [Manage User Privileges?](how_to_guides/
* [Concepts](concepts/
* [Storage](concepts/
* [Graph Algorithms](concepts/
* [Indexing](concepts/
* [Reference Guide](reference_guide/
* [Reading Existing Data](reference_guide/
* [Writing New Data](reference_guide/
* [Reading and Writing](reference_guide/
* [Indexing](reference_guide/
* [Graph Algorithms](reference_guide/
* [Graph Streams](reference_guide/
* [Security](reference_guide/
* [Dynamic Graph Partitioner](reference_guide/
* [Other Features](reference_guide/
* [Differences](reference_guide/
* [Upcoming Features](
[//]: # (Nothing should go below the contents section)
@ -1,11 +0,0 @@
## Concepts Overview
Articles within the concepts section serve as an in-depth introduction into
inner workings of Memgraph. These tend to be quite technical in nature and
are recommended for advanced users and other graph database enthusiasts.
So far we have covered the following topics:
* [Data Storage](
* [Graph Algorithms](
* [Indexing](
@ -1,106 +0,0 @@
## Durability and Data Recovery
*Memgraph* uses two mechanisms to ensure the durability of the stored data:
* write-ahead logging (WAL) and
* taking periodic snapshots.
Write-ahead logging works by logging all database modifications to a file.
This ensures that all operations are done atomically and provides a trace of
steps needed to reconstruct the database state.
Snapshots are taken periodically during the entire runtime of *Memgraph*. When
a snapshot is triggered, the whole data storage is written to disk. The
snapshot file provides a quicker way to restore the database state.
Database recovery is done on startup from the most recently found snapshot
file. Since the snapshot may be older than the most recent update logged in
the WAL file, the recovery process will apply the remaining state changes
found in the said WAL file.
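The recovery order described above (load the latest snapshot, then replay newer WAL entries) can be sketched as follows. This is only an illustration: the key-value state, the `tx_id` field and the `set`/`delete` operations are hypothetical and do not reflect Memgraph's actual snapshot or WAL formats.

```python
def recover(snapshot, wal):
    """Rebuild state from a snapshot, then replay WAL entries newer than it."""
    # Start from the state captured by the snapshot (hypothetical format).
    state = dict(snapshot["data"])
    for entry in wal:
        # Entries already reflected in the snapshot are skipped.
        if entry["tx_id"] <= snapshot["last_tx_id"]:
            continue
        if entry["op"] == "set":
            state[entry["key"]] = entry["value"]
        elif entry["op"] == "delete":
            state.pop(entry["key"], None)
    return state


# Hypothetical data: the snapshot was taken after transaction 2.
snapshot = {"last_tx_id": 2, "data": {"a": 1, "b": 2}}
wal = [
    {"tx_id": 1, "op": "set", "key": "a", "value": 1},
    {"tx_id": 2, "op": "set", "key": "b", "value": 2},
    {"tx_id": 3, "op": "set", "key": "c", "value": 3},
    {"tx_id": 4, "op": "delete", "key": "a"},
]
recovered = recover(snapshot, wal)
print(recovered)
```

Only the last two WAL entries are applied here, because the first two are already contained in the snapshot.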
NOTE: Snapshot and WAL files are not (currently) compatible between *Memgraph*
versions.
Behaviour of the above mechanisms can be tweaked in the configuration file,
usually found in `/etc/memgraph/memgraph.conf`.
In addition to the above mentioned data durability and recovery, a
snapshot file may be generated using *Memgraph's* import tools. For more
information, take a look at the [Import Tools](../how_to_guides/
## Storable Data Types
Since *Memgraph* is a *graph* database management system, data is stored in
the form of graph elements: nodes and edges. Each graph element can also
contain various types of data. This chapter describes which data types are
supported in *Memgraph*.
### Node Labels & Edge Types
Each node can have any number of labels. A label is a text value, which can be
used to *label* or group nodes according to users' desires. A user can change
labels at any time. Similarly, each edge can have a type,
represented as text. Unlike nodes, which can have multiple labels or none at
all, edges *must* have exactly one edge type. Another difference from labels is
that edge types are set upon creation and never modified again.
### Properties
Nodes and edges can store various properties. These are like mappings or
tables containing property names and their accompanying values. Property names
are represented as text, while values can be of different types. Each property
name can store a single value; it is not possible to have multiple properties
with the same name on a single graph element. Naturally, the same property
names can be found across multiple graph elements. Also, there are no
restrictions on the number of properties that can be stored in a single graph
element. The only restriction is that the values must be of the supported
types. Following is a table of supported data types.
Type | Description
-----|------------
`Null` | Denotes that the property has no value. This is the same as if the property does not exist.
`String` | A character string, i.e. text.
`Boolean` | A boolean value, either `true` or `false`.
`Integer` | An integer number.
`Float` | A floating-point number, i.e. a real number.
`List` | A list containing any number of property values of any supported type. It can be used to store multiple values under a single property name.
`Map` | A mapping of string keys to values of any supported type.
Note that even though it's possible to store `List` and `Map` property values, it is not possible to modify them. It is however possible to replace them completely. So, the following queries are legal:
CREATE (:Node {property: [1, 2, 3]})
CREATE (:Node {property: {key: "value"}})
However, these queries are not:
MATCH (n:Node) SET n.property[0] = 0
MATCH (n:Node) SET n.property.key = "other value"
### Cold data on disk
Although *Memgraph* is an in-memory database by default, it offers an option
to store a certain amount of data on disk. More precisely, the user can pass
a list of properties they wish to keep stored on disk via the command line.
In certain cases, this might result in a significant performance boost due to
reduced memory usage. It is recommended to use this feature on large,
cold properties, i.e. properties that are rarely accessed.
For example, a user of a library database might identify author biographies
and book summaries as cold properties. In that case, the user should run
*Memgraph* as follows:
/usr/lib/memgraph/memgraph --properties-on-disk biography,summary
Note that the usage of *Memgraph* has not changed, i.e. durability and
data recovery mechanisms are still in place and the query language remains
the same. It is also important to note that the user cannot change the storage
location of a property while *Memgraph* is running. Naturally, the user can
reload their database from snapshot, provide a different list of properties on
disk and rest assured that only those properties will be stored on disk.
@ -1,156 +0,0 @@
## Graph Algorithms
### Introduction
The graph is a mathematical structure used to describe a set of objects in which
some pairs of objects are "related" in some sense. Generally, we consider
those objects as abstractions named `nodes` (also called `vertices`).
Aforementioned relations between nodes are modelled by an abstraction named
`edge` (also called `relationship`).
It turns out that a lot of real-world problems can be successfully modeled
using graphs. Some natural examples include railway networks between
cities, computer networks, piping systems and Memgraph itself.
This article outlines some of the most important graph algorithms
that are internally used by Memgraph. We believe that advanced users could
significantly benefit from obtaining basic knowledge about those algorithms.
The users should also note that this article does not contain an in-depth
analysis of algorithms and their implementation details since those are
well documented in the appropriate literature and, in our opinion, go well out
of scope for user documentation. That being said, we will include the relevant
information for using Memgraph effectively and efficiently.
Contents of this article include:
* [Breadth First Search (BFS)](#breadth-first-search)
* [Weighted Shortest Path (WSP)](#weighted-shortest-path)
### Breadth First Search
[Breadth First Search](
is a way of traversing a graph data structure. The
traversal starts from a single node (usually referred to as source node) and,
during the traversal, breadth is prioritized over depth, hence the name of the
algorithm. More precisely, when we visit some node, we can safely assume that
we have already visited all nodes that are fewer edges away from a source node.
An interesting side-effect of traversing a graph in BFS order is the fact
that, when we visit a particular node, we can easily find a path from
the source node to the newly visited node with the least number of edges.
Since in this context we disregard the edge weights, we can say that BFS is
a solution to an unweighted shortest path problem.
The algorithm itself proceeds as follows:
* Keep around a set of nodes that are equidistant from the source node.
  Initially, this set contains only the source node.
* Expand to all not yet visited nodes that are a single edge away from that
  set. Note that the set of those nodes is also equidistant from the source
  node.
* Replace the set with a set of nodes obtained in the previous step.
* Terminate the algorithm when the set is empty.
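The steps above can be sketched in Python as a level-by-level traversal. This is a minimal illustration, not Memgraph's implementation; the adjacency-dictionary representation and the sample graph are assumptions made for the example.

```python
def bfs_levels(adj, source):
    """Return the sets of nodes equidistant from `source`, nearest first."""
    visited = {source}
    frontier = [source]  # nodes at the current distance from the source
    levels = []
    while frontier:
        levels.append(frontier)
        next_frontier = []
        for node in frontier:
            for neighbour in adj.get(node, ()):
                # Expand only to nodes that have not been visited yet.
                if neighbour not in visited:
                    visited.add(neighbour)
                    next_frontier.append(neighbour)
        # Replace the set with the nodes obtained in this step.
        frontier = next_frontier
    return levels


# Hypothetical sample graph given as an adjacency dictionary.
adj = {1: [2, 3, 4], 2: [5, 6], 4: [7, 8], 5: [9, 10], 7: [11, 12]}
levels = bfs_levels(adj, 1)
print(levels)
```

Each inner list holds the nodes that are the same number of edges away from the source, so the index of a list is the unweighted shortest distance of its nodes.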
The order of visited nodes is nicely visualized in the following animation from
Wikipedia. Note that each row contains nodes that are equidistant from the
source and thus represents one of the sets mentioned above.

The standard BFS implementation deviates from the above description by relying on
a FIFO (first in, first out) queue data structure. Nevertheless, the
functionality is equivalent and its runtime is bounded by `O(|V| + |E|)` where
`V` denotes the set of nodes and `E` denotes the set of edges. Therefore,
it provides a more efficient way of finding unweighted shortest paths than
running [Dijkstra's algorithm](#weighted-shortest-path) on a graph
with edge weights equal to `1`.
### Weighted Shortest Path
In [graph theory](, weighted shortest
path problem is the problem of finding a path between two nodes in a graph such
that the sum of the weights of edges connecting nodes on the path is minimized.
#### Dijkstra's algorithm
One of the most important algorithms for finding weighted shortest paths is
[Dijkstra's algorithm](
Our implementation uses a modified version of this algorithm that can handle
length restriction. The length restriction parameter is optional and when it's
not set it could increase the complexity of the algorithm. It is important to
note that the term "length" in this context denotes the number of traversed
edges and not the sum of their weights.
The algorithm itself is based on a couple of greedy observations and could
be expressed in natural language as follows:
* Keep around a set of already visited nodes along with their corresponding
shortest paths from source node. Initially, this set contains only the
source node with the shortest distance of `0`.
* Find an edge that goes from a visited node to an unvisited one such that the
shortest path from source to the visited node increased by the weight of
that edge is minimized. Traverse that edge and add a newly visited node with
appropriate distance to the set of already visited nodes.
* Repeat the process until the destination node is visited.
The described algorithm is nicely visualized in the following animation from
Wikipedia. Note that edge weights correspond to the Euclidean distance between
nodes which represent points on a plane.

Using appropriate data structures the worst-case performance of our
implementation can be expressed as `O(|E| + |V|log|V|)` where `E` denotes
a set of edges and `V` denotes the set of nodes.
A sample query that finds a shortest path between two nodes looks as follows:
MATCH (a {id: 723})-[edge_list *wShortest 10 (e, n | e.weight) total_weight]-(b {id: 882}) RETURN *
This query has an upper bound length restriction set to `10`. This means that no
path that traverses more than `10` edges will be considered as a valid result.
##### Upper Bound Implications
Since the upper bound parameter is optional, we can have different results based
on this parameter.
Consider the following graph and sample queries.

MATCH (a {id: 0})-[edge_list *wShortest 3 (e, n | e.weight) total_weight]-(b {id: 5}) RETURN *
MATCH (a {id: 0})-[edge_list *wShortest (e, n | e.weight) total_weight]-(b {id: 5}) RETURN *
The first query will try to find the weighted shortest path between nodes `0`
and `5` with the restriction on the path length set to `3`, and the second query
will try to find the weighted shortest path with no restriction on the path
length.
The expected result for the first query is `0 -> 1 -> 4 -> 5` with the total
cost of `12`, while the expected result for the second query is
`0 -> 2 -> 3 -> 4 -> 5` with the total cost of `11`. The second
query can find the true shortest path because it has no restriction on the path
length.
To handle cases when the length restriction is set, the *weighted shortest path*
algorithm uses both node and distance as the state. This causes the search
space to increase by a factor of the given upper bound. On the other hand, when
the upper bound parameter is not set, the search space might contain the whole
graph.
Because of this, one should always try to narrow down the upper bound limit to
be as precise as possible in order to have a more performant query.
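To make the effect of the upper bound concrete, here is a minimal Python sketch of Dijkstra's algorithm with an optional limit on the number of traversed edges. It is not Memgraph's implementation. The function name, the `max_hops` parameter and the edge weights are hypothetical; the weights were chosen only so that the results match the costs quoted above (`12` with a limit of `3`, `11` without a limit).

```python
import heapq


def weighted_shortest_path(adj, source, target, max_hops=None):
    """Return (cost, path) of the cheapest path, or None if there is none."""
    heap = [(0, 0, source, (source,))]  # (cost, hops, node, path so far)
    seen = set()
    while heap:
        cost, hops, node, path = heapq.heappop(heap)
        if node == target:
            return cost, list(path)
        # With a limit, the state is (node, hops); otherwise just the node.
        state = (node, hops) if max_hops is not None else node
        if state in seen:
            continue
        seen.add(state)
        if max_hops is not None and hops == max_hops:
            continue  # the path may not be extended any further
        for neighbour, weight in adj.get(node, ()):
            heapq.heappush(
                heap, (cost + weight, hops + 1, neighbour, path + (neighbour,)))
    return None


# Hypothetical undirected graph; weights are assumptions for illustration.
edges = [(0, 1, 5), (1, 4, 5), (4, 5, 2), (0, 2, 3), (2, 3, 3), (3, 4, 3)]
graph = {}
for u, v, w in edges:
    graph.setdefault(u, []).append((v, w))
    graph.setdefault(v, []).append((u, w))

print(weighted_shortest_path(graph, 0, 5, max_hops=3))
print(weighted_shortest_path(graph, 0, 5))
```

With the limit set, the same node can be expanded once per hop count, which is where the extra factor in the search space comes from.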
### Where to next?
For some real-world application of WSP we encourage you to visit our article
on [exploring the European road network](../tutorials/
which was specially crafted to showcase our graph algorithms.
@ -1,92 +0,0 @@
## Indexing
### Introduction
A database index is a data structure used to improve the speed of data retrieval
within a database at the cost of additional writes and storage space for
maintaining the index data structure.
Armed with deep understanding of their data model and use-case, users can decide
which data to index and, by doing so, significantly improve their data retrieval
speed.
### Index Types
At Memgraph, we support two types of indexes:
* label index
* label-property index
Label indexing is enabled by default in Memgraph, i.e., Memgraph automatically
indexes labeled data. By doing so we optimize queries which fetch nodes by
label:
MATCH (n: Label) ... RETURN n
Indexes can also be created on data with a specific combination of label and
property, hence the name label-property index. This operation needs to be
specified by the user and should be used with a specific data model and
use-case in mind.
For example, suppose we are storing information about certain people in our
database and we are often interested in retrieving their age. In that case,
it might be beneficial to create an index on nodes labeled as `:Person` which
have a property named `age`. We can do so by using the following language
construct:
CREATE INDEX ON :Person(age)
After the creation of that index, those queries will be more efficient due to
the fact that Memgraph's query engine will not have to fetch each `:Person` node
and check whether the property exists. Moreover, even if all nodes labeled as
`:Person` had an `age` property, creating such index might still prove to be
beneficial. The main reason is that entries within that index are kept sorted
by property value. Queries such as the following are therefore more efficient:
MATCH (n :Person {age: 42}) RETURN n
Index based retrieval can also be invoked on queries with `WHERE` statements.
For instance, the following query will have the same effect as the previous
one:
MATCH (n) WHERE n:Person AND n.age = 42 RETURN n
Naturally, indexes will also be used when filtering based on less than or
greater than comparisons. For example, filtering all minors (persons
under 18 years of age under Croatian law) using the following query will use
index based retrieval:
MATCH (n) WHERE n:Person AND n.age < 18 RETURN n
Bear in mind that `WHERE` filters could contain arbitrarily complex expressions
and index based retrieval might not be used. Nevertheless, we are continually
improving our index usage recognition algorithms.
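The benefit of keeping index entries sorted by property value can be illustrated with a small Python sketch. It uses a plain sorted list and the standard `bisect` module instead of a skip list, so it only demonstrates the idea; the class name, its methods and the sample ages are hypothetical.

```python
import bisect


class LabelPropertyIndex:
    """Toy index for one label and property, sorted by property value."""

    def __init__(self):
        self._entries = []  # sorted list of (value, node_id) pairs

    def insert(self, value, node_id):
        bisect.insort(self._entries, (value, node_id))

    def equal(self, value):
        # (value,) sorts before every (value, node_id) pair.
        start = bisect.bisect_left(self._entries, (value,))
        result = []
        for entry_value, node_id in self._entries[start:]:
            if entry_value != value:
                break
            result.append(node_id)
        return result

    def less_than(self, value):
        end = bisect.bisect_left(self._entries, (value,))
        return [node_id for _, node_id in self._entries[:end]]


# Hypothetical `:Person(age)` data as (age, node id) pairs.
index = LabelPropertyIndex()
for age, node_id in [(42, 1), (17, 2), (42, 3), (5, 4), (30, 5)]:
    index.insert(age, node_id)

print(index.equal(42))
print(index.less_than(18))
```

Both lookups locate their starting point with a binary search and then read a contiguous run of entries, instead of checking every `:Person` node.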
### Underlying Implementation
The central part of our index data structure is a highly-concurrent skip list.
Skip lists are probabilistic data structures that allow fast search within an
ordered sequence of elements. The structure itself is built in layers where the
bottom layer is an ordinary linked list that preserves the order. Each higher
level can be imagined as a highway for layers below.
The implementation details behind skip list operations are well documented
in the literature and are out of scope for this article. Nevertheless, we
believe that it is important for more advanced users to understand the following
implications of this data structure (`n` denotes the current number of elements
in a skip list):
* Average insertion time is `O(log(n))`
* Average deletion time is `O(log(n))`
* Average search time is `O(log(n))`
* Average memory consumption is `O(n)`
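For readers who want to see the layered structure in code, below is a minimal single-threaded skip list sketch in Python. It is not the highly concurrent structure used by Memgraph; the class name, the maximum level of `16` and the promotion probability of `0.5` are assumptions chosen for illustration.

```python
import random


class SkipList:
    """Minimal single-threaded skip list storing comparable keys."""

    MAX_LEVEL = 16

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        # A node is a pair [key, forward pointers]; the head holds no key.
        self._head = [None, [None] * self.MAX_LEVEL]
        self._level = 1

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < 0.5:
            level += 1
        return level

    def insert(self, key):
        # update[i] is the last node before the insert position on layer i.
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in reversed(range(self._level)):
            while node[1][i] is not None and node[1][i][0] < key:
                node = node[1][i]
            update[i] = node
        level = self._random_level()
        self._level = max(self._level, level)
        new_node = [key, [None] * level]
        for i in range(level):
            new_node[1][i] = update[i][1][i]
            update[i][1][i] = new_node

    def contains(self, key):
        node = self._head
        # Descend from the top layer, moving right while keys are smaller.
        for i in reversed(range(self._level)):
            while node[1][i] is not None and node[1][i][0] < key:
                node = node[1][i]
        candidate = node[1][0]
        return candidate is not None and candidate[0] == key

    def __iter__(self):
        node = self._head[1][0]  # the bottom layer is an ordinary linked list
        while node is not None:
            yield node[0]
            node = node[1][0]


skip_list = SkipList(seed=0)
for key in [5, 1, 9, 3, 7]:
    skip_list.insert(key)
print(list(skip_list))
```

The random level drawn on insertion decides on how many layers a node appears, which is what gives the logarithmic average search time listed above.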
Binary file not shown.
@ -1,12 +0,0 @@
## How-to Guides Overview
Articles within the how-to guides section serve as a cookbook for getting
things done as fast as possible. These articles tend to provide a step-by-step
guide on how to use a certain Memgraph feature or solve a particular problem.
So far we have covered the following topics:
* [How to Import Data?](
* [How to Query Memgraph Programmatically?](
* [How to Ingest Data Using Kafka](
* [How to Manage User Privileges](
@ -1,118 +0,0 @@
## How to Import Data?
Memgraph comes with tools for importing data into the database. Currently,
only import of CSV-formatted data is supported. We plan to support more formats in
the future.
### CSV Import Tool
CSV data should be in Neo4j CSV compatible format. Detailed format
specification can be found
The import tool is run from the console, using the `mg_import_csv` command.
If you installed Memgraph using Docker, you will need to run the importer
using the following command:
docker run -v mg_lib:/var/lib/memgraph -v mg_etc:/etc/memgraph -v mg_import:/import-data \
--entrypoint=mg_import_csv memgraph
You can pass CSV files containing node data using the `--nodes` option.
Multiple files can be specified by repeating the `--nodes` option. At least
one node file should be specified. Similarly, graph edges (also known as
relationships) are passed via the `--relationships` option. Multiple
relationship files are imported by repeating the option. Unlike nodes,
relationships are not required.
After reading the CSV files, the tool will by default search for the installed
Memgraph configuration. If the configuration is found, the data will be
written in the configured durability directory. If the configuration isn't
found, you will need to use the `--out` option to specify the output file. You
can use the same option to override the default behaviour.
Memgraph will recover the imported data on the next startup by looking in the
durability directory.
For information on other options, run:
mg_import_csv --help
When using Docker, this translates to:
docker run --entrypoint=mg_import_csv memgraph --help
#### Example
Let's import a simple dataset.
Store the following in `comment_nodes.csv`.
1,United Kingdom,Chrome,thanks,Message;Comment
3,France,Firefox,I see,Message;Comment
4,Italy,Internet Explorer,fine,Message;Comment
Now, let's add `forum_nodes.csv`.
And finally, set relationships between comments and forums in
`relationships.csv`.
Now, you can import the dataset in Memgraph.
WARNING: Your existing recovery data will be considered obsolete, and Memgraph
will load the new dataset.
Use the following command:
mg_import_csv --overwrite --nodes=comment_nodes.csv --nodes=forum_nodes.csv --relationships=relationships.csv
If using Docker, things are a bit more complicated. First you need to move the
CSV files where the Docker image can see them:
mkdir -p /var/lib/docker/volumes/mg_import/_data
cp comment_nodes.csv forum_nodes.csv relationships.csv /var/lib/docker/volumes/mg_import/_data
Then, run the importer with the following:
docker run -v mg_lib:/var/lib/memgraph -v mg_etc:/etc/memgraph -v mg_import:/import-data \
--entrypoint=mg_import_csv memgraph \
--overwrite \
--nodes=/import-data/comment_nodes.csv --nodes=/import-data/forum_nodes.csv \
--relationships=/import-data/relationships.csv
Next time you run Memgraph, the dataset will be loaded.
@ -1,213 +0,0 @@
## How to Query Memgraph Programmatically?
### Supported Languages
If users wish to query Memgraph programmatically, they can do so using the
[Bolt protocol]( Bolt was designed for efficient
communication with graph databases and Memgraph supports
[Version 1]( of the protocol. Bolt protocol drivers
for some popular programming languages are listed below:
* [Java](
* [Python](
* [JavaScript](
* [C#](
* [Ruby](
* [Haskell](
* [PHP](
### Secure Sockets Layer (SSL)
Secure connections are supported and enabled by default. The server initially
ships with a self-signed testing certificate. The certificate can be replaced
by editing the following parameters in `/etc/memgraph/memgraph.conf`:
To disable SSL support and use insecure connections to the database you should
set both parameters (`--cert-file` and `--key-file`) to empty values.
### Examples
In this article we have included some basic usage examples for the following
supported languages:
* [Python](#python-example)
* [Java](#java-example)
* [JavaScript](#javascript-example)
* [C#](#c-sharp-example)
Examples for the languages listed above are equivalent.
#### Python Example
Neo4j officially supports Python for interacting with an openCypher and Bolt
compliant database. For details consult the
[official documentation]( and the
[GitHub project](
The code snippet below outlines a basic usage example which connects to the
database and executes a couple of elementary queries.
from neo4j.v1 import GraphDatabase, basic_auth
# Initialize and configure the driver.
# * provide the correct URL where Memgraph is reachable;
# * use an empty user name and password.
driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=basic_auth("", ""))
# Start a session in which queries are executed.
session = driver.session()
# Execute openCypher queries.
# After each query, call either `consume()` or `data()`
session.run('CREATE (alice:Person {name: "Alice", age: 22})').consume()
# Get all the vertices from the database (potentially multiple rows).
vertices = session.run('MATCH (n) RETURN n').data()
# Assuming we started with an empty database, we should have Alice
# as the only row in the results.
only_row = vertices.pop()
alice = only_row["n"]
# Print out what we retrieved.
print("Found a vertex with labels '{}', name '{}' and age {}".format(
    alice.labels, alice['name'], alice['age']))
# Remove all the data from the database.
session.run('MATCH (n) DETACH DELETE n').consume()
# Close the session and the driver.
session.close()
driver.close()
#### Java Example
The details about Java driver can be found on
The code snippet below outlines a basic usage example which connects to the
database and executes a couple of elementary queries.
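import org.neo4j.driver.v1.*;
import org.neo4j.driver.v1.types.*;
import static org.neo4j.driver.v1.Values.parameters;
import java.util.*;
public class JavaQuickStart {
    public static void main(String[] args) {
        // Initialize driver.
        // Assumption: driver defaults and an empty user name and password,
        // mirroring the Python example.
        Config config = Config.build().toConfig();
        Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                                             AuthTokens.basic("", ""),
                                             config);
        // Execute basic queries.
        try (Session session = driver.session()) {
            StatementResult rs1 = session.run("MATCH (n) DETACH DELETE n");
            StatementResult rs2 = session.run(
                "CREATE (alice: Person {name: 'Alice', age: 22})");
            StatementResult rs3 = session.run("MATCH (n) RETURN n");
            List<Record> records = rs3.list();
            Record record = records.get(0);
            Node node = record.get("n").asNode();
            // Print the name of the retrieved node.
            System.out.println(node.get("name").asString());
        } catch (Exception e) {
            System.out.println(e);
            System.exit(1);
        }
        // Cleanup.
        driver.close();
    }
}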
#### JavaScript Example
The details about the JavaScript driver can be found on
Here is an example related to `Node.js`. Memgraph doesn't have integrated
support for `WebSocket` which is required during the execution in any web
browser. If you want to run `openCypher` queries from a web browser,
[websockify]( has to be up and running.
Requests from web browsers are wrapped into `WebSocket` messages, and a proxy
is needed to handle the overhead. The proxy has to be configured to point
to Memgraph's Bolt port and the web browser driver has to send requests to the
proxy port.
The code snippet below outlines a basic usage example which connects to the
database and executes a couple of elementary queries.
var neo4j = require('neo4j-driver').v1;
var driver = neo4j.driver("bolt://localhost:7687",
                          neo4j.auth.basic("neo4j", "1234"));
var session = driver.session();
// Close the session and the driver (assumed cleanup).
function die() {
  session.close();
  driver.close();
}
// Run a query and pass its result to the callback.
function run_query(query, callback) {
  var run = session.run(query, {});
  run.then(callback).catch(function (error) {
    console.log(error);
    die();
  });
}
run_query("MATCH (n) DETACH DELETE n", function (result) {
  console.log("Database cleared.");
  run_query("CREATE (alice: Person {name: 'Alice', age: 22})", function (result) {
    console.log("Record created.");
    run_query("MATCH (n) RETURN n", function (result) {
      console.log("Record matched.");
      var alice = result.records[0].get("n");
      console.log(alice.labels[0]);
      console.log(alice.properties["name"]);
      die();
    });
  });
});
#### C# Example {#c-sharp-example}
The details about C# driver can be found on
The code snippet below outlines a basic usage example which connects to the
database and executes a couple of elementary queries.
using System;
using System.Linq;
using Neo4j.Driver.V1;
public class Basic {
  public static void Main(string[] args) {
    // Initialize the driver.
    var config = Config.DefaultConfig;
    using(var driver = GraphDatabase.Driver("bolt://localhost:7687", AuthTokens.None, config))
    using(var session = driver.Session())
    {
      // Run basic queries.
      session.Run("MATCH (n) DETACH DELETE n").Consume();
      session.Run("CREATE (alice:Person {name: \"Alice\", age: 22})").Consume();
      var result = session.Run("MATCH (n) RETURN n").First();
      var alice = (INode) result["n"];
      Console.WriteLine(string.Join(", ", alice.Labels));
    }
    Console.WriteLine("All ok!");
  }
}
@ -1,116 +0,0 @@
## How to Ingest Data Using Kafka
Apache Kafka is an open-source stream-processing software platform. The project
aims to provide a unified, high-throughput, low-latency platform for handling
real-time data feeds.
Memgraph offers easy data import at the source using Kafka as the
high-throughput messaging system.
At this point, we strongly advise you to read the streaming section of our
[reference guide](../reference_guide/
In this article, we assume you have a local instance of Kafka. You can find
more about running Kafka [here](
From this point forth, we assume you have an instance of Kafka running on
`localhost:9092` with a topic `test` and that you've started Memgraph and have
Memgraph client running.
Each Kafka stream in Memgraph requires a transform script written in `Python`
that knows how to interpret incoming data and transform the data to queries that
Memgraph understands. Let's assume you have a script available on
Let's also assume the Kafka topic contains two types of messages:
* Node creation: the message contains a single number, the node id.
* Edge creation: the message contains two numbers, origin node id and
destination node id.
To create a stream, enter the following query in the client:
CREATE STREAM mystream AS LOAD DATA KAFKA 'localhost:9092' WITH TOPIC 'test' WITH
TRANSFORM 'http://localhost/'
This will create the stream inside Memgraph but will not start it yet. However,
if the Kafka instance isn't available on the given URI, or the topic doesn't
exist, the query will fail with an appropriate message.
E.g. if the transform script can't be found at the given URI, the following
error will be shown:
Client received exception: Couldn't get the transform script from http://localhost/
Similarly, if the given Kafka topic doesn't exist, we'll get the following:
Client received exception: Kafka stream mystream, topic not found
After a successful stream creation, you can check the status of all streams by
This should produce the following output:
| name | uri | topic | transform | status |
|------|-----|-------|-----------|--------|
| mystream | localhost:9092 | test | http://localhost/ | stopped |
As you can notice, the status of this stream is stopped.
In order to see if everything is correct, you can test the stream by executing:
TEST STREAM mystream;
This will ingest data from Kafka, but instead of writing it to Memgraph, it will
just output the result.
If the `test` Kafka topic contained two messages, `1` and `1 2`, the result
of the `TEST STREAM` query would look like:
| query | params |
|-------|--------|
| CREATE (:Node {id: $id}) | {id:"1"} |
| MATCH (n:Node {id: $from_id}), (m:Node {id: $to_id}) CREATE (n)-[:Edge]->(m) | {from_id:"1",to_id:"2"} |
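The mapping from raw messages to parameterized queries shown in this output can be sketched in Python as follows. The function name `messages_to_queries` and its signature are hypothetical and are not the interface Memgraph expects from a transform script; the sketch only illustrates the transformation logic for the two message types described earlier.

```python
def messages_to_queries(messages):
    """Map raw Kafka messages (bytes) to (query, params) pairs."""
    result = []
    for raw in messages:
        fields = raw.decode("utf-8").split()
        if len(fields) == 1:
            # Node creation: the message holds a single node id.
            result.append(("CREATE (:Node {id: $id})", {"id": fields[0]}))
        elif len(fields) == 2:
            # Edge creation: origin node id followed by destination node id.
            result.append((
                "MATCH (n:Node {id: $from_id}), (m:Node {id: $to_id}) "
                "CREATE (n)-[:Edge]->(m)",
                {"from_id": fields[0], "to_id": fields[1]},
            ))
        else:
            raise ValueError("unexpected message: {!r}".format(raw))
    return result


for query, params in messages_to_queries([b"1", b"1 2"]):
    print(query, params)
```

Passing the ids as query parameters, rather than formatting them into the query text, keeps the query strings identical across messages of the same type.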
To start ingesting data from a stream, you need to execute the following query:
START STREAM mystream;
If we check the stream status now, the output would look like this:
| name | uri | topic | transform | status |
|------|-----|-------|-----------|--------|
| mystream | localhost:9092 | test | http://localhost/ | running |
To stop ingesting data, the stop stream query needs to be executed:
STOP STREAM mystream;
If Memgraph shuts down, all streams that existed before the shutdown are going
to be recovered.
@ -1,142 +0,0 @@
## How to Manage User Privileges?
Most databases have multiple users accessing and modifying
data within the database. This might pose a serious security concern for the
system administrators that wish to grant only certain privileges to certain
users. A typical example would be an internal database of some company which
tracks data about their employees. Naturally, only certain users of the database
should be able to perform queries which modify that data.
At Memgraph, we provide the administrators with the option of granting,
denying or revoking a certain set of privileges to some users or groups of users
(i.e. users that are assigned a specific user role), thereby eliminating such
security concerns.
By default, anyone can connect to Memgraph and is granted all privileges.
After the first user is created, Memgraph will execute a query if and only
if either the user or its role is granted that privilege and neither the
user nor its role is denied that privilege. Otherwise, Memgraph will not
execute that specific query. Note that `DENY` is a stronger
operation than `GRANT`. Moreover, if neither the
user nor its role is explicitly granted or denied a certain privilege, that
user will not be able to perform that specific query. This effect is also known
as a silent deny. The information above is condensed in the following
table:
User Status | Role Status | Effective Status
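The rules stated above can be expressed as a small Python function that derives the effective status from the user status and the role status. This is a sketch based only on the rules in the preceding paragraph; the function name and the use of `NULL` for "neither granted nor denied" are assumptions made for illustration.

```python
GRANT, DENY, NULL = "GRANT", "DENY", "NULL"


def effective_status(user_status, role_status):
    """Combine user and role status according to the rules above."""
    # DENY is stronger than GRANT: a single DENY wins.
    if DENY in (user_status, role_status):
        return DENY
    # Otherwise, one GRANT is enough.
    if GRANT in (user_status, role_status):
        return GRANT
    # Neither granted nor denied: the silent deny.
    return DENY


# Print every combination of user and role status.
for user_status in (GRANT, DENY, NULL):
    for role_status in (GRANT, DENY, NULL):
        print(user_status, role_status,
              effective_status(user_status, role_status))
```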
All supported commands that deal with accessing or modifying users, user
roles and privileges can only be executed by users that are granted the
`AUTH` privilege. All of those commands are listed in the appropriate
[reference guide](../reference_guide/
At the moment, privileges are confined to users' abilities to perform certain
`OpenCypher` queries. Namely, users can be given permission to execute a subset
of the following commands: `CREATE`, `DELETE`, `MATCH`, `MERGE`, `SET`,
`REMOVE`, `INDEX`, `STREAM` and `AUTH`.
We could naturally cluster those privileges into groups:
* Privilege to access data (`MATCH`)
* Privilege to modify data (`MERGE`, `SET`)
* Privilege to create and delete data (`CREATE`, `DELETE`, `REMOVE`)
* Privilege to index data (`INDEX`)
* Privilege to use data streaming (`STREAM`)
* Privilege to view and alter users, roles and privileges (`AUTH`)
If you are unfamiliar with any of these commands, you can look them up in our
[reference guide](../reference_guide/
Similarly, the complete list of commands which can be executed under `AUTH`
privilege can be viewed in the
[appropriate article](../reference_guide/ within our reference
guide.
The remainder of this article outlines a recommended workflow of
user management within an internal database of a fictitious company.
### Creating an Administrator
As stated in the introduction, after the first user is created, Memgraph
will execute a query for a given user if the effective status of a corresponding
privilege evaluates to `GRANT`. As a corollary, the person that created the
first user might not be able to perform any meaningful action after their
session has ended. To prevent that from happening, we strongly recommend
that the first created user be an administrator who is granted all privileges.
Therefore, let's create a user named `admin` and set its password to `0000`.
This can be done by executing:
Granting all privileges to our `admin` user can be done as follows:
At this point, the current user can close their session and log into a new
one as the `admin` user they have just created. The remainder of the article
is written from the viewpoint of an administrator who has been granted
all privileges.
### Creating Other Users
Our fictitious company is internally divided into teams, and each team has
its own supervisor. All employees of the company need to access and modify
data within the database.
Creating a user account for a new hire named Alice can be done as follows:
Alice should also be granted a privilege to access data, which can be done by
executing the following:
### Creating User Roles
Each team supervisor needs to have additional privileges that allow them to
create new data or delete existing data from the database. Instead of tediously
granting additional privileges to each supervisor using language constructs from
the previous chapter, we could do so by creating a new user role for
supervisors.
Creating a user role named `supervisor` can be done by executing the following
command:
CREATE ROLE supervisor;
Granting the privilege to create and delete data to our newly created role can
be done as follows:
Finally, we need to assign that role to each of the supervisors. Suppose a user
named `bob` is a supervisor within the company. Assigning them that role
within the database can be done by the following command:
SET ROLE FOR bob TO supervisor;
@ -1,290 +0,0 @@
## Quick Start
This article briefly outlines the basic steps necessary to install and run
Memgraph. It also gives a brief glimpse into the world of OpenCypher and
outlines some information on programmatic querying of Memgraph. The users
should also make sure to read and fully understand the implications of
[telemetry](#telemetry) at the very end of the article.
### Installation
Depending on their preference, users can download the Memgraph binary as:
* [a Debian package for Debian 9 (Stretch)](#debian-installation)
* [an RPM package for CentOS 7](#RPM-installation)
* [a Docker image](#docker-installation)
After downloading the binary, users are advised to proceed to the corresponding
section below which outlines the installation details.
It is important to note that newer versions of Memgraph are currently not
backward compatible with older versions. This is mainly noticeable as an
inability to load storage snapshots created by a different version.
#### Debian Package Installation {#debian-installation}
After downloading Memgraph as a Debian package, install it by running the
following command:
dpkg -i /path/to/memgraph_<version>.deb
On successful installation, Memgraph should already be running. To
make sure that is true, the user can start it explicitly with the command:
systemctl start memgraph
To verify that Memgraph is running, the user can run the following command:
journalctl --unit memgraph
If successful, the user should receive an output similar to the following:
Nov 23 13:40:13 hostname memgraph[14654]: Starting 8 BoltS workers
Nov 23 13:40:13 hostname memgraph[14654]: BoltS server is fully armed and operational
Nov 23 13:40:13 hostname memgraph[14654]: BoltS listening on at 7687
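Besides reading the service log, one can check from a script whether something is accepting connections on the Bolt port. The following is a minimal sketch that relies only on Python's standard library; the function name is hypothetical, and the default host `localhost` and port `7687` are taken from the examples in this article. A successful TCP connection only shows that the port is open, not that the listener is actually Memgraph.

```python
import socket


def bolt_port_open(host: str = "localhost", port: int = 7687,
                   timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and connects within `timeout`.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


if __name__ == "__main__":
    print("Bolt port reachable:", bolt_port_open())
```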
At this point, Memgraph is ready to process queries. To try out some elementary
queries, the user should proceed to the [querying](#querying) section of this
article.
To shut down the Memgraph server, issue the following command:
systemctl stop memgraph
Memgraph configuration is available in `/etc/memgraph/memgraph.conf`. If the
configuration is altered, Memgraph needs to be restarted.
#### RPM Package Installation {#RPM-installation}
After downloading the RPM package of Memgraph, the user can install it by
issuing the following command:
rpm -U /path/to/memgraph-<version>.rpm
After the successful installation, Memgraph can be started as a service. To do
so, the user can type the following command:
systemctl start memgraph
To verify that Memgraph is running, the user should run the following command:
journalctl --unit memgraph
If successful, the user should receive an output similar to the following:
Nov 23 13:40:13 hostname memgraph[14654]: Starting 8 BoltS workers
Nov 23 13:40:13 hostname memgraph[14654]: BoltS server is fully armed and operational
Nov 23 13:40:13 hostname memgraph[14654]: BoltS listening on at 7687
At this point, Memgraph is ready to process queries. To try out some elementary
queries, the user should proceed to the [querying](#querying) section of this
article.
To shut down the Memgraph server, issue the following command:
systemctl stop memgraph
Memgraph configuration is available in `/etc/memgraph/memgraph.conf`. If the
configuration is altered, Memgraph needs to be restarted.
#### Docker Installation {#docker-installation}
Before proceeding with the installation, the user should install the Docker
engine on their system. Instructions on how to install Docker can be found on
the [official Docker website](
Memgraph's Docker image was built with Docker version `1.12` and should be
compatible with all newer versions.
After successful Docker installation, the user should download the Memgraph
Docker image and import it using the following command:
docker load -i /path/to/memgraph-<version>-docker.tar.gz
To actually start Memgraph, the user should issue the following command:
docker run -p 7687:7687 \
  -v mg_lib:/var/lib/memgraph -v mg_log:/var/log/memgraph -v mg_etc:/etc/memgraph \
  memgraph
If successful, the user should be greeted with the following message:
Starting 8 workers
Server is fully armed and operational
Listening on at 7687
At this point, Memgraph is ready to process queries. To try out some elementary
queries, the user should proceed to the [querying](#querying) section of this
article.
To stop Memgraph, press `Ctrl-c`.
#### Note about named volumes
Memgraph configuration is available in Docker's named volume `mg_etc`. On
Linux systems it should be in
`/var/lib/docker/volumes/mg_etc/_data/memgraph.conf`. After changing the
configuration, Memgraph needs to be restarted.
If it happens that the named volumes are reused between different Memgraph
versions, Docker will overwrite a folder within the container with existing
data from the host machine. If a new file is introduced, or two versions of
Memgraph are not compatible, some features might not work or Memgraph might
not be able to work correctly. We strongly advise the users to use another
named volume for a different Memgraph version or to remove the existing volume
from the host with the following command:
docker volume rm <volume_name>
#### Note for OS X/macOS Users {#OSX-note}
Although unlikely, some OS X/macOS users might experience minor difficulties
after following the Docker installation instructions. Instead of running on
`localhost`, a Docker container for Memgraph might be running on a custom IP
address. Fortunately, that IP address can be found using the following
steps:
1) Find out the container ID of the Memgraph container
By issuing the command `docker ps` the user should get an output similar to the
following:
9397623cd87e memgraph "/usr/lib/memgraph/m…" 2 seconds ago ...
At this point, it is important to remember the container ID of the Memgraph
container. In our case, that is `9397623cd87e`.
2) Use the container ID to retrieve an IP of the container
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' 9397623cd87e
The command above should yield the sought IP. If that IP does not correspond to
`localhost`, it should be used instead of `localhost` when firing up the
`neo4j-client` in the [querying](#querying) section.
### Querying {#querying}
Memgraph supports the openCypher query language which has been developed by
[Neo4j]( It is a declarative language developed specifically
for interaction with graph databases which is currently going through a
vendor-independent standardization process.
The easiest way to execute openCypher queries against Memgraph is by using
Neo4j's command-line tool. The command-line `neo4j-client` can be installed as
described [on the official website](
After installing `neo4j-client`, the user can connect to the running Memgraph
instance by issuing the following shell command:
neo4j-client -u "" -p "" localhost 7687
After the client has started it should present a command prompt similar to:
neo4j-client 2.1.3
Enter `:help` for usage hints.
Connected to 'neo4j://@localhost:7687'
At this point it is possible to execute openCypher queries on Memgraph. Each
query needs to end with the `;` (*semicolon*) character. For example:
CREATE (u:User {name: "Alice"})-[:Likes]->(m:Software {name: "Memgraph"});
The above will create 2 nodes in the database, one labeled "User" with name
"Alice" and the other labeled "Software" with name "Memgraph". It will also
create a relationship that "Alice" *likes* "Memgraph".
To find created nodes and relationships, execute the following query:
MATCH (u:User)-[r]->(x) RETURN u, r, x;
#### Supported Languages
If users wish to query Memgraph programmatically, they can do so using the
[Bolt protocol]( Bolt was designed for efficient
communication with graph databases and Memgraph supports
[Version 1]( of the protocol. Bolt protocol drivers
for some popular programming languages are listed below:
* [Java](
* [Python](
* [JavaScript](
* [C#](
* [Ruby](
* [Haskell](
* [PHP](
We have included some basic usage examples for some of the supported languages
in the article about [programmatic querying](how_to_guides/
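For readers curious about what a driver does when it first connects, the sketch below builds the opening handshake of Bolt version 1 as we understand the published protocol description: a four-byte preamble `0x6060B017` followed by four big-endian 32-bit version proposals, to which the server answers with a single 32-bit version number (zero if none is acceptable). The function names are hypothetical, and real applications should rely on one of the drivers listed above rather than on hand-written framing.

```python
import struct

# Preamble that identifies a Bolt connection.
BOLT_MAGIC = 0x6060B017


def build_handshake(versions=(1,)) -> bytes:
    """Return the 20-byte handshake: preamble plus four version proposals."""
    proposals = list(versions)[:4]
    proposals += [0] * (4 - len(proposals))   # unused slots are zero
    return struct.pack(">I4I", BOLT_MAGIC, *proposals)


def parse_handshake_reply(reply: bytes) -> int:
    """Decode the server's 4-byte answer; 0 means no version was accepted."""
    (version,) = struct.unpack(">I", reply)
    return version
```

Calling `build_handshake()` proposes version 1 only, which matches the protocol version this article says Memgraph supports.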
### Telemetry {#telemetry}
Telemetry is an automated process by which some useful data is collected at
a remote point. At Memgraph, we use telemetry for the sole purpose of improving
our product. To that end, we collect some data about the machine that executes the
database (CPU, memory, OS and kernel information) as well as some data about the
database runtime (CPU usage, memory usage, vertices and edges count).
Here at Memgraph, we deeply care about the privacy of our users and do not
collect any sensitive information. If users wish to disable Memgraph's telemetry
features, they can do so either by changing the line in
`/etc/memgraph/memgraph.conf` that enables telemetry (`--telemetry-enabled=true`)
to `--telemetry-enabled=false`, or by passing `--telemetry-enabled=false`
as a command-line argument when running the executable.
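The configuration change can also be scripted. The sketch below rewrites the text of a configuration file so that telemetry is switched off; it assumes the file holds one `--flag=value` entry per line, which is our assumption about the file layout, and the function name is hypothetical. It works on a string so that it can be tried without touching the real `/etc/memgraph/memgraph.conf`.

```python
def disable_telemetry(config_text: str) -> str:
    """Return config_text with telemetry disabled.

    Assumes one --flag=value entry per line (an assumption of this sketch).
    """
    lines = []
    found = False
    for line in config_text.splitlines():
        if line.strip().startswith("--telemetry-enabled"):
            lines.append("--telemetry-enabled=false")
            found = True
        else:
            lines.append(line)
    if not found:
        # The flag was not present, so add it explicitly.
        lines.append("--telemetry-enabled=false")
    return "\n".join(lines) + "\n"
```

As noted above, Memgraph needs to be restarted after the configuration file is changed.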
### Where to Next
To learn more about the openCypher language, the user should visit our
[reference guide](reference_guide/ article.
For real-world examples of how to use Memgraph, we strongly suggest reading
through the following articles:
* [Analyzing TED Talks](tutorials/
* [Graphing the Premier League](tutorials/
* [Exploring the European Road Network](tutorials/
Details on what can be stored in Memgraph can be found in the article about
[Data Storage](concepts/
We *welcome and encourage* your feedback!
@ -1,22 +0,0 @@
## Reference Overview
[*openCypher*]( is a query language for querying
graph databases. It aims to be intuitive and easy to learn, while
providing a powerful interface for working with graph based data.
*Memgraph* supports most of the commonly used constructs of the language. The
reference guide contains the details of implemented features. Features of the
language that are not yet supported are listed as well.
Our reference guide currently consists of the following articles:
* [Reading Existing Data](
* [Writing New Data](
* [Reading and Writing](
* [Indexing](
* [Graph Algorithms](
* [Graph Streams](
* [Security](
* [Dynamic Graph Partitioner](
* [Other Features](
* [Differences](
@ -1,280 +0,0 @@
## Reading Existing Data
The simplest usage of the language is to find data stored in the
database. For that purpose, the following clauses are offered:
* `MATCH`, which searches for patterns;
* `WHERE`, for filtering the matched data;
* `RETURN`, for defining what will be presented to the user in the result
set and
* `UNION` and `UNION ALL`, for combining results from multiple queries.
The `MATCH` clause is used to obtain data from Memgraph by matching it to a given
pattern. For example, to find each node in the database, you can use the
following query.
MATCH (node) RETURN node
Finding connected nodes can be achieved by using the query:
MATCH (node1)-[connection]-(node2) RETURN node1, connection, node2
In addition to general pattern matching, you can narrow the search down by
specifying node labels and properties. Similarly, edge types and properties
can also be specified. For example, finding each node labeled as `Person` and
with property `age` being 42, is done with the following query.
MATCH (n :Person {age: 42}) RETURN n
Their friends can be found with the following query.
MATCH (n :Person {age: 42})-[:FriendOf]-(friend) RETURN friend
There are cases when a user needs to find data which is connected by
traversing a path of connections, but the user doesn't know how many
connections need to be traversed. openCypher allows for designating patterns
with *variable path lengths*. Matching such a path is achieved by using the
`*` (*asterisk*) symbol inside the edge element of a pattern. For example,
traversing from `node1` to `node2` by following any number of connections in a
single direction can be achieved with:
MATCH (node1)-[r*]->(node2) RETURN node1, r, node2
If paths are very long, finding them could take a long time. To prevent that,
a user can provide the minimum and maximum length of the path. For example,
paths of length between 2 and 4 can be obtained with a query like:
MATCH (node1)-[r*2..4]->(node2) RETURN node1, r, node2
It is possible to name patterns in the query and return the resulting paths.
This is especially useful when matching variable length paths:
MATCH path = ()-[r*2..4]->() RETURN path
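To make the meaning of the length bounds concrete, here is a small Python sketch that enumerates directed paths whose number of edges lies within a given range. It is an illustration of the idea only, not Memgraph's implementation; the function name and the edge-list representation are hypothetical, and the rule that an edge may appear only once per path reflects our understanding of openCypher pattern matching.

```python
def variable_length_paths(edges, min_len=1, max_len=None):
    """Enumerate directed paths whose edge count lies in [min_len, max_len].

    `edges` is a list of (source, target) pairs. Each path is returned as a
    tuple of edge indices, and no edge is used twice within one path.
    """
    outgoing = {}
    for index, (source, _target) in enumerate(edges):
        outgoing.setdefault(source, []).append(index)

    paths = []

    def extend(node, used):
        if len(used) >= min_len:
            paths.append(tuple(used))
        if max_len is not None and len(used) >= max_len:
            return  # upper bound reached, stop expanding this path
        for index in outgoing.get(node, []):
            if index not in used:
                used.append(index)
                extend(edges[index][1], used)
                used.pop()

    for start in outgoing:
        extend(start, [])
    return paths


chain = [("a", "b"), ("b", "c"), ("c", "d")]
bounded = variable_length_paths(chain, min_len=2, max_len=4)
```

On the three-edge chain above, `bounded` holds the two paths of length 2 and the single path of length 3, while calling the function without bounds also returns the three single-edge paths.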
More details on how `MATCH` works can be found
The `MATCH` clause can be modified by prepending the `OPTIONAL` keyword.
`OPTIONAL MATCH` clause behaves the same as a regular `MATCH`, but when it
fails to find the pattern, missing parts of the pattern will be filled with
`null` values. Examples can be found
You have already seen that simple filtering can be achieved by using labels
and properties in `MATCH` patterns. When more complex filtering is desired,
you can use `WHERE` paired with `MATCH` or `OPTIONAL MATCH`. For example,
finding each person older than 20 is done with this query.
MATCH (n :Person) WHERE n.age > 20 RETURN n
Additional examples can be found
The `RETURN` clause defines which data should be included in the resulting
set. Basic usage was already shown in the examples for `MATCH` and `WHERE`
clauses. Another feature of `RETURN` is renaming the results using the `AS`
keyword:
MATCH (n :Person) RETURN n AS people
That query would display all nodes under the header named `people` instead of
`n`.
When you want to get everything that was matched, you can use the `*`
(*asterisk*) symbol.
This query:
MATCH (node1)-[connection]-(node2) RETURN *
is equivalent to:
MATCH (node1)-[connection]-(node2) RETURN node1, connection, node2
`RETURN` can be followed by the `DISTINCT` operator, which will remove
duplicate results. For example, getting unique names of people can be achieved
with:
MATCH (n :Person) RETURN DISTINCT n.name
Besides choosing what will be the result and how it will be named, the
`RETURN` clause can also be used to:
* limit results with `LIMIT` sub-clause;
* skip results with `SKIP` sub-clause;
* order results with `ORDER BY` sub-clause and
* perform aggregations (such as `count`).
More details on `RETURN` can be found
The `SKIP` and `LIMIT` sub-clauses take the number of results to skip or to
return, respectively.
For example, to get the first 3 results you can use this query.
MATCH (n :Person) RETURN n LIMIT 3
If you want to get all the results after the first 3, you can use the
following query.
MATCH (n :Person) RETURN n SKIP 3
The `SKIP` and `LIMIT` sub-clauses can be combined. For example, to get the
2nd result, you can do:
MATCH (n :Person) RETURN n SKIP 1 LIMIT 1
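The effect of `SKIP` and `LIMIT` on a result set can be pictured as list slicing. The helper below is a hypothetical Python analogy, not part of any Memgraph API: it drops the first `skip` rows and then keeps at most `limit` rows.

```python
def skip_limit(rows, skip=0, limit=None):
    """Mimic `SKIP skip LIMIT limit` on a list of result rows."""
    end = None if limit is None else skip + limit
    return rows[skip:end]


rows = ["a", "b", "c", "d", "e"]
first_three = skip_limit(rows, limit=3)        # like LIMIT 3
after_three = skip_limit(rows, skip=3)         # like SKIP 3
second_only = skip_limit(rows, skip=1, limit=1)  # like SKIP 1 LIMIT 1
```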
Since the patterns which are matched can come in any order, it is very useful
to be able to enforce some ordering among the results. In such cases, you can
use the `ORDER BY` sub-clause.
For example, the following query will get all `:Person` nodes and order them
by their names.
MATCH (n :Person) RETURN n ORDER BY n.name
By default, ordering will be in the ascending order. To change the order to be
descending, you should append `DESC`.
For example, to order people by their name descending, you can use this query.
MATCH (n :Person) RETURN n ORDER BY n.name DESC
You can also order by multiple variables. The results will be sorted by the
first variable listed. If the values are equal, the results are sorted by the
second variable, and so on.
Example. Ordering by first name descending and last name ascending.
MATCH (n :Person) RETURN n ORDER BY n.name DESC, n.lastName
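Ordering by several keys with mixed directions can be reproduced in Python with repeated stable sorts, applying the least significant key first. This is a sketch of the ordering rule only; the `order_by` helper and the `name` and `lastName` fields of the sample rows are assumptions made for illustration.

```python
from operator import itemgetter


def order_by(rows, keys):
    """Sort rows by several keys.

    `keys` is a list of (field, descending) pairs, most significant first.
    Python's sort is stable, so sorting by the least significant key first
    and the most significant key last yields the combined ordering.
    """
    ordered = list(rows)
    for field, descending in reversed(keys):
        ordered.sort(key=itemgetter(field), reverse=descending)
    return ordered


people = [
    {"name": "Ann", "lastName": "Zed"},
    {"name": "Bob", "lastName": "Young"},
    {"name": "Ann", "lastName": "Abel"},
]
# First name descending, then last name ascending.
ordered = order_by(people, [("name", True), ("lastName", False)])
```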
Note that `ORDER BY` sees only the variable names as carried over by `RETURN`.
This means that the following will result in an error.
MATCH (old :Person) RETURN old AS new ORDER BY old.name
Instead, the `new` variable must be used:
MATCH (old :Person) RETURN old AS new ORDER BY new.name
The `ORDER BY` sub-clause may come in handy with `SKIP` and/or `LIMIT`
sub-clauses. For example, to get the oldest person you can use the following.
MATCH (n :Person) RETURN n ORDER BY n.age DESC LIMIT 1
##### Aggregating
openCypher has functions for aggregating data. Memgraph currently supports
the following aggregating functions.
* `avg`, for calculating the average.
* `collect`, for collecting multiple values into a single list or map. If
given a single expression values are collected into a list. If given two
expressions, values are collected into a map where the first expression
denotes map keys (must be string values) and the second expression denotes
map values.
* `count`, for counting the resulting values.
* `max`, for calculating the maximum result.
* `min`, for calculating the minimum result.
* `sum`, for getting the sum of numeric results.
Example, calculating the average age:
MATCH (n :Person) RETURN avg(n.age) AS averageAge
Collecting items into a list:
MATCH (n :Person) RETURN collect(n.name) AS list_of_names
Collecting items into a map:
MATCH (n :Person) RETURN collect(n.name, n.age) AS map_name_to_age
for additional details on how aggregations work.
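As a rough analogy, the aggregations above correspond to familiar Python operations over a list of rows. The sample data and the `name` property are assumptions made for this illustration, and the sketch does not cover how missing values are treated.

```python
from statistics import mean

# Hypothetical rows standing in for matched :Person nodes.
people = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 40},
    {"name": "Carol", "age": 50},
]

average_age = mean(person["age"] for person in people)   # like avg(n.age)
person_count = len(people)                                # like count(n)
oldest_age = max(person["age"] for person in people)      # like max(n.age)
list_of_names = [person["name"] for person in people]     # collect with one expression
# collect with two expressions: keys come first and must be strings.
map_name_to_age = {person["name"]: person["age"] for person in people}
```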
openCypher supports combining results from multiple queries into a single result
set. That result will contain rows that belong to queries in the union
respecting the union type.
A result produced with `UNION` contains only distinct rows, while `UNION ALL`
keeps all rows from all given queries.
Restrictions when using `UNION` or `UNION ALL`:
* The number and the names of columns returned by queries must be the same
for all of them.
* There can be only one union type between single queries, i.e. a query can't
contain both `UNION` and `UNION ALL`.
Example, get distinct names of persons and movies:
MATCH (n :Person) RETURN n.name AS name UNION MATCH (n :Movie) RETURN n.name AS name
Example, get all names of persons and movies (including duplicates):
MATCH (n :Person) RETURN n.name AS name UNION ALL MATCH (n :Movie) RETURN n.name AS name
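The difference between the two union types can be sketched in Python, treating every query result as a list of row tuples. The helper names and sample rows are hypothetical.

```python
def union_all(*result_sets):
    """Keep every row from every result set, duplicates included."""
    return [row for rows in result_sets for row in rows]


def union(*result_sets):
    """Keep only distinct rows, preserving first-seen order."""
    return list(dict.fromkeys(union_all(*result_sets)))


person_names = [("Alice",), ("Rocky",)]
movie_names = [("Rocky",), ("Alien",)]

all_rows = union_all(person_names, movie_names)
distinct_rows = union(person_names, movie_names)
```

Here `all_rows` has four rows, with `("Rocky",)` appearing twice, whereas `distinct_rows` has three.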
@ -1,92 +0,0 @@
## Writing New Data
For adding new data, you can use the following clauses.
* `CREATE`, for creating new nodes and edges.
* `SET`, for adding new or updating existing labels and properties.
* `DELETE`, for deleting nodes and edges.
* `REMOVE`, for removing labels and properties.
You can still use the `RETURN` clause to produce results after writing, but it
is not mandatory.
Details on which kind of data can be stored in *Memgraph* can be found in
[Data Storage](../concepts/ chapter.
The `CREATE` clause is used to add new nodes and edges to the database. The creation
is done by providing a pattern, similarly to the `MATCH` clause.
For example, to create 2 new nodes connected with a new edge, use this query.
CREATE (node1)-[:edge_type]->(node2)
Labels and properties can be set during creation using the same syntax as in
`MATCH` patterns. For example, creating a node with a label and a
property:
CREATE (node :Label {property: "my property value"})
Additional information on `CREATE` is
### SET
The `SET` clause is used to update labels and properties of already existing
nodes and edges.
Example. Incrementing everyone's age by 1.
MATCH (n :Person) SET n.age = n.age + 1
for a more detailed explanation on what can be done with `SET`.
The `DELETE` clause is used to delete nodes and edges from the database.
Example. Removing all edges of a single type.
MATCH ()-[edge :type]-() DELETE edge
When testing the database, you often want a clean start by deleting
every node and edge in the database. It is reasonable to expect that deleting
each node should also delete all edges coming into or out of that node.
MATCH (node) DELETE node
However, openCypher prevents accidental deletion of edges. Therefore, the above
query will report an error. Instead, you need to use the `DETACH` keyword,
which will remove edges from a node you are deleting. The following should
work and *delete everything* in the database.
MATCH (node) DETACH DELETE node
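The safety rule behind `DETACH` can be illustrated with a toy in-memory graph in Python. The `TinyGraph` class is a hypothetical teaching aid, not Memgraph code: deleting a node that still has edges raises an error unless the caller asks for the edges to be detached first.

```python
class TinyGraph:
    """A toy graph with nodes and directed edges stored as (source, target)."""

    def __init__(self):
        self.nodes = set()
        self.edges = set()

    def add_edge(self, source, target):
        self.nodes.update((source, target))
        self.edges.add((source, target))

    def delete_node(self, node, detach=False):
        attached = {edge for edge in self.edges if node in edge}
        if attached and not detach:
            # Mirrors the protection against accidental edge deletion.
            raise ValueError(f"node {node!r} still has {len(attached)} edge(s)")
        self.edges -= attached
        self.nodes.discard(node)


graph = TinyGraph()
graph.add_edge("a", "b")
graph.delete_node("a", detach=True)   # removes the edge together with the node
```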
More examples are
The `REMOVE` clause is used to remove labels and properties from nodes and
edges.
MATCH (n :WrongLabel) REMOVE n :WrongLabel, n.property
@ -1,51 +0,0 @@
## Reading and Writing
openCypher supports combining multiple reads and writes using the
`WITH` clause. In addition, the `MERGE` clause is provided, which
creates a pattern if it does not already exist.
### WITH
The write part of a query cannot simply be followed by another read part. In
order to combine them, the `WITH` clause must be used. The names this clause
establishes are transferred from one part to another.
For example, creating a node and finding all nodes with the same property.
CREATE (node {property: 42}) WITH node.property AS propValue
MATCH (n {property: propValue}) RETURN n
Note that the `node` is not visible after `WITH`, since only `node.property`
was carried over.
This clause behaves very much like `RETURN`, so you should refer to features
of `RETURN`.
The `MERGE` clause is used to ensure that a pattern you are looking for exists
in the database. This means that if the pattern is not found, it will be
created. In a way, this clause is like a combination of `MATCH` and `CREATE`.
Example. Ensure that a person has at least one friend.
MATCH (n :Person) MERGE (n)-[:FriendOf]->(m)
The clause also provides additional features for updating the values depending
on whether the pattern was created or matched. This is achieved with `ON
CREATE` and `ON MATCH` sub-clauses.
Example. Set different properties depending on what `MERGE` did.
MATCH (n :Person) MERGE (n)-[:FriendOf]->(m)
ON CREATE SET m.prop = "created" ON MATCH SET m.prop = "existed"
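The match-or-create behaviour of `MERGE`, together with the `ON CREATE` and `ON MATCH` hooks, can be sketched in Python over a plain dictionary. The `merge_friend` function and its data layout are hypothetical and only mirror the logic described above: if a friend already exists, every existing friend is "matched"; otherwise a new one is created.

```python
def merge_friend(friends, person, on_create=None, on_match=None):
    """Ensure `person` has at least one friend.

    `friends` maps a person to a list of friend dictionaries.
    """
    existing = friends.setdefault(person, [])
    if existing:
        # Pattern found: run the ON MATCH action for each matched friend.
        for friend in existing:
            if on_match:
                on_match(friend)
        return existing
    # Pattern not found: create it and run the ON CREATE action.
    friend = {}
    existing.append(friend)
    if on_create:
        on_create(friend)
    return existing


friends = {"alice": [{"name": "bob"}]}
for person in ("alice", "carol"):
    merge_friend(
        friends, person,
        on_create=lambda m: m.update(prop="created"),
        on_match=lambda m: m.update(prop="existed"),
    )
```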
For more details, click [this
@ -1,56 +0,0 @@
## Indexing
An index stores additional information on certain types of data, so that
retrieving said data becomes more efficient. Downsides of indexing are:
* requiring extra storage for each index and
* slowing down writes to the database.
Carefully choosing which data to index can tremendously improve data retrieval
efficiency, and thus make index downsides negligible.
Memgraph automatically indexes labeled data. This improves queries
which fetch nodes by label:
MATCH (n :Label) ... RETURN n
Indexing can also be applied to data with a specific combination of label and
property. These are not automatically created, instead a user needs to create
them explicitly. Creation is done using a special
`CREATE INDEX ON :Label(property)` language construct.
For example, to index nodes which are labeled as `:Person` and have a property
named `age`:
CREATE INDEX ON :Person(age)
After the index is created, retrieving those nodes will become more efficient.
For example, the following query will use the index to retrieve the `:Person`
nodes whose `age` property is 42, instead of fetching each `:Person` node and
checking its `age` property.
MATCH (n :Person {age: 42}) RETURN n
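The trade-off described above (extra storage and slower writes in exchange for faster reads) can be seen in a toy label and property index written in Python. This is a conceptual sketch under our own assumptions about the data layout, not a description of Memgraph's internal data structures.

```python
from collections import defaultdict


class LabelPropertyIndex:
    """Maps a property value to the nodes with a given label and property."""

    def __init__(self, label, prop):
        self.label = label
        self.prop = prop
        self._by_value = defaultdict(list)   # the extra storage an index costs

    def add(self, node):
        # Every write has to update the index as well.
        if self.label in node["labels"] and self.prop in node["properties"]:
            self._by_value[node["properties"][self.prop]].append(node)

    def lookup(self, value):
        # A direct dictionary lookup instead of a scan over all nodes.
        return list(self._by_value.get(value, []))


def scan(nodes, label, prop, value):
    """The unindexed alternative: inspect every node."""
    return [n for n in nodes
            if label in n["labels"] and n["properties"].get(prop) == value]


nodes = [
    {"labels": {"Person"}, "properties": {"age": 42}},
    {"labels": {"Person"}, "properties": {"age": 20}},
    {"labels": {"Movie"}, "properties": {"age": 42}},
]
index = LabelPropertyIndex("Person", "age")
for node in nodes:
    index.add(node)
```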
Using index based retrieval also works when filtering labels and properties
with `WHERE`. For example, the same effect as in the previous example can be
done with:
MATCH (n) WHERE n:Person AND n.age = 42 RETURN n
Since the filter inside `WHERE` can contain any kind of expression, the
expression can be complicated enough that the index does not get used. We
are continuously improving the recognition of index usage opportunities from a
`WHERE` expression. If there is any suspicion that an index may not be used,
we recommend putting properties and labels inside the `MATCH` pattern.
Currently, once an index is created it cannot be deleted. This feature will be
implemented very soon. The expected syntax for removing an index will be `DROP
INDEX ON :Label(property)`.
@ -1,137 +0,0 @@
## Graph Algorithms
### Filtering Variable Length Paths
openCypher supports only simple filtering when matching variable length paths.
For example:
MATCH (n)-[edge_list:Type * {x: 42}]-(m)
This will produce only those paths whose edges have the required `Type` and `x`
property value. Edges that compose the produced paths are stored in a symbol
named `edge_list`. Naturally, the user could have specified any other symbol
name.
Memgraph extends openCypher with a syntax for arbitrary filter expressions
during path matching. The next example filters edges which have property `x`
between `0` and `10`.
MATCH (n)-[edge_list * (edge, node | 0 < edge.x < 10)]-(m)
Here we introduce a lambda function with parentheses, where the first two
arguments, `edge` and `node`, correspond to each edge and node during path
matching. `node` is the destination node we are moving to across the current
`edge`. The last `node` value will be the same value as `m`. Following the
pipe (`|`) character is an arbitrary expression which must produce a boolean
value. If `True`, matching continues, otherwise the path is discarded.
The previous example can be written using the `all` function:
MATCH (n)-[edge_list *]-(m) WHERE all(edge IN edge_list WHERE 0 < edge.x < 10)
However, filtering using a lambda function is more efficient because paths
may be discarded earlier in the traversal. Furthermore, it provides more
flexibility for deciding what kind of paths are matched due to more expressive
filtering capabilities. Therefore, filtering through lambda functions should
be preferred whenever possible.
### Breadth First Search
A typical graph use-case is searching for the shortest path between nodes.
The openCypher standard does not define this feature, so Memgraph provides
a custom implementation, based on the edge expansion syntax.
Finding the shortest path between nodes can be done using breadth-first
expansion:
MATCH (a {id: 723})-[edge_list:Type *bfs..10]-(b {id: 882}) RETURN *
The above query will find all paths of length up to 10 between nodes `a` and `b`.
The edge type and maximum path length are used in the same way as in variable
length expansion.
To find only the shortest path, simply append `LIMIT 1` to the `RETURN` clause.
MATCH (a {id: 723})-[edge_list:Type *bfs..10]-(b {id: 882}) RETURN * LIMIT 1
Breadth-first expansion allows an arbitrary expression filter that determines
if an expansion is allowed. Following is an example in which expansion is
allowed only over edges whose `x` property is greater than `12` and nodes
whose `y` property is less than `3`:
MATCH (a {id: 723})-[*bfs..10 (e, n | e.x > 12 AND n.y < 3)]-() RETURN *
The filter is defined as a lambda function over `e` and `n`, which denote the edge
and node being expanded over in the breadth first search. Note that if the user
omits the edge list symbol (`edge_list` in previous examples) it will not be included
in the result.
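The idea of a filtered, depth-limited breadth-first expansion can be sketched in Python as follows. This is a textbook breadth-first search with a filter callback, written for illustration; it is not Memgraph's implementation, and the function name, the adjacency representation and the `allow` parameter are our own assumptions.

```python
from collections import deque


def bfs_shortest_path(adjacency, source, target, max_depth=None,
                      allow=lambda edge, node: True):
    """Return the edges of a shortest path from source to target, or None.

    `adjacency` maps a node to a list of (edge, neighbour) pairs.
    `allow(edge, node)` plays the role of the filter lambda: an expansion
    over `edge` to `node` happens only if it returns True.
    """
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        if max_depth is not None and len(path) >= max_depth:
            continue  # do not expand beyond the length limit
        for edge, neighbour in adjacency.get(node, []):
            if neighbour in visited or not allow(edge, neighbour):
                continue
            visited.add(neighbour)
            queue.append((neighbour, path + [edge]))
    return None


e_ab, e_bc, e_ac = {"x": 20}, {"x": 20}, {"x": 5}
adjacency = {
    "a": [(e_ab, "b"), (e_ac, "c")],
    "b": [(e_ab, "a"), (e_bc, "c")],
    "c": [(e_ac, "a"), (e_bc, "b")],
}
unfiltered = bfs_shortest_path(adjacency, "a", "c")
filtered = bfs_shortest_path(adjacency, "a", "c",
                             allow=lambda e, n: e["x"] > 12)
```

Without a filter the direct edge is found; with the filter on `x` the search is forced to take the two-edge detour, because the rejected edge is never expanded.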
There are a few benefits of the breadth-first expansion approach, as opposed to
a specialized `shortestPath` function. For one, it is possible to inject
expressions that filter on nodes and edges along the path itself, not just the final
destination node. Furthermore, it's possible to find multiple paths to multiple destination
nodes regardless of their length. Also, it is possible to simply go through a node's
neighbourhood in breadth-first manner.
Currently, it isn't possible to get all shortest paths to a single node using
Memgraph's breadth-first expansion.
### Weighted Shortest Path
Another standard use-case in a graph is searching for the weighted shortest
path between nodes. The openCypher standard does not define this feature, so
Memgraph provides a custom implementation, based on the edge expansion syntax.
Finding the weighted shortest path between nodes is done using the weighted
shortest path expansion:
MATCH (a {id: 723})-[
edge_list *wShortest 10 (e, n | e.weight) total_weight
]-(b {id: 882})
RETURN *
The above query will find the shortest path of length up to 10 nodes between
nodes `a` and `b`. The length restriction parameter is optional.
Weighted Shortest Path expansion allows an arbitrary expression that determines
the weight for the current expansion. Total weight of a path is calculated as
the sum of all weights on the path between two nodes. Following is an example in
which the weight between nodes is defined as the product of edge weights
(instead of sum), assuming all weights are greater than `1`:
MATCH (a {id: 723})-[
edge_list *wShortest 10 (e, n | log(e.weight)) total_weight
]-(b {id: 882})
RETURN exp(total_weight)
Weighted Shortest Path expansion also allows an arbitrary expression filter
that determines if an expansion is allowed. Following is an example in which
expansion is allowed only over edges whose `x` property is greater than `12`
and nodes whose `y` property is less than `3`:
MATCH (a {id: 723})-[
edge_list *wShortest 10 (e, n | e.weight) total_weight (e, n | e.x > 12 AND n.y < 3)
]-(b {id: 882})
RETURN exp(total_weight)
Both weight and filter expression are defined as lambda functions over `e` and
`n`, which denote the edge and the node being expanded over in the weighted
shortest path search.
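The roles of the weight lambda and the filter lambda can be illustrated with a short Python sketch. It uses Dijkstra's algorithm, which is our choice for the illustration (the text above does not say which algorithm Memgraph uses) and which requires non-negative weights. The logarithm trick works because the sum of logarithms equals the logarithm of the product, and weights greater than `1` keep every logarithm positive. All names below are hypothetical.

```python
import heapq
import itertools
import math


def weighted_shortest_path(adjacency, source, target, weight,
                           allow=lambda edge, node: True):
    """Return (total_weight, edges) of a cheapest path, or None.

    `adjacency` maps a node to a list of (edge, neighbour) pairs.
    `weight(edge, node)` gives the non-negative cost of one expansion.
    `allow(edge, node)` decides whether an expansion is permitted.
    """
    counter = itertools.count()   # tie-breaker, so edges are never compared
    heap = [(0.0, next(counter), source, [])]
    settled = set()
    while heap:
        total, _, node, path = heapq.heappop(heap)
        if node in settled:
            continue
        settled.add(node)
        if node == target:
            return total, path
        for edge, neighbour in adjacency.get(node, []):
            if neighbour not in settled and allow(edge, neighbour):
                heapq.heappush(heap, (total + weight(edge, neighbour),
                                      next(counter), neighbour, path + [edge]))
    return None


adjacency = {
    "a": [({"weight": 2.0}, "b"), ({"weight": 10.0}, "c")],
    "b": [({"weight": 3.0}, "c")],
}
# Sum of weights: a -> b -> c costs 5, the direct edge costs 10.
by_sum = weighted_shortest_path(adjacency, "a", "c",
                                weight=lambda e, n: e["weight"])
# Product of weights via logarithms: exp(log 2 + log 3) = 6.
by_log = weighted_shortest_path(adjacency, "a", "c",
                                weight=lambda e, n: math.log(e["weight"]))
product = math.exp(by_log[0])
```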
@ -1,115 +0,0 @@
## Graph Streams
### Kafka
Memgraph's custom openCypher clause for creating a stream is:
CREATE STREAM stream_name AS
WITH TOPIC 'topic'
[BATCH_INTERVAL milliseconds]
[BATCH_SIZE count]
The `CREATE STREAM` clause happens in a transaction.
`WITH TOPIC` parameter specifies the Kafka topic from which we'll stream
data.
`WITH TRANSFORM` parameter should contain a URI of the transform script.
We cover more about the transform script later, in the [transform](#transform)
section.
`BATCH_INTERVAL` parameter defines the time, in milliseconds, between two
successive stream import operations.
`BATCH_SIZE` parameter defines the count of Kafka messages that will be
batched together before import.
If both `BATCH_INTERVAL` and `BATCH_SIZE` parameters are given, the condition
that is satisfied first will trigger the batched import.
Default value for `BATCH_INTERVAL` is 100 milliseconds, and the default value
for `BATCH_SIZE` is 10.
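The "whichever condition is satisfied first" rule can be sketched in Python. The class below is a simplified model under our own assumptions: time is passed in explicitly, the interval is measured from the first message of a batch, and it is checked only when a message arrives. A real consumer would also flush on a timer. The defaults mirror the values stated above.

```python
class BatchTrigger:
    """Collects messages and releases a batch when size or time is reached."""

    def __init__(self, batch_interval_ms=100, batch_size=10):
        self.batch_interval_ms = batch_interval_ms
        self.batch_size = batch_size
        self._messages = []
        self._batch_started_ms = None

    def add(self, message, now_ms):
        """Add a message; return the finished batch, or None if still filling."""
        if self._batch_started_ms is None:
            self._batch_started_ms = now_ms
        self._messages.append(message)
        size_reached = len(self._messages) >= self.batch_size
        time_reached = now_ms - self._batch_started_ms >= self.batch_interval_ms
        if size_reached or time_reached:
            batch, self._messages = self._messages, []
            self._batch_started_ms = None
            return batch
        return None


trigger = BatchTrigger(batch_interval_ms=100, batch_size=3)
results = [
    trigger.add("m1", now_ms=0),
    trigger.add("m2", now_ms=1),
    trigger.add("m3", now_ms=2),     # size condition fires here
    trigger.add("m4", now_ms=10),
    trigger.add("m5", now_ms=200),   # interval condition fires here
]
```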
The `DROP` clause deletes a stream:
DROP STREAM stream_name;
The `SHOW` clause enables you to see all configured streams:
You can also start/stop streams with the `START` and `STOP` clauses:
START STREAM stream_name [LIMIT count BATCHES];
STOP STREAM stream_name;
A stream needs to be stopped in order to start it and it needs to be started in
order to stop it. Starting a started or stopping a stopped stream will not
affect that stream.
There are also convenience clauses to start and stop all streams:
Before the actual import, you can also test the stream with the `TEST
STREAM` clause:
TEST STREAM stream_name [LIMIT count BATCHES];
When a stream is tested, data extraction and transformation occur, but nothing
is inserted into the graph.
A stream needs to be stopped in order to test it. When the batch limit is
omitted, `TEST STREAM` will run for only one batch by default.
#### Transform
The transform script allows Memgraph users to have custom Kafka messages and
still be able to import data in Memgraph by adding the logic to decode the
messages in the transform script.
The entry point of the transform script from Memgraph is the `stream` function.
Input for the `stream` function is a list of bytes that represent byte encoded
Kafka messages, and the output of the `stream` function must be a list of
tuples containing openCypher string queries and corresponding parameters stored
in a dictionary.
To be more precise, the signature of the `stream` function looks like the
following:
stream : [bytes] -> [(str, {str : type})]
type : none | bool | int | float | str | list | dict
An example of a simple transform script that creates vertices if the message
contains one number (the vertex id), or creates edges if the message contains
two numbers (origin vertex id and destination vertex id), would look like the
following:
def create_vertex(vertex_id):
    # Query that creates a single vertex with the given id.
    return ("CREATE (:Node {id: $id})", {"id": vertex_id})

def create_edge(from_id, to_id):
    # Query that connects two existing vertices.
    return ("MATCH (n:Node {id: $from_id}), (m:Node {id: $to_id}) "
            "CREATE (n)-[:Edge]->(m)", {"from_id": from_id, "to_id": to_id})

def stream(batch):
    result = []
    for item in batch:
        # Each Kafka message holds whitespace-separated vertex ids.
        message = item.decode('utf-8').split()
        if len(message) == 1:
            result.append(create_vertex(message[0]))
        elif len(message) == 2:
            result.append(create_edge(message[0], message[1]))
    return result
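A transform script can be tried out locally before it is attached to a stream. The sketch below repeats the example script in runnable form, assuming that a single-number message is passed to `create_vertex`, and feeds it a hand-made batch. The sample messages are hypothetical. Note that the ids stay strings because the script does not convert them.

```python
def create_vertex(vertex_id):
    return ("CREATE (:Node {id: $id})", {"id": vertex_id})


def create_edge(from_id, to_id):
    return ("MATCH (n:Node {id: $from_id}), (m:Node {id: $to_id}) "
            "CREATE (n)-[:Edge]->(m)", {"from_id": from_id, "to_id": to_id})


def stream(batch):
    result = []
    for item in batch:
        message = item.decode("utf-8").split()
        if len(message) == 1:
            result.append(create_vertex(message[0]))
        elif len(message) == 2:
            result.append(create_edge(message[0], message[1]))
        # Messages with any other number of fields are silently skipped.
    return result


# Hypothetical batch of byte-encoded Kafka messages.
sample_batch = [b"1", b"2", b"1 2", b"too many fields here"]
queries = stream(sample_batch)
for query, params in queries:
    print(query, params)
```

The output is a list of `(query, parameters)` tuples, which matches the signature described above.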
@ -1,125 +0,0 @@
## Security
Before reading this article we highly recommend going through a how-to guide
on [managing user privileges](../how_to_guides/
which contains more thorough explanations of the concepts behind `openCypher`
commands listed in this article.
### Users
Creating a user can be done by executing the following command:
CREATE USER user_name [IDENTIFIED BY 'password'];
If the user should authenticate on each session, i.e. provide their
password on each session, the part within the brackets is mandatory. Otherwise,
the password is set to `null` and the user will be allowed to log in using
any password, provided that they supply the correct username.
You can also set or alter a user's password anytime by issuing the following
command:
SET PASSWORD FOR user_name TO 'new_password';
Removing a user's password, i.e. allowing the user to log in using any
password, can be done by setting it to `null` as follows:
SET PASSWORD FOR user_name TO null;
### User Roles
Each user can be assigned at most one user role. One can think of user roles
as abstractions which capture the privilege levels of a set of users. For
example, suppose that `Dominik` and `Marko` belong to the upper management of
a certain company. It makes sense to grant them a set of privileges that other
users are not entitled to. Instead of granting those privileges to each
of them individually, we can create a role called `manager` with those
privileges and assign it to `Dominik` and `Marko`.
In other words, each privilege that is granted to a user role is automatically
granted to every user with that role (unless it has been explicitly denied to that user).
Similarly, each privilege that is denied to a user role is automatically denied
to every user with that role (even if it has been explicitly granted to that user).
Creating a user role can be done by executing the following command:
CREATE ROLE role_name;
Assigning a user role to a certain user can be done by the following command:
SET ROLE FOR user_name TO role_name;
Removing the role from the user can be done by:
CLEAR ROLE FOR user_name;
Finally, showing all users that have a certain role can be done as:
SHOW USERS FOR role_name;
Similarly, querying which role a certain user has can be done as:
SHOW ROLE FOR user_name;
### Privileges
At the moment, privileges are confined to users' abilities to perform certain
`openCypher` queries. Namely, users can be given permission to execute a subset
of the following commands: `CREATE`, `DELETE`, `MATCH`, `MERGE`, `SET`,
Granting a certain set of privileges to a specific user or user role can be
done by issuing the following command:
GRANT privilege_list TO user_or_role;
For example, granting `AUTH` and `STREAM` privileges to users with the role
`moderator` would be written as:
Similarly, denying privileges is done using the `DENY` keyword instead of
`GRANT`.
Both denied and granted privileges can be revoked, meaning that their status is
not defined for that user or role. Revoking is done using the `REVOKE` keyword.
Note that, although semantically unintuitive, the level of a
certain privilege can be raised by using `REVOKE`. For instance, suppose a user
has been denied the `STREAM` privilege, but the role they belong to is granted
that privilege. Currently, the user is unable to use data streaming features,
but, after revoking the user's `STREAM` privilege, they will be able to do so.
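The way user-level and role-level settings combine can be summarized in a few lines of Python. This is only a sketch of the rules described in this article, not Memgraph's code: the function name `effective_privilege` is hypothetical, and `None` stands for a privilege whose status is not defined (revoked or never set).

```python
GRANT, DENY = "GRANT", "DENY"


def effective_privilege(user_level, role_level):
    """Combine the user's own setting with the setting of the user's role.

    Each argument is GRANT, DENY or None (status not defined).
    A denial on either level wins; otherwise a grant on either level applies.
    """
    if DENY in (user_level, role_level):
        return DENY
    if GRANT in (user_level, role_level):
        return GRANT
    return None  # neither level defines the privilege


# The user is denied STREAM while the role is granted it: the denial wins.
print(effective_privilege(DENY, GRANT))
# After revoking the user's own setting, the role's grant takes effect.
print(effective_privilege(None, GRANT))
```

The second call mirrors the `REVOKE` example: removing the user's explicit denial raises their effective privilege level.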
Finally, if you wish to grant, deny or revoke all privileges and find it tedious
to explicitly list them, you can use the `ALL PRIVILEGES` construct instead.
For example, revoking all privileges from user `jdoe` can be done with the
following command:
To obtain the status of each privilege for a certain user or role, issue the
following command:
@ -1,9 +0,0 @@
## Dynamic Graph Partitioner
Memgraph supports dynamic graph partitioning, which improves performance at
runtime when a dataset is badly partitioned across workers. To enable it, the user
should use the following flag when starting the *master* node:
@ -1,160 +0,0 @@
## Other Features
The following sections describe some of the other supported features.
The `UNWIND` clause is used to unwind a list of values as individual rows.
Example. Produce rows out of a single list.
UNWIND [1,2,3] AS listElement RETURN listElement
More examples are
### Functions
This section contains the list of other supported functions.
Name | Description
-----|------------
`coalesce` | Returns the first non null argument.
`startNode` | Returns the starting node of an edge.
`endNode` | Returns the destination node of an edge.
`degree` | Returns the number of edges (both incoming and outgoing) of a node.
`head` | Returns the first element of a list.
`last` | Returns the last element of a list.
`properties` | Returns the properties of a node or an edge.
`size` | Returns the number of elements in a list or a map. When given a string it returns the number of characters. When given a path it returns the number of expansions (edges) in that path.
`toBoolean` | Converts the argument to a boolean.
`toFloat` | Converts the argument to a floating point number.
`toInteger` | Converts the argument to an integer.
`type` | Returns the type of an edge as a character string.
`keys` | Returns a list of property keys from an edge or a node. Each key is represented as a string of characters.
`labels` | Returns a list of labels from a node. Each label is represented as a character string.
`nodes` | Returns a list of nodes from a path.
`relationships` | Returns a list of relationships (edges) from a path.
`range` | Constructs a list of values in the given range.
`tail` | Returns all elements after the first of a given list.
`abs` | Returns the absolute value of a number.
`ceil` | Returns the smallest integer greater than or equal to given number.
`floor` | Returns the largest integer smaller than or equal to given number.
`round` | Returns the number, rounded to the nearest integer. Tie-breaking is done using the *commercial rounding*, where -1.5 produces -2 and 1.5 produces 2.
`exp` | Calculates `e^n` where `e` is the base of the natural logarithm, and `n` is the given number.
`log` | Calculates the natural logarithm of a given number.
`log10` | Calculates the logarithm (base 10) of a given number.
`sqrt` | Calculates the square root of a given number.
`acos` | Calculates the arccosine of a given number.
`asin` | Calculates the arcsine of a given number.
`atan` | Calculates the arctangent of a given number.
`atan2` | Calculates the two-argument arctangent of a given pair of numbers.
`cos` | Calculates the cosine of a given number.
`sin` | Calculates the sine of a given number.
`tan` | Calculates the tangent of a given number.
`sign` | Applies the signum function to a given number and returns the result. The signum of positive numbers is 1, of negative -1 and for 0 returns 0.
`e` | Returns the base of the natural logarithm.
`pi` | Returns the constant *pi*.
`rand` | Returns a random floating point number between 0 (inclusive) and 1 (exclusive).
`startsWith` | Check if the first argument starts with the second.
`endsWith` | Check if the first argument ends with the second.
`contains` | Check if the first argument has an element which is equal to the second argument.
`left` | Returns a string containing the specified number of leftmost characters of the original string.
`lTrim` | Returns the original string with leading whitespace removed.
`replace` | Returns a string in which all occurrences of a specified string in the original string have been replaced by another (specified) string.
`reverse` | Returns a string in which the order of all characters in the original string has been reversed.
`right` | Returns a string containing the specified number of rightmost characters of the original string.
`rTrim` | Returns the original string with trailing whitespace removed.
`split` | Returns a list of strings resulting from the splitting of the original string around matches of the given delimiter.
`substring` | Returns a substring of the original string, beginning at a 0-based start index and having the given length.
`toLower` | Returns the original string in lowercase.
`toString` | Converts an integer, float or boolean value to a string.
`toUpper` | Returns the original string in uppercase.
`trim` | Returns the original string with leading and trailing whitespace removed.
`all` | Check if all elements of a list satisfy a predicate.<br/>The syntax is: `all(variable IN list WHERE predicate)`.<br/> NOTE: Whenever possible, use Memgraph's lambda functions when matching instead.
`single` | Check if only one element of a list satisfies a predicate.<br/>The syntax is: `single(variable IN list WHERE predicate)`.
`reduce` | Accumulate list elements into a single result by applying an expression. The syntax is:<br/>`reduce(accumulator = initial_value, variable IN list | expression)`.
`extract` | Returns a list of values obtained by evaluating an expression for each element in a list. The syntax is:<br/>`extract(variable IN list | expression)`.
`assert` | Raises an exception reported to the client if the given argument is not `true`.
`counter` | Generates integers that are guaranteed to be unique on the database level, for the given counter name.
`counterSet` | Sets the counter with the given name to the given value.
`indexInfo` | Returns a list of all the indexes available in the database. The list includes indexes that are not yet ready for use (they are concurrently being built by another transaction).
`timestamp` | Returns the difference, measured in milliseconds, between the current time and midnight, January 1, 1970 UTC.
`id` | Returns identifier for a given node or edge. The identifier is generated during the initialization of node or edge and will be persisted through the durability mechanism.
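
The tie-breaking rule of `round` in the table above differs from what some languages do by default. For example, Python's built-in `round` rounds ties to the nearest even integer. The sketch below reproduces *commercial rounding* in Python. The helper name `commercial_round` is hypothetical, and it returns a Python `int` for simplicity.

```python
from decimal import Decimal, ROUND_HALF_UP


def commercial_round(x):
    """Round to the nearest integer, with ties going away from zero."""
    return int(Decimal(str(x)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


print(commercial_round(1.5))   # 2
print(commercial_round(-1.5))  # -2
print(commercial_round(2.5))   # 3
print(round(2.5))              # 2, Python's built-in rounds ties to even
```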
### String Operators
Apart from comparison and concatenation operators openCypher provides special
string operators for easier matching of substrings:
Operator | Description
---------|------------
`a STARTS WITH b` | Returns true if prefix of string a is equal to string b.
`a ENDS WITH b` | Returns true if suffix of string a is equal to string b.
`a CONTAINS b` | Returns true if some substring of string a is equal to string b.
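
As a rough reference model, the three operators correspond to the following Python string checks. The function names are hypothetical, the comparisons are case-sensitive, and `null` handling is not modeled in this sketch.

```python
def starts_with(a, b):
    # a STARTS WITH b
    return a.startswith(b)


def ends_with(a, b):
    # a ENDS WITH b
    return a.endswith(b)


def contains(a, b):
    # a CONTAINS b
    return b in a


print(starts_with("Memgraph", "Mem"))  # True
print(starts_with("Memgraph", "mem"))  # False, the match is case-sensitive
print(ends_with("Memgraph", "graph"))  # True
print(contains("Memgraph", "mgr"))     # True
```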
### Parameters
When automating the queries for Memgraph, it comes in handy to change only
some parts of the query. Usually, these parts are values which are used for
filtering results or similar, while the rest of the query remains the same.
Parameters allow reusing the same query, but with different parameter values.
The syntax uses the `$` symbol to designate a parameter name. We don't allow
old Cypher parameter syntax using curly braces. For example, you can parameterize
filtering a node property:
MATCH (node1 {property: $propertyValue}) RETURN node1
You can use parameters instead of any literal in the query, but not instead of
property maps, even though that is allowed in standard openCypher. The following
example is illegal in Memgraph:
MATCH (node1 $propertyValue) RETURN node1
To use parameters with the Python driver, use the following syntax:
session.run('CREATE (alice:Person {name: $name, age: $ageValue})',
            name='Alice', ageValue=22).consume()
To use parameters whose names are integers, you will need to wrap the parameters in
a dictionary and convert their names to strings before running a query:
session.run('CREATE (alice:Person {name: $0, age: $1})',
            {'0': "Alice", '1': 22}).consume()
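If you pass many positional values, building that dictionary by hand gets tedious. A small helper can do the conversion. The name `positional_params` is hypothetical and not part of any driver.

```python
def positional_params(*values):
    """Map positional values to the string keys '0', '1', ... used by $0, $1, ..."""
    return {str(index): value for index, value in enumerate(values)}


params = positional_params("Alice", 22)
print(params)  # {'0': 'Alice', '1': 22}
```

The resulting dictionary can then be passed as the second argument of `session.run`, assuming `session` is an open session of the Python driver.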
To use parameters with some other driver, please consult the appropriate
documentation.
### CASE
Conditional expressions can be expressed in the openCypher language by the simple and
generic forms of the `CASE` expression. The simple form is used to compare an expression
against multiple predicates. For the first matched predicate, the result of the
expression provided after the `THEN` keyword is returned. If no predicate is
matched, the value following `ELSE` is returned if provided, or `null` if `ELSE` is not
used.
In the generic form, you don't need to provide an expression whose value is compared to
predicates. Instead, you list multiple predicates and the first one that evaluates
to true is matched:
RETURN CASE WHEN n.height < 30 THEN "short" WHEN n.height > 300 THEN "tall" END
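The evaluation order of the generic form can be mimicked in Python. This is a sketch of the semantics only: the names `generic_case` and `describe_height` are hypothetical, `None` plays the role of `null`, and, unlike a real `CASE`, all predicates are evaluated before the first match is picked.

```python
def generic_case(branches, default=None):
    """Return the value of the first branch whose predicate holds.

    `branches` is a list of (predicate_result, value) pairs, checked in order.
    If none holds, `default` is returned (None when there is no ELSE).
    """
    for predicate_holds, value in branches:
        if predicate_holds:
            return value
    return default


def describe_height(height):
    # Mirrors: CASE WHEN height < 30 THEN "short" WHEN height > 300 THEN "tall" END
    return generic_case([(height < 30, "short"), (height > 300, "tall")])


print(describe_height(10))   # short
print(describe_height(500))  # tall
print(describe_height(100))  # None, no predicate matched and there is no ELSE
```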
@ -1,63 +0,0 @@
## Differences
Although we try to implement openCypher query language as closely to the
language reference as possible, we had to make some changes to enhance the
user experience.
### Symbolic Names
We don't allow symbolic names (variables, label names...) to be openCypher
keywords (WHERE, MATCH, COUNT, SUM...).
### Unicode Codepoints in String Literal
In string literals, use `\u` followed by 4 hex digits for a UTF-16 code point and
`\U` followed by 8 hex digits for a UTF-32 code point.
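Python string literals happen to use the same two escape forms, so they make a convenient way to see what each one denotes. The snippet is only an illustration in Python and says nothing about how Memgraph stores strings internally.

```python
bmp_char = "\u0161"         # 4 hex digits: code point U+0161
astral_char = "\U0001F600"  # 8 hex digits: code point U+1F600

print(bmp_char == chr(0x0161))      # True
print(astral_char == chr(0x1F600))  # True
print(len(astral_char))             # 1, a single code point
```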
### Difference from Neo4j's Cypher Implementation
The openCypher initiative stems from Neo4j's Cypher query language. The following is a list
of the most important differences between Neo4j's Cypher and Memgraph's openCypher implementation,
for users who are already familiar with Neo4j. There might be other differences not documented
here (especially subtle semantic ones).
#### Unsupported Constructs
* Data importing. Memgraph doesn't support Cypher's CSV importing capabilities.
* The `FOREACH` language construct for performing an operation on every list element.
* The `CALL` construct for a standalone function call. This can be expressed using
`RETURN functioncall()`. For example, with Memgraph you can get information about
the indexes present in the database using the `RETURN indexinfo()` openCypher query.
* Stored procedures.
* Regular expressions for string matching.
* `shortestPath` and `allShortestPaths` functions. `shortestPath` can be expressed using
Memgraph's breadth-first expansion syntax already described in this document.
* Patterns in expressions. For example, Memgraph doesn't support `size((n)-->())`. Most of the time
the same functionalities can be expressed differently in Memgraph using `OPTIONAL` expansions,
function calls etc.
* Map projections such as `MATCH (n) RETURN n {.property1, .property2}`.
#### Unsupported Functions
General purpose functions:
* `exists(` - This can be expressed using ` IS NOT NULL`.
* `length()` is named `size()` in Memgraph.
Aggregation functions:
* `count(DISTINCT variable)` - This can be expressed using `WITH DISTINCT variable RETURN count(variable)`.
Mathematical functions:
* `percentileDisc()`
* `stDev()`
* `point()`
* `distance()`
* `degrees()`
List functions:
* `any()`
* `none()`
@ -1,14 +0,0 @@
## Tutorials Overview
Articles within the tutorials section serve as real-world examples of using
Memgraph. These articles tend to provide the user with a reasonably-sized
dataset and some example queries that showcase how to use Memgraph on that
particular dataset. We encourage all Memgraph users to go through at least
one of the tutorials as they can also serve as a verification that Memgraph
is successfully installed on your system.
So far we have covered the following topics:
* [Analyzing TED Talks](
* [Graphing the Premier League](
* [Exploring the European Road Network](
@ -1,176 +0,0 @@
## Analyzing TED Talks
This article is a part of a series intended to show users how to use Memgraph
on real-world data and, by doing so, retrieve some interesting and useful
information.
We highly recommend checking out the other articles from this series:
* [Exploring the European Road Network](
* [Graphing the Premier League](
### Introduction
[TED]( is a nonprofit organization devoted to spreading
ideas, usually in the form of short, powerful talks.
Today, TED talks are influential videos from expert speakers on almost all
topics — from science to business to global issues.
Here we present a small dataset which consists of 97 talks, show how to model
this data as a graph and demonstrate a few example queries.
### Data Model
Each TED talk has a main speaker, so we
identify two types of nodes — `Talk` and `Speaker`. Also, we will add
an edge of type `Gave` pointing to a `Talk` from its main `Speaker`.
Each speaker has a name, so we can add the property `name` to the `Speaker` node.
Likewise, we'll add properties `name`, `title` and `description` to the
`Talk` node. Furthermore, each talk is given at a specific TED event, so we can
create an `Event` node with property `name` and a relationship `InEvent` between
talk and event.
Talks are tagged with keywords to facilitate searching, hence we
add a `Tag` node with property `name` and a relationship `HasTag` between talk and
tag. Moreover, users give ratings to each talk by selecting up to three
predefined string values. Therefore we add a `Rating` node with these values as
property `name` and a relationship `HasRating` with property `user_count` between
talk and rating nodes.
### Importing the Snapshot
We have prepared a database snapshot for this example, so the user can easily
import it when starting Memgraph using the `--durability-directory` option.
/usr/lib/memgraph/memgraph --durability-directory /usr/share/memgraph/examples/TEDTalk \
--durability-enabled=false --snapshot-on-exit=false
When using Memgraph installed from DEB or RPM package, the currently running
Memgraph server may need to be stopped before importing the example. The user
can do so using the following command:
systemctl stop memgraph
When using Docker, the example can be imported with the following command:
docker run -p 7687:7687 \
-v mg_lib:/var/lib/memgraph -v mg_log:/var/log/memgraph -v mg_etc:/etc/memgraph \
memgraph --durability-directory /usr/share/memgraph/examples/TEDTalk \
--durability-enabled=false --snapshot-on-exit=false
The user should note that any modifications of the database state will persist
only during this run of Memgraph.
### Example Queries
1) Find all talks given by specific speaker:
MATCH (n:Speaker {name: "Hans Rosling"})-[:Gave]->(m:Talk)
RETURN m.title;
2) Find the top 20 speakers with most talks given:
MATCH (n:Speaker)-[:Gave]->(m)
RETURN n.name, COUNT(m) AS TalksGiven
3) Find talks related by tag to specific talk and count them:
MATCH (n:Talk {name: "Michael Green: Why we should build wooden skyscrapers"})
4) Find 20 most frequently used tags:
MATCH (t:Tag)<-[:HasTag]-(n:Talk)
RETURN t.name AS Tag, COUNT(n) AS TalksCount
ORDER BY TalksCount DESC, Tag LIMIT 20;
5) Find 20 talks most rated as "Funny". If you want to query by other ratings,
possible values are: Obnoxious, Jaw-dropping, OK, Persuasive, Beautiful,
Confusing, Longwinded, Unconvincing, Fascinating, Ingenious, Courageous, Funny,
Informative and Inspiring.
MATCH (r:Rating{name:"Funny"})<-[e:HasRating]-(m:Talk)
RETURN m.name, e.user_count ORDER BY e.user_count DESC LIMIT 20;
6) Find inspiring talks and their speakers from the field of technology:
MATCH (n:Talk)-[:HasTag]->(m:Tag {name: "technology"})
MATCH (n)-[r:HasRating]->(p:Rating {name: "Inspiring"})
MATCH (n)<-[:Gave]-(s:Speaker)
WHERE r.user_count > 1000
RETURN n.title, s.name, r.user_count ORDER BY r.user_count DESC;
7) Now let's see one real-world example — how to make a real-time
recommendation. If you've just watched a talk from a certain
speaker (e.g. Hans Rosling) you might be interested in finding more talks from
the same speaker on a similar topic:
MATCH (n:Speaker {name: "Hans Rosling"})-[:Gave]->(m:Talk)
MATCH (t:Talk {title: "New insights on poverty"})-[:HasTag]->(tag:Tag)<-[:HasTag]-(m)
RETURN m.title as Title, COLLECT(tag.name), COUNT(tag) as TagCount
ORDER BY TagCount DESC, Title;
The following few queries are focused on extracting information about
TED events.
8) Find how many talks were given per event:
MATCH (n:Event)<-[:InEvent]-(t:Talk)
RETURN n.name as Event, COUNT(t) AS TalksCount
ORDER BY TalksCount DESC, Event
9) Find the most popular tags in the specific event:
MATCH (n:Event {name:"TED2006"})<-[:InEvent]-(t:Talk)-[:HasTag]->(tag:Tag)
RETURN tag.name as Tag, COUNT(t) AS TalksCount
ORDER BY TalksCount DESC, Tag
10) Discover which speakers participated in more than 2 events:
MATCH (n:Speaker)-[:Gave]->(t:Talk)-[:InEvent]->(e:Event)
WITH n, COUNT(e) AS EventsCount WHERE EventsCount > 2
RETURN n.name as Speaker, EventsCount
ORDER BY EventsCount DESC, Speaker;
11) For each speaker, search for other speakers that participated in the same
events:
MATCH (n:Speaker)-[:Gave]->()-[:InEvent]->(e:Event)<-[:InEvent]-()<-[:Gave]-(m:Speaker)
ORDER BY Speaker;
@ -1,190 +0,0 @@
## Graphing the Premier League
This article is a part of a series intended to show users how to use Memgraph
on real-world data and, by doing so, retrieve some interesting and useful
information.
We highly recommend checking out the other articles from this series:
* [Analyzing TED Talks](
* [Exploring the European Road Network](
### Introduction
Football is a team sport played between two teams of eleven
players with a spherical ball. The game is played on a rectangular pitch with
a goal at each end. The object of the game is to score by moving the ball
beyond the goal line into the opposing goal. The game is played by more than
250 million players in over 200 countries, making it the world's most
popular sport.
In this article, we will present a graph model of a reasonably sized dataset
of football matches across the world's most popular leagues.
### Data Model
In essence, we are trying to model a set of football matches. All information
about a single match is going to be contained in three nodes and two edges.
Two of the nodes will represent the teams that have played the match, while the
third node will represent the game itself. Both edges are directed from the
team nodes to the game node and are labeled as `:Played`.
Let us consider a real-life example of this model: Arsene Wenger's 1000th
game in charge of Arsenal. This was a regular fixture of the 2013/2014
English Premier League season, yet it was written in the stars that this historic
moment would be a big London derby against Chelsea at Stamford Bridge. The
sketch below shows how this game is modeled in our database.
+---------------+ +-----------------------------+
|n: Team | |w: Game |
| |-[:Played {side: "home", outcome: "won"}]-->| |
|name: "Chelsea"| |HT_home_score: 4 |
+---------------+ |HT_away_score: 0 |
|HT_result: "H" |
|FT_home_score: 6 |
|FT_away_score: 0 |
|FT_result: "H" |
+---------------+ |date: "2014-03-22" |
|m: Team | |league: "ENG-Premier League" |
| |-[:Played {side: "away", outcome: "lost"}]->|season: 2013 |
|name: "Arsenal"| |referee: "Andre Marriner" |
+---------------+ +-----------------------------+
### Importing the Snapshot
We have prepared a database snapshot for this example, so the user can easily
import it when starting Memgraph using the `--durability-directory` option.
/usr/lib/memgraph/memgraph --durability-directory /usr/share/memgraph/examples/football \
--durability-enabled=false --snapshot-on-exit=false
When using Memgraph installed from DEB or RPM package, the currently running
Memgraph server may need to be stopped before importing the example. The user
can do so using the following command:
systemctl stop memgraph
When using Docker, the example can be imported with the following command:
docker run -p 7687:7687 \
-v mg_lib:/var/lib/memgraph -v mg_log:/var/log/memgraph -v mg_etc:/etc/memgraph \
memgraph --durability-directory /usr/share/memgraph/examples/football \
--durability-enabled=false --snapshot-on-exit=false
The user should note that any modifications of the database state will persist
only during this run of Memgraph.
### Example Queries
1) You might wonder, what leagues are supported?
MATCH (n:Game)
RETURN DISTINCT n.league AS League
ORDER BY League;
2) We have stored a certain number of seasons for each league. What is the
oldest/newest season we have included?
MATCH (n:Game)
RETURN DISTINCT n.league AS League, MIN(n.season) AS Oldest, MAX(n.season) AS Newest
ORDER BY League;
3) You have already seen one game between Chelsea and Arsenal, let's list all of
them in chronological order.
MATCH (n:Team {name: "Chelsea"})-[e:Played]->(w:Game)<-[f:Played]-(m:Team {name: "Arsenal"})
RETURN w.date AS Date, e.side AS Chelsea, f.side AS Arsenal,
w.FT_home_score AS home_score, w.FT_away_score AS away_score
4) How about filtering games in which Chelsea won?
MATCH (n:Team {name: "Chelsea"})-[e:Played {outcome: "won"}]->
(w:Game)<-[f:Played]-(m:Team {name: "Arsenal"})
RETURN w.date AS Date, e.side AS Chelsea, f.side AS Arsenal,
w.FT_home_score AS home_score, w.FT_away_score AS away_score
5) Home field advantage is a thing in football. Let's list the number of home
defeats for each Premier League team in the 2016/2017 season.
MATCH (n:Team)-[:Played {side: "home", outcome: "lost"}]->
(w:Game {league: "ENG-Premier League", season: 2016})
RETURN n.name AS Team, count(w) AS home_defeats
ORDER BY home_defeats, Team;
6) At the end of the season, the team with the most points wins the league. For
each victory, a team is awarded 3 points and for each draw it is awarded
1 point. Let's find out how many points the reigning champions (Chelsea) had
at the end of the 2016/2017 season.
MATCH (n:Team {name: "Chelsea"})-[:Played {outcome: "drew"}]->(w:Game {season: 2016})
WITH n, COUNT(w) AS draw_points
MATCH (n)-[:Played {outcome: "won"}]->(w:Game {season: 2016})
RETURN draw_points + 3 * COUNT(w) AS total_points;
7) In fact, why not retrieve the whole table?
MATCH (n)-[:Played {outcome: "drew"}]->(w:Game {league: "ENG-Premier League", season: 2016})
WITH n, COUNT(w) AS draw_points
MATCH (n)-[:Played {outcome: "won"}]->(w:Game {league: "ENG-Premier League", season: 2016})
RETURN n.name AS Team, draw_points + 3 * COUNT(w) AS total_points
ORDER BY total_points DESC;
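The scoring rule behind the last two queries (3 points for a win, 1 for a draw) can also be written as a short Python sketch. The function name `league_table` and the sample results are hypothetical. The input is one `(team, outcome)` pair per `:Played` edge.

```python
from collections import Counter

# Points awarded per outcome, as described above.
POINTS = {"won": 3, "drew": 1, "lost": 0}


def league_table(played):
    """Sum points per team and sort by points (descending), then by name."""
    table = Counter()
    for team, outcome in played:
        table[team] += POINTS[outcome]
    return sorted(table.items(), key=lambda row: (-row[1], row[0]))


# Made-up results, not taken from the dataset.
sample = [("A", "won"), ("B", "lost"), ("A", "drew"), ("B", "drew"), ("C", "won")]
print(league_table(sample))  # [('A', 4), ('C', 3), ('B', 1)]
```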
8) People have always debated which of the major leagues is the most exciting.
One basic metric is the average number of goals per game. Let's see the results
at the end of the 2016/2017 season. WARNING: This might shock you.
MATCH (w:Game {season: 2016})
RETURN w.league, AVG(w.FT_home_score) + AVG(w.FT_away_score) AS avg_goals_per_game
ORDER BY avg_goals_per_game DESC;
9) Another metric might be the number of comebacks, i.e. games where one side
was winning at half time but was overtaken by the other side by the end
of the match. Let's count such occurrences during all supported seasons across
all supported leagues.
(g.HT_result = "H" AND g.FT_result = "A") OR
(g.HT_result = "A" AND g.FT_result = "H")
RETURN g.league AS League, count(g) AS Comebacks
ORDER BY Comebacks DESC;
10) Exciting leagues also tend to be very unpredictable. On that note, let's
list all triplets of teams where, during the course of one season, team A won
against team B, team B won against team C and team C won against team A.
MATCH (a)-[:Played {outcome: "won"}]->(p:Game {league: "ENG-Premier League", season: 2016})<--
(b)-[:Played {outcome: "won"}]->(q:Game {league: "ENG-Premier League", season: 2016})<--
(c)-[:Played {outcome: "won"}]->(r:Game {league: "ENG-Premier League", season: 2016})<--(a)
RETURN a.name AS Team1, b.name AS Team2, c.name AS Team3;
@ -1,178 +0,0 @@
## Exploring the European Road Network
This article is a part of a series intended to show users how to use Memgraph
on real-world data and, by doing so, retrieve some interesting and useful
information.
We highly recommend checking out the other articles from this series:
* [Analyzing TED Talks](
* [Graphing the Premier League](
### Introduction
This particular article outlines how to use some of Memgraph's built-in graph
algorithms. More specifically, the article shows how to use the breadth-first search
graph traversal algorithm and Dijkstra's algorithm for finding weighted
shortest paths between nodes in the graph.
### Data model
One of the most common applications of graph traversal algorithms is driving
route computation, so we will use a European road network graph as an example.
The graph consists of 999 major European cities from 39 countries in total.
Each city is connected to the country it belongs to via an edge of type `:In_`.
There are edges of type `:Road` connecting cities less than 500 kilometers
apart. The distance between cities is specified in the `length` property of the
edge.
### Importing the Snapshot
We have prepared a database snapshot for this example, so the user can easily
import it when starting Memgraph using the `--durability-directory` option.
/usr/lib/memgraph/memgraph --durability-directory /usr/share/memgraph/examples/Europe \
--durability-enabled=false --snapshot-on-exit=false
When using Memgraph installed from DEB or RPM package, the currently running
Memgraph server may need to be stopped before importing the example. The user
can do so using the following command:
systemctl stop memgraph
When using Docker, the example can be imported with the following command:
docker run -p 7687:7687 \
-v mg_lib:/var/lib/memgraph -v mg_log:/var/log/memgraph -v mg_etc:/etc/memgraph \
memgraph --durability-directory /usr/share/memgraph/examples/Europe \
--durability-enabled=false --snapshot-on-exit=false
The user should note that any modifications of the database state will persist
only during this run of Memgraph.
### Example Queries
1) Let's list all of the countries in our road network.
2) Which Croatian cities are in our road network?
MATCH (c:City)-[:In_]->(:Country {name: "Croatia"})
3) Which cities in our road network are less than 200 km away from Zagreb?
MATCH (:City {name: "Zagreb"})-[r:Road]->(c:City)
WHERE r.length < 200
Now let's try some queries using Memgraph's graph traversal capabilities.
4) Say you want to drive from Zagreb to Paris. You might wonder, what is the
least number of cities you have to visit if you don't want to drive more than
500 kilometers between stops. Since the edges in our road network don't connect
cities that are more than 500 km apart, this is a great use case for the
breadth-first search (BFS) algorithm.
MATCH p = (:City {name: "Zagreb"})
-[:Road * bfs]->
(:City {name: "Paris"})
RETURN nodes(p);
5) What if we want to bike to Paris instead of driving? It is unreasonable (and
dangerous!) to bike 500 km per day. Let's limit ourselves to biking no more
than 200 km in one go.
MATCH p = (:City {name: "Zagreb"})
-[:Road * bfs (e, v | e.length <= 200)]->
(:City {name: "Paris"})
RETURN nodes(p);
"What is this special syntax?", you might wonder.
`(e, v | e.length <= 200)` is called a *filter lambda*. It's a function that
takes an edge symbol `e` and a vertex symbol `v` and decides whether this edge
and vertex pair should be considered valid in breadth-first expansion by
returning true or false (or Null). In the above example, lambda is returning
true if edge length is not greater than 200, because we don't want to bike more
than 200 km in one go.
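
To see what the filter lambda does during expansion, here is a small Python sketch of breadth-first search with an edge filter. It is not Memgraph's implementation: the function `bfs_path`, the city labels and the distances are all made up for illustration, and the filter receives the edge length and the target city instead of full edge and vertex objects.

```python
from collections import deque


def bfs_path(roads, start, goal, edge_filter=lambda length, city: True):
    """Return a path with the fewest hops from start to goal, or None.

    `roads` maps a city to a list of (neighbour, length_km) pairs.
    An expansion is followed only if `edge_filter(length, neighbour)` is true.
    """
    previous = {start: None}
    queue = deque([start])
    while queue:
        city = queue.popleft()
        if city == goal:
            path = []
            while city is not None:  # walk back to the start
                path.append(city)
                city = previous[city]
            return path[::-1]
        for neighbour, length in roads.get(city, []):
            if neighbour not in previous and edge_filter(length, neighbour):
                previous[neighbour] = city
                queue.append(neighbour)
    return None


# Hypothetical road network with made-up distances.
roads = {
    "A": [("B", 150), ("C", 300)],
    "B": [("E", 180)],
    "C": [("D", 100)],
    "E": [("D", 120)],
}

print(bfs_path(roads, "A", "D"))                                      # ['A', 'C', 'D']
print(bfs_path(roads, "A", "D", lambda length, city: length <= 200))  # ['A', 'B', 'E', 'D']
```

With the filter in place the 300 km road is never expanded, so the search has to take the route with more stops.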
6) Let's say we also don't want to visit Vienna on our way to Paris, because we
have a lot of friends there and visiting all of them would take up a lot of our
time. We just have to update our filter lambda.
MATCH p = (:City {name: "Zagreb"})
-[:Road * bfs (e, v | e.length <= 200 AND v.name != "Vienna")]->
(:City {name: "Paris"})
RETURN nodes(p);
As you can see, without the additional restriction we could visit 11 cities. If
we want to avoid Vienna, we must visit at least 12 cities.
7) Instead of counting the cities visited, we might want to find the shortest
paths in terms of distance travelled. This is a textbook application of
Dijkstra's algorithm. The following query will return the list of cities on the
shortest path from Zagreb to Paris along with the total length of the path.
MATCH p = (:City {name: "Zagreb"})
-[:Road * wShortest (e, v | e.length) total_weight]->
(:City {name: "Paris"})
RETURN nodes(p) as cities, total_weight;
As you can see, the syntax is quite similar to breadth-first search syntax.
Instead of a filter lambda, we need to provide a *weight lambda* and the *total
weight symbol*. Given an edge and vertex pair, weight lambda must return the
cost of expanding to the given vertex using the given edge. The path returned
will have the smallest possible sum of costs and it will be stored in the total
weight symbol. A limitation of Dijkstra's algorithm is that the cost must be
non-negative.
8) We can also combine weight and filter lambdas in the shortest-path query.
Let's say we're interested in the shortest path that doesn't require travelling
more than 200 km in one go for our bike route.
MATCH p = (:City {name: "Zagreb"})
-[:Road * wShortest (e, v | e.length) total_weight (e, v | e.length <= 200)]->
(:City {name: "Paris"})
RETURN nodes(p) as cities, total_weight;
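The interplay of the weight lambda, the total weight and the optional filter lambda can be sketched in Python as follows. This is an illustration under stated assumptions rather than Memgraph's actual code: `weighted_shortest_path`, `weight`, `edge_ok` and the toy graph `roads` are hypothetical names and data, and the sketch rejects negative costs explicitly because Dijkstra's algorithm is only correct for non-negative ones.

```python
import heapq


def weighted_shortest_path(graph, start, goal, weight,
                           edge_ok=lambda edge, vertex: True):
    """Dijkstra's algorithm; returns (path, total_weight) or (None, None).

    weight(edge, vertex) is the cost of expanding to vertex over edge,
    edge_ok(edge, vertex) decides whether the expansion is allowed at all.
    """
    best = {start: 0}
    parents = {start: None}
    heap = [(0, start)]
    done = set()
    while heap:
        dist, current = heapq.heappop(heap)
        if current in done:
            continue  # an older, more expensive heap entry
        done.add(current)
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1], dist
        for neighbour, edge in graph.get(current, []):
            if not edge_ok(edge, neighbour):
                continue
            cost = weight(edge, neighbour)
            if cost < 0:
                raise ValueError("Dijkstra's algorithm needs non-negative costs")
            candidate = dist + cost
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                parents[neighbour] = current
                heapq.heappush(heap, (candidate, neighbour))
    return None, None


# Hypothetical toy data, not taken from the road network dataset.
roads = {
    "A": [("B", {"length": 150}), ("C", {"length": 350})],
    "B": [("D", {"length": 180})],
    "C": [("E", {"length": 100})],
    "D": [("E", {"length": 170})],
}


def by_length(edge, vertex):
    return edge["length"]


driving = weighted_shortest_path(roads, "A", "E", by_length)
biking = weighted_shortest_path(roads, "A", "E", by_length,
                                lambda e, v: e["length"] <= 200)
```

In this toy graph the unrestricted shortest route uses the 350 km edge. Once edges longer than 200 km are filtered out, the best remaining route is longer overall, which mirrors the difference between the driving and the biking queries.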
9) Let's try to find the 10 cities that are furthest away from Zagreb.
MATCH (:City {name: "Zagreb"})
-[:Road * wShortest (e, v | e.length) total_weight]->
(c:City)
RETURN c, total_weight
ORDER BY total_weight DESC LIMIT 10;
It is not surprising to see that they are all in Siberia.
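Conceptually, this query computes the shortest distance from one city to every reachable city and then keeps the largest values. A rough Python sketch of that idea, assuming the hypothetical helper names `shortest_distances` and `furthest` and a made-up toy graph, could look like this. Leaving the start vertex out of the result is a choice made in this sketch.

```python
import heapq


def shortest_distances(graph, start, weight):
    """Dijkstra from start; maps each reachable vertex to its cheapest total weight."""
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, current = heapq.heappop(heap)
        if d > dist[current]:
            continue  # stale heap entry, a cheaper path was found already
        for neighbour, edge in graph.get(current, []):
            candidate = d + weight(edge, neighbour)
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


def furthest(graph, start, weight, limit):
    """Return up to `limit` (vertex, total_weight) pairs, furthest first."""
    dist = shortest_distances(graph, start, weight)
    del dist[start]  # the start itself is left out of the ranking
    ranked = sorted(dist.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Hypothetical toy data, not taken from the road network dataset.
roads = {
    "A": [("B", {"length": 150}), ("C", {"length": 350})],
    "B": [("D", {"length": 180})],
    "C": [("E", {"length": 100})],
    "D": [("E", {"length": 170})],
}

top_two = furthest(roads, "A", lambda e, v: e["length"], 2)
```

The sorting step corresponds to `ORDER BY total_weight DESC` and the slice to `LIMIT 10` in the query above.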
To learn more about these algorithms, we suggest you check out their Wikipedia
pages:
* [Breadth-first search](
* [Dijkstra's algorithm](
@ -1,61 +0,0 @@
## Upcoming Features
This chapter describes some of the planned features that we at Memgraph are
working on.
### Performance Improvements
Excellent database performance is one of Memgraph's long-standing goals. We
will be continually working on improving the performance. This includes:
* query compilation;
* query execution;
* core engine performance;
* algorithmic improvements (e.g. bidirectional breadth-first search);
* memory usage and
* other improvements.
### Label-Property Index Usage Improvements
Currently, indices on combinations of labels and properties can be created, but
cannot be deleted. We plan to add a new query language construct which will
allow deletion of created indices.
### Improving openCypher Support
Although we have implemented the most common features of the openCypher query
language, there are other useful features we are still working on.
#### Functions
Memgraph's openCypher implementation supports the most useful functions, but
there are more which openCypher provides. Some are related to not yet
implemented features like paths, while some may use the features Memgraph
already supports. Out of the remaining functions, some are more useful than
others and as such they will be supported sooner.
#### List Comprehensions
List comprehensions are similar to the supported `collect` function, which
generates a list out of multiple values. But unlike `collect`, list
comprehensions offer a powerful mechanism for filtering or otherwise
manipulating values which are collected into a list.
For example, getting numbers between 0 and 10 and squaring them:
RETURN [x IN range(0, 10) | x^2] AS squares
As another example, collecting `:Person` nodes with `age` less than 42 can be
achieved without list comprehensions:
MATCH (n :Person) WHERE n.age < 42 RETURN collect(n)
Using list comprehensions, the same can be done with the query:
MATCH (n :Person) RETURN [n IN collect(n) WHERE n.age < 42]
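For readers who know list comprehensions from general-purpose languages, the same ideas can be written in Python. This is only an analogy, not a statement about how Memgraph will implement the feature, and the `people` list is hypothetical stand-in data for `:Person` nodes. One difference worth noting is that openCypher's `range(0, 10)` includes both end points, while Python's `range` excludes the upper bound.

```python
# Counterpart of [x IN range(0, 10) | x^2]. openCypher's range() includes
# both end points, so the Python range has to stop at 11.
squares = [x ** 2 for x in range(0, 11)]

# Hypothetical stand-ins for :Person nodes.
people = [
    {"name": "Ana", "age": 30},
    {"name": "Ivan", "age": 42},
    {"name": "Marko", "age": 17},
]

# Filter first, then collect:
#   MATCH (n :Person) WHERE n.age < 42 RETURN collect(n)
filtered_then_collected = [p for p in people if p["age"] < 42]

# Collect first, then filter inside the comprehension:
#   MATCH (n :Person) RETURN [n IN collect(n) WHERE n.age < 42]
collected = list(people)
collected_then_filtered = [p for p in collected if p["age"] < 42]
```

Both variants end up with the same list. The comprehension form simply moves the filtering from the `WHERE` clause into the expression that builds the list.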