memgraph/open-cypher.md at 514d9d2808f04f610ed65bac104d371db987163c

Teon Banek 514d9d2808 Add section on indexing in user documentation

Summary:
The openCypher chapter of user documentation now includes information on
creating label & property index.

Reviewers: florijan, buda

Reviewed By: florijan

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D537

2017-07-12 11:31:50 +02:00

openCypher Query Language

openCypher is a query language for querying graph databases. It aims to be intuitive and easy to learn, while providing a powerful interface for working with graph-based data.

Memgraph supports most of the commonly used constructs of the language. This chapter describes the features which are implemented. It also lists features of the language which are not yet supported.

Reading existing Data
Writing new Data
Reading & Writing
Indexing
Other Features

Reading existing Data

The simplest usage of the language is to find data stored in the database. For that purpose, the following clauses are offered:

MATCH, which searches for patterns;
WHERE, for filtering the matched data and
RETURN, for defining what will be presented to the user in the result set.

MATCH

This clause is used to obtain data from Memgraph by matching it to a given pattern. For example, to find each node in the database, you can use the following query.

MATCH (node) RETURN node

Finding connected nodes can be achieved by using the query:

MATCH (node1) -[connection]- (node2) RETURN node1, connection, node2

In addition to general pattern matching, you can narrow the search down by specifying node labels and properties. Similarly, edge types and properties can also be specified. For example, finding each node labeled as Person whose age property is 42 is done with the following query.

MATCH (n :Person {age: 42}) RETURN n

Their friends can be found with the following query.

MATCH (n :Person {age: 42}) -[:FriendOf]- (friend) RETURN friend
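As mentioned above, edge properties can be specified in the pattern in the same way as node properties. The following is a minimal sketch; the `since` property is an assumption made up for this illustration and is not part of any schema described in this chapter.

```cypher
// `since` is a hypothetical edge property, used only for illustration.
MATCH (n :Person {age: 42}) -[e :FriendOf {since: 2010}]- (friend)
RETURN friend
```

Only friendships whose `since` property equals 2010 would be matched here.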

More details on how MATCH works can be found here. Note that variable length paths and named paths are not yet supported.

The MATCH clause can be modified by prepending the OPTIONAL keyword. OPTIONAL MATCH clause behaves the same as a regular MATCH, but when it fails to find the pattern, missing parts of the pattern will be filled with null values. Examples can be found here.
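To illustrate the null-filling behaviour, here is a small sketch which reuses the :Person label and the FriendOf edge type from the earlier examples.

```cypher
// Every person is returned, even those without any friends.
MATCH (n :Person)
OPTIONAL MATCH (n) -[:FriendOf]- (friend)
RETURN n, friend
```

For a person who has no FriendOf edges, the result should still contain one row, with friend set to null. With a regular MATCH in place of OPTIONAL MATCH, such a person would not appear in the results at all.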

WHERE

You have already seen that simple filtering can be achieved by using labels and properties in MATCH patterns. When more complex filtering is desired, you can use WHERE paired with MATCH or OPTIONAL MATCH. For example, finding each person older than 20 is done with this query.

MATCH (n :Person) WHERE n.age > 20 RETURN n

Additional examples can be found here.

RETURN

The RETURN clause defines which data should be included in the resulting set. Basic usage was already shown in the examples for MATCH and WHERE clauses. Another feature of RETURN is renaming the results using the AS keyword.

Example.

MATCH (n :Person) RETURN n AS people

That query would display all nodes under the header named people instead of n.

When you want to get everything that was matched, you can use the * (asterisk) symbol.

This query:

MATCH (node1) -[connection]- (node2) RETURN *

is equivalent to:

MATCH (node1) -[connection]- (node2) RETURN node1, connection, node2

RETURN can be followed by the DISTINCT operator, which will remove duplicate results. For example, getting unique names of people can be achieved with:

MATCH (n :Person) RETURN DISTINCT n.name

Besides choosing what will be the result and how it will be named, the RETURN clause can also be used to:

limit results with LIMIT sub-clause;
skip results with SKIP sub-clause;
order results with ORDER BY sub-clause and
perform aggregations (such as count).

More details on RETURN can be found here.

SKIP & LIMIT

These sub-clauses take a number of how many results to skip or limit. For example, to get the first 3 results you can use this query.

MATCH (n :Person) RETURN n LIMIT 3

If you want to get all the results after the first 3, you can use the following.

MATCH (n :Person) RETURN n SKIP 3

SKIP and LIMIT can be combined. For example, to get the second result, you can use:

MATCH (n :Person) RETURN n SKIP 1 LIMIT 1

ORDER BY

Since the patterns which are matched can come in any order, it is very useful to be able to enforce some ordering among the results. In such cases, you can use the ORDER BY sub-clause.

For example, the following query will get all :Person nodes and order them by their names.

MATCH (n :Person) RETURN n ORDER BY n.name

By default, results are sorted in ascending order. To sort in descending order, you should append DESC.

For example, to order people by their name descending, you can use this query.

MATCH (n :Person) RETURN n ORDER BY n.name DESC

You can also order by multiple variables. The results will be sorted by the first variable listed. If the values are equal, the results are sorted by the second variable, and so on.

Example. Ordering by first name descending and last name ascending.

MATCH (n :Person) RETURN n ORDER BY n.name DESC, n.lastName

Note that ORDER BY sees only the variable names as carried over by RETURN. This means that the following will result in an error.

MATCH (old :Person) RETURN old AS new ORDER BY old.name

Instead, the new variable must be used:

MATCH (old :Person) RETURN old AS new ORDER BY new.name

The ORDER BY sub-clause may come in handy with SKIP and/or LIMIT sub-clauses. For example, to get the oldest person you can use the following.

MATCH (n :Person) RETURN n ORDER BY n.age DESC LIMIT 1

Aggregating

openCypher has functions for aggregating data. Memgraph currently supports the following aggregating functions.

avg, for calculating the average.
collect, for collecting multiple values into a single list.
count, for counting the resulting values.
max, for calculating the maximum result.
min, for calculating the minimum result.
sum, for getting the sum of numeric results.

Example. Calculating the average age.

MATCH (n :Person) RETURN avg(n.age) AS averageAge
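Several aggregations can appear in one RETURN, and any non-aggregated expression next to them acts as a grouping key in openCypher. The two queries below are a sketch of both cases; they assume the same :Person nodes with name and age properties used in the earlier examples.

```cypher
// First query: count all people and gather their names into one list.
MATCH (n :Person)
RETURN count(n) AS numberOfPeople, collect(n.name) AS names

// Second query (run separately): count people per distinct age.
// n.age is not aggregated, so it acts as the grouping key.
MATCH (n :Person)
RETURN n.age AS age, count(n) AS peopleOfThatAge
```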

Click here for additional details on how aggregations work.

Writing new Data

For adding new data, as well as updating and removing existing data, you can use the following clauses.

CREATE, for creating new nodes and edges.
SET, for adding new or updating existing labels and properties.
DELETE, for deleting nodes and edges.
REMOVE, for removing labels and properties.

You can still use the RETURN clause to produce results after writing, but it is not mandatory.

CREATE

This clause is used to add new nodes and edges to the database. The creation is done by providing a pattern, similarly to the MATCH clause.

For example, to create 2 new nodes connected with a new edge, use this query.

CREATE (node1) -[:edge_type]-> (node2)
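Labels and properties can be given in the CREATE pattern as well, and RETURN can follow to show what was written. The sketch below uses made-up names and ages purely for illustration.

```cypher
// "Alice", "Bob" and their ages are illustrative values only.
CREATE (alice :Person {name: "Alice", age: 42}) -[:FriendOf]-> (bob :Person {name: "Bob", age: 37})
RETURN alice, bob
```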

Additional information on CREATE is here.

SET

The SET clause is used to update labels and properties of already existing data.

Example. Incrementing everyone's age by 1.

MATCH (n :Person) SET n.age = n.age + 1
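Since SET also handles labels, a single clause can add a label and set a property at once. In this sketch the :Employee label and the checked property are hypothetical names chosen for the example.

```cypher
// :Employee and `checked` are hypothetical; they only illustrate the syntax.
MATCH (n :Person {age: 42})
SET n :Employee, n.checked = true
```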

Click here for a more detailed explanation on what can be done with SET.

DELETE

This clause is used to delete nodes and edges from the database.

Example. Removing all edges of a single type.

MATCH () -[edge :type]- () DELETE edge

When testing the database, you often want a clean start, which means deleting every node and edge in the database. It is reasonable to expect that deleting each node also deletes all edges coming into or out of that node.

MATCH (node) DELETE node

However, openCypher prevents accidental deletion of edges, so the above query will report an error. Instead, you need to use the DETACH keyword, which will remove edges from a node you are deleting. The following should work and delete everything in the database.

MATCH (node) DETACH DELETE node

More examples are here.

REMOVE

The REMOVE clause is used to remove labels and properties from nodes and edges.

Example.

MATCH (n :WrongLabel) REMOVE n :WrongLabel, n.property

Reading & Writing

openCypher supports combining multiple reads and writes using the WITH clause. In addition, the MERGE clause is provided, which creates a pattern if it does not already exist.

WITH

The write part of the query cannot simply be followed by another read part. In order to combine them, the WITH clause must be used. The names this clause establishes are transferred from one part to another.

For example, creating a node and finding all nodes with the same property.

CREATE (node {property: 42}) WITH node.property AS propValue
MATCH (n {property: propValue}) RETURN n

Note that the node is not visible after WITH, since only node.property was carried over.

This clause behaves very much like RETURN, so you should refer to features of RETURN.
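Because WITH mirrors RETURN, its sub-clauses can be used to shape intermediate results before the query continues. The following sketch assumes that ORDER BY and LIMIT are accepted after WITH just as they are after RETURN; it is meant as an illustration, not a statement of what every Memgraph version supports.

```cypher
// Take the three oldest people, then look up their friends.
MATCH (n :Person)
WITH n ORDER BY n.age DESC LIMIT 3
MATCH (n) -[:FriendOf]- (friend)
RETURN n.name, friend.name
```

Only n is carried over by WITH, so the second MATCH continues from the three selected nodes.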

MERGE

The MERGE clause is used to ensure that a pattern you are looking for exists in the database. This means that if the pattern is not found, it will be created. In a way, this clause is like a combination of MATCH and CREATE.

Example. Ensure that a person has at least one friend.

MATCH (n :Person) MERGE (n) -[:FriendOf]-> (m)

The clause also provides additional features for updating the values depending on whether the pattern was created or matched. This is achieved with ON CREATE and ON MATCH sub-clauses.

Example. Set different properties depending on what MERGE did.

MATCH (n :Person) MERGE (n) -[:FriendOf]-> (m)
ON CREATE SET m.prop = "created" ON MATCH SET m.prop = "existed"

For more details, click this link.

Indexing

An index stores additional information on certain types of data, so that retrieving said data becomes more efficient. Downsides of indexing are:

requiring extra storage for each index and
slowing down writes to the database.

Carefully choosing which data to index can tremendously improve data retrieval efficiency, and thus make index downsides negligible.

Memgraph automatically indexes labeled data. This improves queries which fetch nodes by label:

MATCH (n :Label) ... RETURN n

Indexing can also be applied to data with a specific combination of label and property. These indexes are not created automatically; a user needs to create them explicitly. Creation is done using the special CREATE INDEX ON :Label(property) language construct.

For example, to index nodes which are labeled as :Person and have a property named age:

CREATE INDEX ON :Person(age)

After the index is created, retrieving those nodes will become more efficient. For example, the following query will use the index to look up :Person nodes by their age property, instead of fetching each :Person node and checking its properties one by one.

MATCH (n :Person {age: 42}) RETURN n

Index-based retrieval also works when filtering labels and properties with WHERE. For example, the same effect as in the previous example can be achieved with:

MATCH (n) WHERE n:Person AND n.age = 42 RETURN n

Since the filter inside WHERE can contain any kind of an expression, the expression can be complicated enough so that the index does not get used. We are continuously improving the recognition of index usage opportunities from a WHERE expression. If there is any suspicion that an index may not be used, we recommend putting properties and labels inside the MATCH pattern.

Currently, once an index is created it cannot be deleted. This feature will be implemented very soon. The expected syntax for removing an index will be DROP INDEX ON :Label(property).

Other Features

The following sections describe some of the other supported features.

UNWIND

The UNWIND clause is used to unwind a list of values as individual rows.

Example. Produce rows out of a single list.

UNWIND [1,2,3] AS listElement RETURN listElement
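Since each list element becomes its own row, UNWIND can also feed other clauses. The sketch below assumes that UNWIND can be followed by CREATE, as in standard openCypher; the names in the list are illustrative values only.

```cypher
// One :Person node is created for each element of the list.
UNWIND ["Alice", "Bob"] AS personName
CREATE (n :Person {name: personName})
RETURN n
```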