memgraph/docs/user_technical/reference_guide/reading_existing_data.md
Ivan Paljak 6615a9de53 Restructure user-technical
Reviewers: buda

Reviewed By: buda

Subscribers: pullbot, mferencevic

Differential Revision: https://phabricator.memgraph.io/D1547
2018-08-23 11:13:03 +02:00

8.5 KiB

Reading Existing Data

The simplest usage of the language is to find data stored in the database. For that purpose, the following clauses are offered:

  • MATCH, which searches for patterns;
  • WHERE, for filtering the matched data and
  • RETURN, for defining what will be presented to the user in the result set.
  • UNION and UNION ALL for combining results from multiple queries.

MATCH

This clause is used to obtain data from Memgraph by matching it to a given pattern. For example, to find each node in the database, you can use the following query.

MATCH (node) RETURN node

Finding connected nodes can be achieved by using the query:

MATCH (node1)-[connection]-(node2) RETURN node1, connection, node2

In addition to general pattern matching, you can narrow the search down by specifying node labels and properties. Similarly, edge types and properties can also be specified. For example, finding each node labeled as Person and with property age being 42, is done with the following query.

MATCH (n :Person {age: 42}) RETURN n

While their friends can be found with the following.

MATCH (n :Person {age: 42})-[:FriendOf]-(friend) RETURN friend

There are cases when a user needs to find data which is connected by traversing a path of connections, but the user doesn't know how many connections need to be traversed. openCypher allows for designating patterns with variable path lengths. Matching such a path is achieved by using the * (asterisk) symbol inside the edge element of a pattern. For example, traversing from node1 to node2 by following any number of connections in a single direction can be achieved with:

MATCH (node1)-[r*]->(node2) RETURN node1, r, node2

If paths are very long, finding them could take a long time. To prevent that, a user can provide the minimum and maximum length of the path. For example, paths of length between 2 and 4 can be obtained with a query like:

MATCH (node1)-[r*2..4]->(node2) RETURN node1, r, node2

It is possible to name patterns in the query and return the resulting paths. This is especially useful when matching variable length paths:

MATCH path = ()-[r*2..4]->() RETURN path

More details on how MATCH works can be found here.

The MATCH clause can be modified by prepending the OPTIONAL keyword. OPTIONAL MATCH clause behaves the same as a regular MATCH, but when it fails to find the pattern, missing parts of the pattern will be filled with null values. Examples can be found here.

WHERE

You have already seen that simple filtering can be achieved by using labels and properties in MATCH patterns. When more complex filtering is desired, you can use WHERE paired with MATCH or OPTIONAL MATCH. For example, finding each person older than 20 is done with the this query.

MATCH (n :Person) WHERE n.age > 20 RETURN n

Additional examples can be found here.

RETURN

The RETURN clause defines which data should be included in the resulting set. Basic usage was already shown in the examples for MATCH and WHERE clauses. Another feature of RETURN is renaming the results using the AS keyword.

Example.

MATCH (n :Person) RETURN n AS people

That query would display all nodes under the header named people instead of n.

When you want to get everything that was matched, you can use the * (asterisk) symbol.

This query:

MATCH (node1)-[connection]-(node2) RETURN *

is equivalent to:

MATCH (node1)-[connection]-(node2) RETURN node1, connection, node2

RETURN can be followed by the DISTINCT operator, which will remove duplicate results. For example, getting unique names of people can be achieved with:

MATCH (n :Person) RETURN DISTINCT n.name

Besides choosing what will be the result and how it will be named, the RETURN clause can also be used to:

  • limit results with LIMIT sub-clause;
  • skip results with SKIP sub-clause;
  • order results with ORDER BY sub-clause and
  • perform aggregations (such as count).

More details on RETURN can be found here.

SKIP & LIMIT

These sub-clauses take a number of how many results to skip or limit. For example, to get the first 3 results you can use this query.

MATCH (n :Person) RETURN n LIMIT 3

If you want to get all the results after the first 3, you can use the following.

MATCH (n :Person) RETURN n SKIP 3

The SKIP and LIMIT can be combined. So for example, to get the 2nd result, you can do:

MATCH (n :Person) RETURN n SKIP 1 LIMIT 1

ORDER BY

Since the patterns which are matched can come in any order, it is very useful to be able to enforce some ordering among the results. In such cases, you can use the ORDER BY sub-clause.

For example, the following query will get all :Person nodes and order them by their names.

MATCH (n :Person) RETURN n ORDER BY n.name

By default, ordering will be in the ascending order. To change the order to be descending, you should append DESC.

For example, to order people by their name descending, you can use this query.

MATCH (n :Person) RETURN n ORDER BY n.name DESC

You can also order by multiple variables. The results will be sorted by the first variable listed. If the values are equal, the results are sorted by the second variable, and so on.

Example. Ordering by first name descending and last name ascending.

MATCH (n :Person) RETURN n ORDER BY n.name DESC, n.lastName

Note that ORDER BY sees only the variable names as carried over by RETURN. This means that the following will result in an error.

MATCH (n :Person) RETURN old AS new ORDER BY old.name

Instead, the new variable must be used:

MATCH (n: Person) RETURN old AS new ORDER BY new.name

The ORDER BY sub-clause may come in handy with SKIP and/or LIMIT sub-clauses. For example, to get the oldest person you can use the following.

MATCH (n :Person) RETURN n ORDER BY n.age DESC LIMIT 1
Aggregating

openCypher has functions for aggregating data. Memgraph currently supports the following aggregating functions.

  • avg, for calculating the average.
  • collect, for collecting multiple values into a single list or map. If given a single expression values are collected into a list. If given two expressions, values are collected into a map where the first expression denotes map keys (must be string values) and the second expression denotes map values.
  • count, for counting the resulting values.
  • max, for calculating the maximum result.
  • min, for calculating the minimum result.
  • sum, for getting the sum of numeric results.

Example, calculating the average age:

MATCH (n :Person) RETURN avg(n.age) AS averageAge

Collecting items into a list:

MATCH (n :Person) RETURN collect(n.name) AS list_of_names

Collecting items into a map:

MATCH (n :Person) RETURN collect(n.name, n.age) AS map_name_to_age

Click here for additional details on how aggregations work.

UNION and UNION ALL

openCypher supports combining results from multiple queries into a single result set. That result will contain rows that belong to queries in the union respecting the union type.

Using UNION will contain only distinct rows while UNION ALL will keep all rows from all given queries.

Restrictions when using UNION or UNION ALL:

  • The number and the names of columns returned by queries must be the same for all of them.
  • There can be only one union type between single queries, i.e. a query can't contain both UNION and UNION ALL.

Example, get distinct names that are shared between persons and movies:

MATCH(n: Person) RETURN n.name AS name UNION MATCH(n: Movie) RETURN n.name AS name

Example, get all names that are shared between persons and movies (including duplicates):

MATCH(n: Person) RETURN n.name AS name UNION ALL MATCH(n: Movie) RETURN n.name AS name