Teon Banek 6f10b1c115 Move query implementation from Phriction to this repo

Summary:
Our query parsing, planning and execution architecture was described on
Phabricator wiki pages, Phriction. This commit copies the said
documentation here, so that it's easier to access for all developers.
Additional benefit is tracking the changes and hopefully suggesting to
developers to keep it up to date.

Besides making a copy, the documentation has been updated to reflect the
current state of the codebase. Note that some things are still missing,
but what was written should now be correct.

Reviewers: mtomic, llugovic

Reviewed By: mtomic

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1854

2019-02-15 16:58:39 +01:00

18 KiB

Raw Blame History

Logical Plan Execution

We implement classical iterator-style operators. Logical operators define operations on the database. They encapsulate the following info: what the input is (another LogicalOperator), what to do with the data, and how to do it.

Currently logical operators can have zero or more input operations, and thus a LogicalOperator tree is formed. Most LogicalOperator types have only one input, so we are mostly working with chains instead of full-fledged trees. You can find information on each operator in src/query/plan/operator.lcp.

Cursor

Logical operators do not perform database work themselves. Instead they create Cursor objects that do the actual work, based on the info in the operator. Cursors expose a Pull method that gets called by the cursor's consumer. The consumer keeps pulling as long as Pull returns true (indicating it successfully performed some work and might be eligible for another Pull). Most cursors will call the Pull function of their input cursor, so typically a cursor chain is created that is analogous to the logical operator chain it's created from.

Frame

The Frame object contains all the data of the current Pull chain. It serves for communicating data between cursors.

For example, in a MATCH (n) RETURN n query the ScanAllCursor places a vertex on the Frame for each Pull. It puts the vertex in the position reserved for the n symbol. Then the ProduceCursor can take that same value from the Frame because it knows the appropriate symbol. Frame positions are indexed by Symbol objects.
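The interplay of Cursor, Pull and Frame described above can be sketched in a few lines. This is a minimal Python illustration, not Memgraph's C++ code: the class names mirror the ones in the text, but the constructors, the `pull` signature, the `row` attribute and the `execute` helper are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str
    position: int  # index of the Frame slot reserved for this symbol


class Frame:
    """Fixed-size storage shared by every cursor in one Pull chain."""

    def __init__(self, size):
        self._slots = [None] * size

    def __getitem__(self, symbol):
        return self._slots[symbol.position]

    def __setitem__(self, symbol, value):
        self._slots[symbol.position] = value


class ScanAllCursor:
    def __init__(self, vertices, output_symbol):
        self._vertices = iter(vertices)
        self._output_symbol = output_symbol

    def pull(self, frame):
        for vertex in self._vertices:
            frame[self._output_symbol] = vertex  # one vertex per Pull
            return True
        return False  # input exhausted


class ProduceCursor:
    def __init__(self, input_cursor, symbol):
        self._input = input_cursor
        self._symbol = symbol
        self.row = None

    def pull(self, frame):
        if not self._input.pull(frame):
            return False
        self.row = frame[self._symbol]  # read what ScanAllCursor wrote
        return True


def execute(vertices):
    """Consumer for MATCH (n) RETURN n: pull until Pull returns False."""
    n = Symbol("n", 0)
    frame = Frame(size=1)
    cursor = ProduceCursor(ScanAllCursor(vertices, n), n)
    rows = []
    while cursor.pull(frame):
        rows.append(cursor.row)
    return rows
```

Calling `execute(["v1", "v2"])` pulls twice successfully and stops on the third Pull, which returns false.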

ExpressionEvaluator

Expression results are not placed on the Frame since they do not need to be communicated between different Cursors. Instead, expressions are evaluated using an instance of ExpressionEvaluator. Since generally speaking an expression can be defined by a tree of subexpressions, the ExpressionEvaluator is implemented as a tree visitor. There is a performance sub-optimality here because a stack is used to communicate intermediary expression results between elements of the tree. This is one of the reasons why it's planned to use Frame for intermediary expression results as well. The other reason is that it might facilitate compilation later on.
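To make the stack-based visitor concrete, here is a small Python sketch. The node types (`Literal`, `Identifier`, `Add`) and the `visit_*` method names are hypothetical and chosen only for illustration; the frame is modelled as a plain dictionary.

```python
from dataclasses import dataclass


@dataclass
class Literal:
    value: object

    def accept(self, visitor):
        visitor.visit_literal(self)


@dataclass
class Identifier:
    name: str

    def accept(self, visitor):
        visitor.visit_identifier(self)


@dataclass
class Add:
    left: object
    right: object

    def accept(self, visitor):
        visitor.visit_add(self)


class ExpressionEvaluator:
    """Tree visitor; subexpression results travel through a stack."""

    def __init__(self, frame):
        self._frame = frame
        self._stack = []

    def evaluate(self, expression):
        expression.accept(self)
        return self._stack.pop()

    def visit_literal(self, node):
        self._stack.append(node.value)

    def visit_identifier(self, node):
        self._stack.append(self._frame[node.name])

    def visit_add(self, node):
        node.left.accept(self)
        node.right.accept(self)
        right = self._stack.pop()  # pushed last, popped first
        left = self._stack.pop()
        self._stack.append(left + right)


# x + (3 + 4) with x bound to 2 on the frame
expression = Add(Identifier("x"), Add(Literal(3), Literal(4)))
result = ExpressionEvaluator({"x": 2}).evaluate(expression)
```

Every intermediate value is pushed and popped once, which is the overhead the paragraph above refers to.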

Cypher Execution Semantics

Cypher query execution has mostly well-defined semantics. Some are explicitly defined by openCypher and its TCK, while others are implicitly defined by Neo4j's implementation of Cypher that we want to be generally compatible with.

These semantics can in short be described as follows: a Cypher query consists of multiple clauses, some of which modify the graph. Generally, every clause in the query, when reading it left to right, operates on a consistent state of the property graph, untouched by subsequent clauses. This means that a MATCH clause in the beginning operates on a graph-state in which modifications by the subsequent SET are not visible.

The stated semantics feel very natural to the end-user, and Neo seems to implement them well. For Memgraph the situation is complex because LogicalOperator execution (through a Cursor) happens one Pull at a time (generally meaning all the query clauses get executed for every top-level Pull). This is not inherently consistent with Cypher semantics because a SET clause can modify data, and the MATCH clause that precedes it might see the modification in a subsequent Pull. Also, the RETURN clause might want to stream results to the user before all SET clauses have been executed, so the user might see some intermediate graph state. There are many edge-cases that Memgraph does its best to avoid to stay true to Cypher semantics, while at the same time using a high-performance streaming approach. The edge-cases are enumerated in this document along with the implementation details they imply.

Implementation Peculiarities

Once

An operator that does nothing but whose Cursor::Pull returns true on the first Pull and false on subsequent ones. This operator is used when another operator has an optional input, because in Cypher a clause will typically execute once for every input from the preceding clauses, or just once if there was no preceding input. For example, consider the CREATE clause. In the query CREATE (n) only one node is created, while in the query MATCH (n) CREATE (m) a node is created for each existing node. Thus in our CreateNode logical operator the input is either a ScanAll operator, or a Once operator.
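The two CREATE queries above can be traced with a short Python sketch. It assumes a list as the storage and a simplified `pull(frame)` signature; the snapshot taken in `ScanAllCursor` is a stand-in for the scan not seeing nodes created later in the same query.

```python
class OnceCursor:
    """Returns True on the first Pull and False on every later one."""

    def __init__(self):
        self._did_pull = False

    def pull(self, frame):
        if self._did_pull:
            return False
        self._did_pull = True
        return True


class ScanAllCursor:
    def __init__(self, storage):
        # Snapshot the storage so nodes created later are not scanned.
        self._vertices = iter(list(storage))

    def pull(self, frame):
        for vertex in self._vertices:
            frame["n"] = vertex
            return True
        return False


class CreateNodeCursor:
    def __init__(self, input_cursor, storage):
        self._input = input_cursor
        self._storage = storage

    def pull(self, frame):
        if not self._input.pull(frame):
            return False
        self._storage.append("new")  # one node per successful input Pull
        return True


def run(cursor):
    frame = {}
    while cursor.pull(frame):
        pass


empty_db = []
run(CreateNodeCursor(OnceCursor(), empty_db))   # CREATE (n)

db = ["a", "b", "c"]
run(CreateNodeCursor(ScanAllCursor(db), db))    # MATCH (n) CREATE (m)
```

With `OnceCursor` as input exactly one node is created; with `ScanAllCursor` over three existing nodes, three more are created.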

GraphView

In the previous section, Cypher Execution Semantics, we mentioned how the preceding clauses should not see changes made in subsequent ones. For that reason, some operators take a GraphView enum value. This value determines which state of the graph an operator sees.

Consider the query MATCH (n)--(m) WHERE n.x = 0 SET m.x = 1. Naive streaming could match a vertex n on the given criteria, expand to m, update its property, and in the next iteration consider the vertex previously matched to m and skip it because its newly set property value does not qualify. This is not how Cypher works. To handle this issue properly, Memgraph designed the VertexAccessor class that tracks two versions of data: one that was visible before the current transaction+command, and the optional other that was created in the current transaction+command. The MATCH clause will be planned as ScanAll and Expand operations using the GraphView::OLD value. This ensures that modifications performed in the same query do not affect it. The same applies to edges and the EdgeAccessor class.
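A minimal Python sketch of the two-version idea follows. It is not the real VertexAccessor: the `get`/`set` methods, the lazily created new version and the two-vertex graph are assumptions made to show why filtering with the old view gives the expected result.

```python
from enum import Enum


class GraphView(Enum):
    OLD = "old"
    NEW = "new"


class VertexAccessor:
    """Keeps the pre-command version and an optional new version."""

    def __init__(self, properties):
        self._old = dict(properties)
        self._new = None  # created on the first write

    def get(self, key, view):
        if view is GraphView.NEW and self._new is not None:
            return self._new.get(key)
        return self._old.get(key)

    def set(self, key, value):
        if self._new is None:
            self._new = dict(self._old)
        self._new[key] = value


def run_query(match_view):
    """MATCH (n)--(m) WHERE n.x = 0 SET m.x = 1 on two connected vertices."""
    a = VertexAccessor({"x": 0})
    b = VertexAccessor({"x": 0})
    # The undirected pattern yields both orientations of the single edge.
    for n, m in [(a, b), (b, a)]:
        if n.get("x", match_view) == 0:
            m.set("x", 1)
    return a.get("x", GraphView.NEW), b.get("x", GraphView.NEW)


correct = run_query(GraphView.OLD)  # both vertices end up with x = 1
naive = run_query(GraphView.NEW)    # second match is wrongly filtered out
```

With `GraphView.OLD` the filter in the second iteration still sees `b.x = 0`, so `a` is updated as well.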

Existing Record Detection

It's possible that a pattern element has already been declared in the same pattern, or a preceding pattern. For example MATCH (n)--(m), (n)--(l) or a cycle-detection match MATCH (n)-->(n) RETURN n. Implementation-wise, existing record detection just checks that the expanded record is equal to the one already on the frame.

Why Not Use Separate Expansion Ops for Edges and Vertices?

Expanding an edge and a vertex in separate ops is not feasible when matching a cycle in bi-directional expansions. Consider the query MATCH (n)--(n) RETURN n. Let's try to expand first the edge in one op, and vertex in the next. The vertex expansion consumes the edge expansion input. It takes the expanded edge from the frame. It needs to detect a cycle by comparing the vertex existing on the frame with one of the edge vertices (from or to). But which one? It doesn't know, and can't ensure correct cycle detection.

Data Visibility During and After SET

In Cypher, setting values always works on the latest version of data (from preceding or current clause). That means that within a SET clause all the changes from previous clauses must be visible, as well as changes done by the current SET clause. Also, if there is a clause after SET it must see all the changes performed by the preceding SET. Both these things are best illustrated with the following queries executed on an empty database:

CREATE (n:A {x:0})-[:EdgeType]->(m:B {x:0})
MATCH (n)--(m) SET m.x = n.x + 1 RETURN labels(n), n.x, labels(m), m.x

This returns:

+---------+---+---------+---+
|labels(n)|n.x|labels(m)|m.x|
+:=======:+:=:+:=======:+:=:+
|[A]      |2  |[B]      |1  |
+---------+---+---------+---+
|[B]      |1  |[A]      |2  |
+---------+---+---------+---+

The obtained result implies the following operations:

1. In the first iteration, set the value of B.x to 1.
2. In the second iteration, we observe B.x with the value of 1 and set A.x to 2.
3. In RETURN we see all the changes made in both iterations.

To implement the desired behavior Memgraph utilizes two techniques. The first is the already mentioned tracking of two versions of data in vertex accessors. Using this approach ensures that the second iteration in the example query sees the data modification performed by the preceding iteration. The second technique is the Accumulate operation that accumulates all the iterations from the preceding logical op before passing them to the next logical op. In the example query, Accumulate ensures that the results returned to the user reflect changes performed in all iterations of the query (naive streaming could stream results at the end of the first iteration, producing inconsistent results). Note that Accumulate is demanding regarding memory and slows down query execution. For that reason it should be used only when necessary. For example, it does not have to be used in a query that has MATCH and SET but no RETURN.
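The effect of Accumulate on the example query can be traced with a short Python simulation. The dictionary of property values, the hard-coded match order and the `run` helper are assumptions for illustration; a generator stands in for the streaming SET operator.

```python
def run(accumulate):
    """MATCH (n)--(m) SET m.x = n.x + 1 RETURN labels(n), n.x, labels(m), m.x"""
    x = {"A": 0, "B": 0}
    matches = [("A", "B"), ("B", "A")]  # both orientations of the one edge

    def set_rows():
        for n, m in matches:
            x[m] = x[n] + 1  # SET always works on the latest data
            yield n, m

    rows = set_rows()
    if accumulate:
        rows = list(rows)  # drain the input before producing any result
    return [(n, x[n], m, x[m]) for n, m in rows]


with_accumulate = run(accumulate=True)
naive_streaming = run(accumulate=False)
```

With accumulation the rows match the table above. Without it, the first row is produced while A.x is still 0, which is the inconsistency Accumulate prevents.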

Neo4j Inconsistency on Multiple SET Clauses

Considering the preceding example, it could be expected that when a query has multiple SET clauses, all the changes from the preceding ones are visible. This is not the case in Neo4j's implementation. Consider the following queries executed on an empty database:

CREATE (n:A {x:0})-[:EdgeType]->(m:B {x:0})
MATCH (n)--(m) SET n.x = n.x + 1 SET m.x = m.x * 2
RETURN labels(n), n.x, labels(m), m.x

This returns:

+---------+---+---------+---+
|labels(n)|n.x|labels(m)|m.x|
+:=======:+:=:+:=======:+:=:+
|[A]      |2  |[B]      |1  |
+---------+---+---------+---+
|[B]      |1  |[A]      |2  |
+---------+---+---------+---+

If all the iterations of the first SET clause were executed before executing the second, all the resulting values would be 2. This not being the case, we conclude that Neo4j does not use a barrier-like mechanism between SET clauses. It is Memgraph's current vision that this is inconsistent and we plan to reduce Neo4j compliance in favour of operation consistency.
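The arithmetic behind this conclusion can be checked with a small Python simulation. The match order and the `run` helper are assumptions; the `barrier` flag switches between executing both SET clauses per match and finishing the first SET for all matches before starting the second.

```python
def run(barrier):
    """MATCH (n)--(m) SET n.x = n.x + 1 SET m.x = m.x * 2"""
    x = {"A": 0, "B": 0}
    matches = [("A", "B"), ("B", "A")]
    if barrier:
        for n, _ in matches:   # first SET for every match
            x[n] += 1
        for _, m in matches:   # then the second SET for every match
            x[m] *= 2
    else:
        for n, m in matches:   # both SETs interleaved per match
            x[n] += 1
            x[m] *= 2
    return x


interleaved = run(barrier=False)  # {"A": 2, "B": 1}, as Neo4j returns
with_barrier = run(barrier=True)  # {"A": 2, "B": 2}
```

The interleaved run reproduces the values in the table above, while the barrier run yields 2 for both vertices.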

Double Deletion

It's possible to match the same graph element multiple times in a single query and delete it. Neo supports this, and so do we. The relevant implementation detail is in the GraphDbAccessor class, where the record deletion functions reside, and not in the logical plan execution. It comes down to checking if a record has already been deleted in the current transaction+command and not attempting to delete it again (which would result in a crash).

Set + Delete Edge-case

It's legal for a query to combine SET and DELETE clauses. Consider the following queries executed on an empty database:

CREATE ()-[:T]->()
MATCH (n)--(m) SET n.x = 42 DETACH DELETE m

Due to the MATCH being undirected, the second pull will attempt to set data on a deleted vertex. This is not a legal operation in Memgraph's storage implementation. For that reason the logical operator for SET must check if the record it's trying to set something on has been deleted by the current transaction+command. If so, the modification is not executed.
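The check can be illustrated with the following Python sketch. The `Vertex` class, the `deleted` flag and both helper functions are hypothetical; they only model the rule that SET is skipped for a record already deleted in the current command.

```python
class Vertex:
    def __init__(self, name):
        self.name = name
        self.properties = {}
        self.deleted = False  # deleted by the current transaction+command


def set_property(vertex, key, value):
    """Skip the write when the record was already deleted."""
    if vertex.deleted:
        return False
    vertex.properties[key] = value
    return True


def detach_delete(vertex):
    """Delete once; a repeated deletion is ignored."""
    if vertex.deleted:
        return False
    vertex.deleted = True
    return True


# CREATE ()-[:T]->() followed by MATCH (n)--(m) SET n.x = 42 DETACH DELETE m
a, b = Vertex("a"), Vertex("b")
log = []
for n, m in [(a, b), (b, a)]:  # undirected match, two pulls
    log.append((n.name, set_property(n, "x", 42)))
    detach_delete(m)
```

In the second pull `n` is the vertex deleted in the first pull, so the property write is skipped instead of failing.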

Deletion Accumulation

Sometimes it's necessary to accumulate deletions of all the matches before attempting to execute them. Consider the following. Start with an empty database and execute queries:

CREATE ()-[:T]->()-[:T]->()
MATCH (a)-[r1]-(b)-[r2]-(c) DELETE r1, b, c

Note that the DELETE clause attempts to delete node c, but it does not detach it by deleting edge r2. However, due to the undirected edge in the MATCH, both edges get pulled and deleted.

Currently Memgraph does not support this behavior, while Neo does. There are a few ways that we could do this.

Accumulate on deletion (that sucks because we have to keep track of everything that gets returned after the deletion).
Maybe we could stream through the deletion op, but defer actual deletion until plan-execution end.
Ignore this because it's very edgy (this is the currently selected option).

Aggregation Without Input

It is necessary to define what aggregation ops return when they receive no input. Following is a table that shows what Neo4j's Cypher implementation and SQL produce.

+-------------+------------------------+---------------------+---------------------+------------------+
| <OP>        | 1. Cypher, no group-by | 2. Cypher, group-by | 3. SQL, no group-by | 4. SQL, group-by |
+=============+:======================:+:===================:+:===================:+:================:+
| Count(*)    | 0                      | <NO_ROWS>           | 0                   | <NO_ROWS>        |
+-------------+------------------------+---------------------+---------------------+------------------+
| Count(prop) | 0                      | <NO_ROWS>           | 0                   | <NO_ROWS>        |
+-------------+------------------------+---------------------+---------------------+------------------+
| Sum         | 0                      | <NO_ROWS>           | NULL                | <NO_ROWS>        |
+-------------+------------------------+---------------------+---------------------+------------------+
| Avg         | NULL                   | <NO_ROWS>           | NULL                | <NO_ROWS>        |
+-------------+------------------------+---------------------+---------------------+------------------+
| Min         | NULL                   | <NO_ROWS>           | NULL                | <NO_ROWS>        |
+-------------+------------------------+---------------------+---------------------+------------------+
| Max         | NULL                   | <NO_ROWS>           | NULL                | <NO_ROWS>        |
+-------------+------------------------+---------------------+---------------------+------------------+
| Collect     | []                     | <NO_ROWS>           | N/A                 | N/A              |
+-------------+------------------------+---------------------+---------------------+------------------+

Where:

1. `MATCH (n) RETURN <OP>(n.prop)`
2. `MATCH (n) RETURN <OP>(n.prop), (n.prop2)`
3. `SELECT <OP>(prop) FROM Table`
4. `SELECT <OP>(prop), prop2 FROM Table GROUP BY prop2`

Neo's Cypher implementation diverges from SQL only when performing SUM. Memgraph implements SQL-like behavior. It is considered that SUM of arbitrary elements should not be implicitly 0, especially in a property graph without a strict schema (the property in question can contain values of arbitrary types, or no values at all).
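As an illustration of the SQL-like choice, here is a Python sketch of aggregation without grouping. The `aggregate` function and its string operator names are hypothetical, and skipping null (`None`) inputs is an assumption of this sketch rather than something stated in this document.

```python
def aggregate(op, values):
    """Aggregation without group-by; None models Cypher's null."""
    present = [v for v in values if v is not None]
    if op == "count":
        return len(present)
    if op == "collect":
        return present
    if not present:
        return None  # SQL-like: sum, avg, min and max of nothing are null
    if op == "sum":
        return sum(present)
    if op == "avg":
        return sum(present) / len(present)
    if op == "min":
        return min(present)
    if op == "max":
        return max(present)
    raise ValueError(f"unknown aggregation: {op}")


no_input = {op: aggregate(op, []) for op in
            ("count", "sum", "avg", "min", "max", "collect")}
```

For empty input only `count` and `collect` produce a non-null value (0 and an empty list).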

OrderBy

The OrderBy logical operator sorts the results in the desired order. It occurs in Cypher as part of a WITH or RETURN clause. Both the concept and the implementation are straightforward. It's necessary for the logical op to Pull everything from its input so it can be sorted. It's not necessary to keep the whole Frame state of each input; it is sufficient to keep a list of TypedValues on which the results will be sorted, and another list of values that need to be remembered and recreated on the Frame when yielding.

The sorting itself is made to reflect that of Neo's implementation, which comes down to these points:

Null comes last (as if it's greater than anything).
Primitive types compare naturally, with no implicit casting except from int to double.
Complex types are not comparable.
Every unsupported comparison results in an exception that gets propagated to the end user.
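The rules above can be expressed as a comparison function. The following Python sketch is an approximation: `None` stands for null, Python's `int`, `float`, `bool` and `str` stand for the primitive types, and any other combination raises `TypeError` in place of the exception shown to the end user.

```python
from functools import cmp_to_key


def compare(a, b):
    """Three-way comparison following the ordering rules above."""
    if a is None or b is None:
        # Null is greater than everything, so it sorts last.
        return (a is None) - (b is None)
    for kind in (bool, str):
        if isinstance(a, kind) or isinstance(b, kind):
            if isinstance(a, kind) and isinstance(b, kind):
                return (a > b) - (a < b)
            raise TypeError(f"cannot compare {a!r} and {b!r}")
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return (a > b) - (a < b)  # the only implicit cast: int to double
    raise TypeError(f"cannot compare {a!r} and {b!r}")


ordered = sorted([3, None, 1.5, 2], key=cmp_to_key(compare))
```

Sorting a list that mixes a number with a string, or that contains lists or maps, raises `TypeError` as soon as the offending pair is compared.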

Limit in Write Queries

Limit can be used as part of a write query, in which case it will not reduce the amount of performed updates. For example, consider a database that has 10 vertices. The query MATCH (n) SET n.x = 1 RETURN n LIMIT 3 will result in all vertices having their property value changed, while returning only the first three to the client. This makes sense from the implementation standpoint, because Accumulate is planned after SetProperty but before Produce and Limit operations. Note that this behavior can be non-deterministic in some queries, since it relies on the order of iteration over nodes which is undefined when not explicitly specified.
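The operator ordering explains the result, as the following Python sketch shows. The `run` helper and the dictionary-based vertices are assumptions; a generator models the streaming SetProperty operator and `list(...)` models Accumulate.

```python
from itertools import islice


def run(vertices, limit, accumulate=True):
    """MATCH (n) SET n.x = 1 RETURN n LIMIT <limit>"""

    def set_property():
        for vertex in vertices:
            vertex["x"] = 1
            yield vertex

    rows = set_property()
    if accumulate:
        rows = list(rows)  # Accumulate: every update happens here
    return list(islice(rows, limit))  # Produce followed by Limit


planned = [{"id": i} for i in range(10)]
returned = run(planned, limit=3)

streamed = [{"id": i} for i in range(10)]
run(streamed, limit=3, accumulate=False)
updated_without_accumulate = sum("x" in v for v in streamed)
```

With Accumulate in the plan all 10 vertices are updated and three are returned. Without it, Limit would stop pulling after three rows and only three vertices would be updated.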

Merge

MERGE in Cypher attempts to match a pattern. If it already exists, it does nothing and subsequent clauses like RETURN can use the matched pattern elements. If the pattern can't match to any data, it creates it. For detailed information see Neo4j's merge documentation.

An important thing about MERGE is visibility of modified data. MERGE takes an input (typically a MATCH) and has two additional phases: the merging part, and the subsequent set parts (ON MATCH SET and ON CREATE SET). Analysis of Neo4j's behavior indicates that each of these three phases (input, merge, set) does not see changes to the graph state done by a subsequent phase. The input phase does not see data created by the merge phase, nor the set phase. This is consistent with what seems like the general Cypher philosophy that query clause effects aren't visible in the preceding clauses.

We define the Merge logical operator as a routing operator that uses three logical operator branches.

1. The input from a preceding clause.

   For example in MATCH (n), (m) MERGE (n)-[:T]-(m). This input is optional because MERGE is allowed to be the first clause in a query.

2. The merge_match branch.

   This logical operator branch is Pull-ed from until exhausted for each successful Pull from the input branch.

3. The merge_create branch.

   This branch is Pulled when the merge_match branch does not match anything (no successful Pulls) for an input Pull. It is Pulled only once in such a situation, since only one creation needs to occur for a failed match.

The ON MATCH SET and ON CREATE SET parts of the MERGE clause are included in the merge_match and merge_create branches respectively. They are placed at the end of their branches so that they execute only when those branches succeed.

Memgraph strives to be consistent with Neo in its MERGE implementation, while at the same time keeping performance as good as possible. Consistency with Neo w.r.t. graph state visibility is not trivial. Documentation for Expand and Set describes how Memgraph keeps track of both the updated version of an edge/vertex and the old one, as it was before the current transaction+command. This technique is also used in Merge. The input phase/branch of Merge always looks at the old data. The merge phase needs to see the new data so it doesn't create more data than necessary.

For example, consider the query:

MATCH (p:Person) MERGE (c:City {name: p.lives_in})

This query needs to create a city node only once for each unique p.lives_in. Finally, the set phase of a MERGE clause should not affect the merge phase. To achieve this, the merge_match branch of the Merge operator should see the latest created nodes, but filter them on their old state (if those nodes were not created by the merge_create branch). Implementation-wise that means that ScanAll and Expand operators in the merge_match branch need to look at the new graph state, while Filter operators look at the old one, if available.
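The routing between the three branches, applied to the City query above, can be sketched in Python as follows. The `merge` generator and the two branch callables are hypothetical; cities are represented by their names and the ON MATCH SET and ON CREATE SET parts are left out.

```python
def merge(input_rows, merge_match, merge_create):
    """Route each input row to merge_match, falling back to merge_create."""
    for row in input_rows:
        matched_any = False
        for result in merge_match(row):  # pulled until exhausted
            matched_any = True
            yield result
        if not matched_any:
            yield merge_create(row)      # pulled exactly once


def merge_cities(people):
    """MATCH (p:Person) MERGE (c:City {name: p.lives_in})"""
    cities = []  # the merge phase must see cities created by earlier rows

    def match_city(row):
        person, lives_in = row
        return [(person, city) for city in cities if city == lives_in]

    def create_city(row):
        person, lives_in = row
        cities.append(lives_in)
        return (person, lives_in)

    rows = list(merge(people, match_city, create_city))
    return cities, rows


cities, rows = merge_cities(
    [("ann", "Zagreb"), ("bob", "Split"), ("cy", "Zagreb")]
)
```

Because `match_city` sees the city created for the first person, the third person matches it and no duplicate city is created.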
