diff --git a/docs/dev/query/.gitignore b/docs/dev/query/.gitignore
new file mode 100644
index 000000000..5ccff1a6b
--- /dev/null
+++ b/docs/dev/query/.gitignore
@@ -0,0 +1 @@
html/

diff --git a/docs/dev/query/build-html b/docs/dev/query/build-html
new file mode 100755
index 000000000..ec445bc73
--- /dev/null
+++ b/docs/dev/query/build-html
@@ -0,0 +1,11 @@
#!/bin/bash

# Convert every markdown file in this directory to a standalone HTML page.

script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

mkdir -p "$script_dir/html"

for markdown_file in $(find "$script_dir" -name '*.md'); do
  name=$(basename -s .md "$markdown_file")
  # Rewrite inter-document links from .md to .html before conversion.
  sed -e 's/\.md/.html/g' "$markdown_file" | \
    pandoc -s -f markdown -t html -o "$script_dir/html/$name.html"
done

diff --git a/docs/dev/query/contents.md b/docs/dev/query/contents.md
new file mode 100644
index 000000000..6a3494daf
--- /dev/null
+++ b/docs/dev/query/contents.md
@@ -0,0 +1,14 @@
# Query Parsing, Planning and Execution

This part of the documentation deals with query execution.

Memgraph currently supports only query interpretation. Each new query is
parsed, analysed and translated into a sequence of operations which are then
executed on the main database storage. Query execution is organized into the
following phases:

 1. [Lexical Analysis (Tokenization)](parsing.md)
 2. [Syntactic Analysis (Parsing)](parsing.md)
 3. [Semantic Analysis and Symbol Generation](semantic.md)
 4. [Logical Planning](planning.md)
 5. [Logical Plan Execution](execution.md)

diff --git a/docs/dev/query/execution.md b/docs/dev/query/execution.md
new file mode 100644
index 000000000..1305572e4
--- /dev/null
+++ b/docs/dev/query/execution.md
@@ -0,0 +1,373 @@
# Logical Plan Execution

We implement classical iterator style operators. Logical operators define
operations on the database. They encapsulate the following info: what the
input is (another `LogicalOperator`), what to do with the data, and how to do
it.

Currently, logical operators can have zero or more input operations, and thus
a `LogicalOperator` tree is formed. Most `LogicalOperator` types have only one
input, so we are mostly working with chains instead of full-fledged trees.
You can find information on each operator in `src/query/plan/operator.lcp`.

## Cursor

Logical operators do not perform database work themselves. Instead they create
`Cursor` objects that do the actual work, based on the info in the operator.
Cursors expose a `Pull` method that gets called by the cursor's consumer. The
consumer keeps pulling as long as `Pull` returns `true` (indicating it
successfully performed some work and might be eligible for another `Pull`).
Most cursors call the `Pull` function of their input operator's cursor, so
typically a cursor chain is created that is analogous to the logical operator
chain it was created from.

## Frame

The `Frame` object contains all the data of the current `Pull` chain. It
serves for communicating data between cursors.

For example, in a `MATCH (n) RETURN n` query the `ScanAllCursor` places a
vertex on the `Frame` for each `Pull`. It places it in the slot reserved for
the `n` symbol. Then the `ProduceCursor` can take that same value from the
`Frame` because it knows the appropriate symbol. `Frame` positions are indexed
by `Symbol` objects.
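The following is a minimal sketch of how these pieces fit together. The
interfaces are simplified and the `Execute` driver is illustrative only; the
real definitions live in `src/query/plan/operator.lcp`:

    // A minimal sketch of iterator style execution, under simplified
    // interfaces; the real classes are richer than this.
    #include <memory>

    struct Frame;  // storage for one row of values, indexed by Symbol

    struct Cursor {
      virtual ~Cursor() = default;
      // Returns true if a row was produced on the frame and the cursor may
      // be pulled again; false once it is exhausted.
      virtual bool Pull(Frame &frame) = 0;
    };

    struct LogicalOperator {
      virtual ~LogicalOperator() = default;
      virtual std::unique_ptr<Cursor> MakeCursor() const = 0;
    };

    // The consumer drives execution by pulling from the top-most cursor
    // until it is exhausted; each successful Pull leaves one row's values
    // on the frame.
    void Execute(const LogicalOperator &plan, Frame &frame) {
      auto cursor = plan.MakeCursor();
      while (cursor->Pull(frame)) {
        // Read the output symbols from the frame and stream them onward.
      }
    }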
## ExpressionEvaluator

Expression results are not placed on the `Frame` since they do not need to be
communicated between different `Cursors`. Instead, expressions are evaluated
using an instance of `ExpressionEvaluator`. Since, generally speaking, an
expression can be defined by a tree of subexpressions, the
`ExpressionEvaluator` is implemented as a tree visitor. There is a performance
inefficiency here because a stack is used to communicate intermediary
expression results between elements of the tree. This is one of the reasons
why we plan to use the `Frame` for intermediary expression results as well.
The other reason is that it might facilitate compilation later on.

## Cypher Execution Semantics

Cypher query execution has *mostly* well-defined semantics. Some are
explicitly defined by openCypher and its TCK, while others are implicitly
defined by Neo4j's implementation of Cypher, with which we want to be
generally compatible.

These semantics can in short be described as follows: a Cypher query consists
of multiple clauses, some of which modify the graph. Generally, every clause
in the query, when reading it left to right, operates on a consistent state
of the property graph, untouched by subsequent clauses. This means that a
`MATCH` clause at the beginning operates on a graph state in which
modifications by the subsequent `SET` are not visible.

The stated semantics feel very natural to the end-user, and Neo seems to
implement them well. For Memgraph the situation is more complex because
`LogicalOperator` execution (through a `Cursor`) happens one `Pull` at a time
(generally meaning all the query clauses get executed for every top-level
`Pull`). This is not inherently consistent with Cypher semantics because a
`SET` clause can modify data, and the `MATCH` clause that precedes it might
see the modification in a subsequent `Pull`. Also, the `RETURN` clause might
want to stream results to the user before all `SET` clauses have been
executed, so the user might see some intermediate graph state. There are many
edge-cases that Memgraph does its best to avoid to stay true to Cypher
semantics, while at the same time using a high-performance streaming approach.
The edge-cases are enumerated in this document along with the implementation
details they imply.

## Implementation Peculiarities

### Once

An operator that does nothing but whose `Cursor::Pull` returns `true` on the
first `Pull` and `false` on subsequent ones. This operator is used when
another operator has an optional input, because in Cypher a clause will
typically execute once for every input from the preceding clauses, or just
once if there was no preceding input. For example, consider the `CREATE`
clause. In the query `CREATE (n)` only one node is created, while in the query
`MATCH (n) CREATE (m)` a node is created for each existing node. Thus in our
`CreateNode` logical operator the input is either a `ScanAll` operator, or a
`Once` operator.
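The `Once` cursor is simple enough to sketch in full. This is a simplified
rendition, not the exact implementation:

    struct Frame;

    // Simplified model of the Once cursor: exactly one successful Pull, so
    // a consumer with no real input still executes its own logic once.
    class OnceCursor {
     public:
      bool Pull(Frame &) {
        if (did_pull_) return false;  // already produced its single "row"
        did_pull_ = true;
        return true;
      }

     private:
      bool did_pull_ = false;
    };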
### GraphView

In the previous section, [Cypher Execution
Semantics](#cypher-execution-semantics), we mentioned how the preceding
clauses should not see changes made in subsequent ones. For that reason, some
operators take a `GraphView` enum value. This value determines which state of
the graph an operator sees.

Consider the query `MATCH (n)--(m) WHERE n.x = 0 SET m.x = 1`. Naive streaming
could match a vertex `n` on the given criteria, expand to `m`, update its
property, and in the next iteration consider the vertex previously matched to
`m` and skip it because its newly set property value does not qualify. This
is not how Cypher works. To handle this issue properly, Memgraph designed the
`VertexAccessor` class that tracks two versions of data: one that was visible
before the current transaction+command, and an optional other that was
created in the current transaction+command. The `MATCH` clause will be planned
as `ScanAll` and `Expand` operations using the `GraphView::OLD` value. This
will ensure modifications performed in the same query do not affect it. The
same applies to edges and the `EdgeAccessor` class.

### Existing Record Detection

It's possible that a pattern element has already been declared in the same
pattern, or a preceding pattern. For example `MATCH (n)--(m), (n)--(l)` or a
cycle-detection match `MATCH (n)-->(n) RETURN n`. Implementation-wise,
existing record detection just checks that the expanded record is equal to the
one already on the frame.

### Why Not Use Separate Expansion Ops for Edges and Vertices?

Expanding an edge and a vertex in separate ops is not feasible when matching a
cycle in bi-directional expansions. Consider the query `MATCH (n)--(n) RETURN
n`. Let's try to expand first the edge in one op, and the vertex in the next.
The vertex expansion consumes the edge expansion input. It takes the expanded
edge from the frame. It needs to detect a cycle by comparing the vertex
existing on the frame with one of the edge vertices (`from` or `to`). But
which one? It doesn't know, and can't ensure correct cycle detection.

### Data Visibility During and After SET

In Cypher, setting values always works on the latest version of data (from
the preceding or current clause). That means that within a `SET` clause all
the changes from previous clauses must be visible, as well as changes done by
the current `SET` clause. Also, if there is a clause after `SET` it must see
*all* the changes performed by the preceding `SET`. Both these things are best
illustrated with the following queries executed on an empty database:

    CREATE (n:A {x:0})-[:EdgeType]->(m:B {x:0})
    MATCH (n)--(m) SET m.x = n.x + 1 RETURN labels(n), n.x, labels(m), m.x

This returns:

+---------+---+---------+---+
|labels(n)|n.x|labels(m)|m.x|
+:=======:+:=:+:=======:+:=:+
|[A]      |2  |[B]      |1  |
+---------+---+---------+---+
|[B]      |1  |[A]      |2  |
+---------+---+---------+---+

The obtained result implies the following operations:

 1. In the first iteration set the value of `B.x` to 1
 2. In the second iteration we observe `B.x` with the value of 1 and set
    `A.x` to 2
 3. In `RETURN` we see all the changes made in both iterations

To implement the desired behavior Memgraph utilizes two techniques. The first
is the already mentioned tracking of two versions of data in vertex
accessors. Using this approach ensures that the second iteration in the
example query sees the data modification performed by the preceding
iteration. The second technique is the `Accumulate` operation that
accumulates all the iterations from the preceding logical op before passing
them to the next logical op. In the example query, `Accumulate` ensures that
the results returned to the user reflect changes performed in all iterations
of the query (naive streaming could stream results at the end of the first
iteration, producing inconsistent results). Note that `Accumulate` is
demanding regarding memory and slows down query execution. For that reason it
should be used only when necessary; for example, it does not have to be used
in a query that has `MATCH` and `SET` but no `RETURN`.
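A condensed sketch of the two-version technique follows. The member layout
and the `View` method are assumptions made for illustration; the real
`VertexAccessor` is considerably more involved:

    enum class GraphView { OLD, NEW };

    struct Properties {};  // stand-in for the real property store

    // Simplified model: `old_` is the version visible before the current
    // transaction+command, `new_` the one (lazily) created by it.
    class VertexAccessor {
     public:
      // Operators planned with GraphView::OLD read the pre-command version,
      // so modifications made by the same query are invisible to them.
      // Operators that need the latest data ask for GraphView::NEW.
      const Properties &View(GraphView view) const {
        if (view == GraphView::NEW && new_) return *new_;
        return *old_;
      }

     private:
      const Properties *old_ = nullptr;
      Properties *new_ = nullptr;  // created on first modification
    };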
### Neo4j Inconsistency on Multiple SET Clauses

Considering the preceding example, it could be expected that when a query has
multiple `SET` clauses all the changes from the preceding ones are visible.
This is not the case in Neo4j's implementation. Consider the following
queries executed on an empty database:

    CREATE (n:A {x:0})-[:EdgeType]->(m:B {x:0})
    MATCH (n)--(m) SET n.x = n.x + 1 SET m.x = m.x * 2
    RETURN labels(n), n.x, labels(m), m.x

This returns:

+---------+---+---------+---+
|labels(n)|n.x|labels(m)|m.x|
+:=======:+:=:+:=======:+:=:+
|[A]      |2  |[B]      |1  |
+---------+---+---------+---+
|[B]      |1  |[A]      |2  |
+---------+---+---------+---+

If all the iterations of the first `SET` clause were executed before executing
the second, all the resulting values would be 2. This not being the case, we
conclude that Neo4j does not use a barrier-like mechanism between `SET`
clauses. It is Memgraph's current vision that this is inconsistent and we
plan to reduce Neo4j compliance in favour of operation consistency.

### Double Deletion

It's possible to match the same graph element multiple times in a single query
and delete it. Neo supports this, and so do we. The relevant implementation
detail is in the `GraphDbAccessor` class, where the record deletion functions
reside, and not in the logical plan execution. It comes down to checking if a
record has already been deleted in the current transaction+command and not
attempting to do it again (attempting to do so results in a crash).

### Set + Delete Edge-case

It's legal for a query to combine `SET` and `DELETE` clauses. Consider the
following queries executed on an empty database:

    CREATE ()-[:T]->()
    MATCH (n)--(m) SET n.x = 42 DETACH DELETE m

Due to the `MATCH` being undirected, the second pull will attempt to set data
on a deleted vertex. This is not a legal operation in the Memgraph storage
implementation. For that reason the logical operator for `SET` must check if
the record it's trying to set something on has been deleted by the current
transaction+command. If so, the modification is not executed.

### Deletion Accumulation

Sometimes it's necessary to accumulate deletions of all the matches before
attempting to execute them. Consider the following. Start with an empty
database and execute the queries:

    CREATE ()-[:T]->()-[:T]->()
    MATCH (a)-[r1]-(b)-[r2]-(c) DELETE r1, b, c

Note that the `DELETE` clause attempts to delete node `c`, but it does not
detach it by deleting edge `r2`. However, due to the undirected edge in the
`MATCH`, both edges get pulled and deleted.

Currently Memgraph does not support this behavior, while Neo does. There are
a few ways that we could do this.

  * Accumulate on deletion (undesirable because we have to keep track of
    everything that gets returned after the deletion).
  * Maybe we could stream through the deletion op, but defer actual deletion
    until plan-execution end.
  * Ignore this because it's a very obscure edge case (this is the currently
    selected option).

### Aggregation Without Input

It is necessary to define what aggregation ops return when they receive no
input. The following table shows what Neo4j's Cypher implementation and SQL
produce.
+-------------+------------------------+---------------------+---------------------+------------------+
| \           | 1. Cypher, no group-by | 2. Cypher, group-by | 3. SQL, no group-by | 4. SQL, group-by |
+=============+:======================:+:===================:+:===================:+:================:+
| Count(\*)   | 0                      | \                   | 0                   | \                |
+-------------+------------------------+---------------------+---------------------+------------------+
| Count(prop) | 0                      | \                   | 0                   | \                |
+-------------+------------------------+---------------------+---------------------+------------------+
| Sum         | 0                      | \                   | NULL                | \                |
+-------------+------------------------+---------------------+---------------------+------------------+
| Avg         | NULL                   | \                   | NULL                | \                |
+-------------+------------------------+---------------------+---------------------+------------------+
| Min         | NULL                   | \                   | NULL                | \                |
+-------------+------------------------+---------------------+---------------------+------------------+
| Max         | NULL                   | \                   | NULL                | \                |
+-------------+------------------------+---------------------+---------------------+------------------+
| Collect     | []                     | \                   | N/A                 | N/A              |
+-------------+------------------------+---------------------+---------------------+------------------+

Where:

 1. `MATCH (n) RETURN (n.prop)`
 2. `MATCH (n) RETURN (n.prop), (n.prop2)`
 3. `SELECT (prop) FROM Table`
 4. `SELECT (prop), prop2 FROM Table GROUP BY prop2`

Neo's Cypher implementation diverges from SQL only when performing `SUM`.
Memgraph implements SQL-like behavior. We consider that the `SUM` of
arbitrary elements should not be implicitly 0, especially in a property graph
without a strict schema (the property in question can contain values of
arbitrary types, or no values at all).

### OrderBy

The `OrderBy` logical operator sorts the results in the desired order. It
occurs in Cypher as part of a `WITH` or `RETURN` clause. Both the concept and
the implementation are straightforward. It's necessary for the logical op to
`Pull` everything from its input so it can be sorted. It's not necessary to
keep the whole `Frame` state of each input; it is sufficient to keep a list
of `TypedValues` on which the results will be sorted, and another list of
values that need to be remembered and recreated on the `Frame` when yielding.

The sorting itself is made to reflect that of Neo's implementation, which
comes down to the following points.

  * `Null` comes last (as if it's greater than anything).
  * Primitive types compare naturally, with no implicit casting except from
    `int` to `double`.
  * Complex types are not comparable.
  * Every unsupported comparison results in an exception that gets propagated
    to the end user.

### Limit in Write Queries

`Limit` can be used as part of a write query, in which case it will *not*
reduce the amount of performed updates. For example, consider a database that
has 10 vertices. The query `MATCH (n) SET n.x = 1 RETURN n LIMIT 3` will
result in all vertices having their property value changed, while returning
only the first 3 to the client. This makes sense from the implementation
standpoint, because `Accumulate` is planned after `SetProperty` but before
`Produce` and `Limit` operations. Note that this behavior can be
non-deterministic in some queries, since it relies on the order of iteration
over nodes, which is undefined when not explicitly specified.

### Merge

`MERGE` in Cypher attempts to match a pattern. If the pattern already exists,
`MERGE` does nothing, and subsequent clauses like `RETURN` can use the
matched pattern elements. If the pattern can't be matched to any data,
`MERGE` creates it.
For detailed information see Neo4j's [merge
documentation](https://neo4j.com/docs/developer-manual/current/cypher/clauses/merge/).

An important thing about `MERGE` is the visibility of modified data. `MERGE`
takes an input (typically a `MATCH`) and has two additional *phases*: the
merging part, and the subsequent set parts (`ON MATCH SET` and `ON CREATE
SET`). Analysis of Neo4j's behavior indicates that each of these three phases
(input, merge, set) does not see changes to the graph state done by a
subsequent phase. The input phase does not see data created by the merge
phase, nor by the set phase. This is consistent with what seems like the
general Cypher philosophy that query clause effects aren't visible in the
preceding clauses.

We define the `Merge` logical operator as a *routing* operator that uses three
logical operator branches.

 1. The input from a preceding clause.

    For example in `MATCH (n), (m) MERGE (n)-[:T]-(m)`. This input is
    optional because `MERGE` is allowed to be the first clause in a query.

 2. The `merge_match` branch.

    This logical operator branch is `Pull`-ed from until exhausted for each
    successful `Pull` from the input branch.

 3. The `merge_create` branch.

    This branch is `Pull`-ed when the `merge_match` branch does not match
    anything (no successful `Pull`s) for an input `Pull`. It is `Pull`-ed
    only once in such a situation, since only one creation needs to occur for
    a failed match.

The `ON MATCH SET` and `ON CREATE SET` parts of the `MERGE` clause are
included in the `merge_match` and `merge_create` branches respectively. They
are placed at the end of their branches so that they execute only when those
branches succeed.

Memgraph strives to be consistent with Neo in its `MERGE` implementation,
while at the same time keeping performance as good as possible. Consistency
with Neo w.r.t. graph state visibility is not trivial. The documentation for
`Expand` and `Set` describes how Memgraph keeps track of both the updated
version of an edge/vertex and the old one, as it was before the current
transaction+command. This technique is also used in `Merge`. The input
phase/branch of `Merge` always looks at the old data. The merge phase needs
to see the new data so it doesn't create more data than necessary.

For example, consider the following query.

    MATCH (p:Person) MERGE (c:City {name: p.lives_in})

This query needs to create a city node only once for each unique `p.lives_in`.

Finally, the set phase of a `MERGE` clause should not affect the merge phase.
To achieve this, the `merge_match` branch of the `Merge` operator should see
the latest created nodes, but filter them on their old state (if those nodes
were not created by the `merge_create` branch). Implementation-wise that
means that the `ScanAll` and `Expand` operators in the `merge_match` branch
need to look at the new graph state, while the `Filter` operators look at the
old one, if available.
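The routing described above can be sketched as the following simplified
cursor loop. The member names and the `Reset` call are assumptions made for
the sake of the example, not the exact implementation:

    #include <memory>

    struct Frame;

    struct Cursor {
      virtual ~Cursor() = default;
      virtual bool Pull(Frame &frame) = 0;
      virtual void Reset() = 0;  // rewind so the branch can be pulled again
    };

    // Simplified Merge routing: for every input row, exhaust merge_match;
    // if nothing matched, pull merge_create exactly once.
    class MergeCursor : public Cursor {
     public:
      bool Pull(Frame &frame) override {
        while (true) {
          if (pull_input_) {
            if (!input_->Pull(frame)) return false;  // input exhausted
            pull_input_ = false;
            matched_ = false;
            merge_match_->Reset();  // restart matching for this input row
          }
          if (merge_match_->Pull(frame)) {
            matched_ = true;
            return true;  // stream every successful match
          }
          pull_input_ = true;  // merge_match exhausted for this input row
          if (!matched_) {
            merge_create_->Pull(frame);  // create once for the failed match
            return true;
          }
        }
      }

      void Reset() override { pull_input_ = true; }

     private:
      std::unique_ptr<Cursor> input_, merge_match_, merge_create_;
      bool pull_input_ = true;
      bool matched_ = false;
    };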
diff --git a/docs/dev/query/parsing.md b/docs/dev/query/parsing.md
new file mode 100644
index 000000000..3db3c7f49
--- /dev/null
+++ b/docs/dev/query/parsing.md
@@ -0,0 +1,62 @@
# Lexical and Syntactic Analysis

## Antlr

We use Antlr for lexical and syntax analysis of Cypher queries. Antlr uses
the grammar file `Cypher.g4`, downloaded from http://www.opencypher.org, to
generate the parser and the visitor for the Cypher parse tree. Even though
the provided grammar is not very pleasant to work with, we decided not to
make any drastic changes to it so that our transition to newly published
versions of `Cypher.g4` would be easier. Nevertheless, we had to fix some
bugs and add features, so our version is not completely the same.

In addition to using `Cypher.g4`, we have `MemgraphCypher.g4`. This grammar
file defines Memgraph specific extensions to the original grammar. The most
notable example is the inclusion of syntax for handling authorization. At the
moment, some extensions are also found in `Cypher.g4`. For example, the syntax
for using a lambda function in relationship patterns. These extensions should
be moved out of `Cypher.g4`, so that it remains as close to the original
grammar as possible. Additionally, having `MemgraphCypher.g4` may not be
enough if we wish to split the functionality for community and enterprise
editions of Memgraph.

## Abstract Syntax Tree (AST)

Since the Antlr generated visitor and the official openCypher grammar are not
very practical to use, we translate Antlr's AST to our own AST. Currently
there are ~40 types of nodes in our AST. Their definitions can be found in
`src/query/frontend/ast/ast.lcp`.

Major groups of types can be found under the following base types.

  * `Expression` --- types corresponding to Cypher expressions.
  * `Clause` --- types corresponding to Cypher clauses.
  * `PatternAtom` --- node or edge related information.
  * `Query` --- different kinds of queries, allows extending the language with
    Memgraph specific query syntax.

Memory management of created AST nodes is done with `AstStorage`. Each node
must be created by invoking the `AstStorage::Create` method. This way all of
the pointers to nodes and their children are raw pointers. The only owner of
the allocated memory is the `AstStorage`. When the storage goes out of scope,
the pointers become invalid. It may be more natural to handle tree ownership
via `unique_ptr`, i.e. each node owning its children, but there are some
benefits to having a custom storage and allocation scheme.

The primary reason we opted against `unique_ptr` is the requirement of
Antlr's base visitor class that the resulting values must be copyable. The
result is wrapped in `antlr::Any` so that the derived visitor classes may
return any type they wish when visiting Antlr's AST. Unfortunately,
`antlr::Any` does not work with non-copyable types.

Another benefit of having `AstStorage` is that we can easily add a different
allocation scheme for AST nodes. The interface of node creation would not
change.

### AST Translation

The translation process is done via the `CypherMainVisitor` class, which is
derived from the Antlr generated visitor. Besides instantiating our AST
types, a minimal number of syntactic checks are done on a query. These checks
handle the cases which were valid in the original openCypher grammar, but may
be invalid when combined with other syntax elements.
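The ownership scheme described above can be condensed into a sketch like the
following. This is simplified; the real storage and node types are generated
from `ast.lcp`:

    #include <memory>
    #include <utility>
    #include <vector>

    struct Tree {  // common base of all AST node types
      virtual ~Tree() = default;
    };

    class AstStorage {
     public:
      // Every node is created through the storage, which keeps sole
      // ownership; callers receive raw pointers that are valid only while
      // the storage is alive.
      template <class TNode, class... TArgs>
      TNode *Create(TArgs &&...args) {
        nodes_.emplace_back(
            std::make_unique<TNode>(std::forward<TArgs>(args)...));
        return static_cast<TNode *>(nodes_.back().get());
      }

     private:
      std::vector<std::unique_ptr<Tree>> nodes_;
    };

Because the visitor only passes raw pointers around, the intermediate results
stay copyable and therefore compatible with `antlr::Any`.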
diff --git a/docs/dev/query/planning.md b/docs/dev/query/planning.md
new file mode 100644
index 000000000..07ac05224
--- /dev/null
+++ b/docs/dev/query/planning.md
@@ -0,0 +1,487 @@
# Logical Planning

After the semantic analysis and symbol generation, the AST is converted to a
tree of logical operators. This conversion is called *planning* and the tree
of logical operators is called a *plan*. The whole planning process is done in
the following steps.

 1. [AST Preprocessing](#ast-preprocessing)

    The first step is to preprocess the AST by collecting information on
    filters, dividing the query into parts, normalizing patterns in `MATCH`
    clauses, etc.

 2. [Logical Operator Planning](#logical-operator-planning)

    After the preprocessing step, the planning can be done via 2 planners:
    `VariableStartPlanner` and `RuleBasedPlanner`. The first planner will
    generate multiple plans where each plan has different starting points for
    searching the patterns in `MATCH` clauses. The second planner produces a
    single plan by mapping the query parts as they are to logical operators.

 3. [Logical Plan Postprocessing](#logical-plan-postprocessing)

    In this stage, we perform various transformations on the generated logical
    plan. Here we want to optimize the operations in order to improve
    performance during the execution. Naturally, transformations need to
    preserve the semantic behaviour of the original plan.

 4. [Cost Estimation](#cost-estimation)

    After the generation, the execution cost of each plan is estimated. This
    estimation is used to select the best plan which will be executed.

 5. [Distributed Planning](#distributed-planning)

    In case we are running distributed Memgraph, the final plan is adapted
    for distributed execution. NOTE: This appears to be an error in the
    workflow. Distributed planning should be moved before step 3. or
    integrated with it. With the workflow ordered as it is now, cost
    estimation doesn't consider the distributed plan.

The implementation can be found in the `query/plan` directory, with the public
entry point being `query/plan/planner.hpp`.

## AST Preprocessing

Each openCypher query consists of at least 1 **single query**. Multiple single
queries are chained together using a **query combinator**. Currently, there is
only one combinator, `UNION`. The preprocessing step starts in the
`CollectQueryParts` function. This function will take a look at each single
query and divide it into parts. Each part is separated by `RETURN` and
`WITH` clauses. For example:

    MATCH (n) CREATE (m) WITH m MATCH (l)-[]-(m) RETURN l
    |                         |                        |
    |------- part 1 ----------+-------- part 2 --------|
    |                                                  |
    |------------------- single query ----------------|

Each part is created by collecting all `MATCH` clauses and *normalizing* their
patterns. Pattern normalization is the process of converting an arbitrarily
long pattern chain of nodes and edges into a list of triplets `(start node,
edge, end node)`. The triplets should preserve the semantics of the match. For
example:

    MATCH (a)-[p]-(b)-[q]-(c)-[r]-(d)

is equivalent to:

    MATCH (a)-[p]-(b), (b)-[q]-(c), (c)-[r]-(d)

With this representation, it becomes easier to reorder the triplets and choose
different strategies for pattern matching.

In addition to normalizing patterns, all of the filter expressions in patterns
and inside of the `WHERE` clause (of the accompanying `MATCH`) are extracted
and stored separately. During the extraction, symbols used in the filter
expression are collected. This allows for planning filters in a valid order,
as the matching for triplets is being done. Another important benefit of
having extra information on filters is the ability to recognize when a
database index could be used.
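The normalization step can be sketched as follows; the types are hypothetical
stand-ins for the real pattern atom classes:

    #include <cstddef>
    #include <vector>

    struct NodeAtom;  // stand-ins for the real pattern atom types
    struct EdgeAtom;

    struct ExpansionTriplet {
      NodeAtom *node1;
      EdgeAtom *edge;   // null for a stand-alone node pattern
      NodeAtom *node2;  // null when edge is null
    };

    // A pattern is an alternating chain of N node atoms and N-1 edge atoms;
    // (a)-[p]-(b)-[q]-(c) becomes {(a, p, b), (b, q, c)}.
    std::vector<ExpansionTriplet> NormalizePattern(
        const std::vector<NodeAtom *> &nodes,
        const std::vector<EdgeAtom *> &edges) {
      std::vector<ExpansionTriplet> triplets;
      if (edges.empty()) {
        // A pattern like (a) is reduced to just its start node.
        triplets.push_back({nodes[0], nullptr, nullptr});
        return triplets;
      }
      for (std::size_t i = 0; i < edges.size(); ++i)
        triplets.push_back({nodes[i], edges[i], nodes[i + 1]});
      return triplets;
    }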
After each `MATCH` is processed, all of them are grouped, so that even whole
`MATCH` clauses may be reordered. The important thing is to remember which
symbols were used to name edges in each `MATCH`. With those symbols we can
plan for *cyphermorphism*, i.e. ensure that different edges in the search
pattern of a single `MATCH` map to different edges in the graph. This
preserves the semantics of the query, even though we may have reordered the
matching. The same steps are done for `OPTIONAL MATCH`.

Another clause which needs processing is `MERGE`. Here we normalize the
pattern, since `MERGE` is a bit like `MATCH` and `CREATE` in one.

All the other clauses are left as is.

In the end, each query part consists of:

  * processed and grouped `MATCH` clauses;
  * processed and grouped `OPTIONAL MATCH` clauses;
  * a processed `MERGE` matching pattern and
  * the unchanged remaining clauses.

The last stored clause is guaranteed to be either `WITH` or `RETURN`.

## Logical Operator Planning

### Variable Start Planner

The `VariableStartPlanner` generates multiple plans for a single query. Each
plan is generated by selecting a different starting point for pattern
matching.

The algorithm works as follows (a simplified sketch of the enumeration is
given after the list).

 1. For each query part:
    1. For each node in the triplets of the collected `MATCH` clauses:
       i.   Add the node to a set of `expanded` nodes
       ii.  Select a triplet `(start node, edge, end node)` whose `start
            node` is in the `expanded` set
       iii. If no triplet was selected, choose a new starting node that isn't
            in `expanded` and continue expanding
       iv.  Repeat steps ii. -- iii. until all triplets have been selected
            and store that as a variation of the `MATCH` clauses
    2. Do step 1.1. for `OPTIONAL MATCH` and `MERGE` clauses
    3. Take all combinations of the generated `MATCH`, `OPTIONAL MATCH` and
       `MERGE` variations and store them as variations of the query part.
 2. For each combination of query part variations:
    1. Generate a plan using the rule based planner
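Below is a toy version of the core idea, under heavy simplification: symbols
are plain integers, triplets are taken as directed, and only one ordering per
starting node is produced, whereas the real planner enumerates all
variations:

    #include <cstddef>
    #include <set>
    #include <vector>

    struct Triplet {
      int start, edge, end;  // symbols reduced to plain ints for the sketch
    };

    // For each candidate starting node, order the triplets so that every
    // next triplet begins at an already expanded node.
    std::vector<std::vector<Triplet>> OrderingsPerStart(
        const std::vector<Triplet> &triplets,
        const std::set<int> &start_nodes) {
      std::vector<std::vector<Triplet>> variations;
      for (int start : start_nodes) {
        std::set<int> expanded = {start};
        std::vector<Triplet> ordering;
        std::vector<bool> used(triplets.size(), false);
        bool progressed = true;
        while (progressed && ordering.size() < triplets.size()) {
          progressed = false;
          for (std::size_t i = 0; i < triplets.size(); ++i) {
            if (used[i] || !expanded.count(triplets[i].start)) continue;
            expanded.insert(triplets[i].end);
            ordering.push_back(triplets[i]);
            used[i] = true;
            progressed = true;
          }
          // A disconnected pattern would require choosing a fresh starting
          // node here; omitted for brevity.
        }
        variations.push_back(ordering);
      }
      return variations;
    }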
### Rule Based Planner

The `RuleBasedPlanner` generates a single plan for a single query. A plan is
generated by following hardcoded rules for producing logical operators. The
following sections are an overview of how each openCypher clause is converted
to a `LogicalOperator`.

#### MATCH

The `MATCH` clause is used to specify which patterns need to be searched for
in the database. These patterns are normalized in the preprocessing step to
be represented as triplets `(start node, edge, end node)`. When there is no
edge, the triplet is reduced to only the `start node`. Generating the
operators is done by looping over these triplets.

##### Searching for Nodes

The simplest search is finding stand-alone nodes. For example, `MATCH (n)`
will find all the nodes in the graph. This is accomplished by generating a
`ScanAll` operator and forwarding the node symbol which should store the
results. In this case, all the nodes will be referenced by `n`.

Multiple nodes can be specified in a single match, e.g. `MATCH (n), (m)`.
Planning is done by repeating the same steps for each sub pattern (separated
by a comma). In this case, we would get 2 `ScanAll` operators chained one
after the other. An optimization can be obtained if the node in the pattern is
already searched for. In `MATCH (n), (n)` we can drop the second `ScanAll`
operator since we have already generated it for the first node.

##### Searching for Relationships

A more advanced search includes finding nodes with relationships. For example,
`MATCH (n)-[r]-(m)` should find every pair of connected nodes in the database.
This means that if a single node has multiple connections, it will be
repeated for each combination of pairs. The generation of operators starts
from the first node in the pattern. If we are referencing a new starting
node, we need to generate a `ScanAll` which finds all the nodes and stores
them into `n`. Then, we generate an `Expand` operator which reads `n` and
traverses all the edges of that node. The edge is stored into `r`, while the
destination node is stored in `m`.

Matching multiple relationships proceeds similarly, by repeating the same
steps. The only difference is that we need to ensure that different edges in
the search pattern map to different edges in the graph. This means that after
each `Expand` operator, we need to generate an `ExpandUniquenessFilter`. We
provide this operator with a list of symbols for the previously matched edges
and the symbol for the current edge (a sketch of this check is given at the
end of this section).

For example:

    MATCH (n)-[r1]-(m)-[r2]-(l)

The above is preprocessed into

    MATCH (n)-[r1]-(m), (m)-[r2]-(l)

Then we look at each triplet in order and perform the described steps. This
way, we would generate:

    ScanAll (n) > Expand (n, r1, m) > Expand (m, r2, l) >
    ExpandUniquenessFilter ([r1], r2)

Note that we don't need to generate an `ExpandUniquenessFilter` after the
first `Expand`, since there are no edges to compare to. This filtering needs
to work across multiple patterns, but inside a *single* `MATCH` clause.

Let's take a look at the following.

    MATCH (n)-[r1]-(m), (m)-[r2]-(l)

We would also generate the exact same operators.

    ScanAll (n) > Expand (n, r1, m) > Expand (m, r2, l) >
    ExpandUniquenessFilter ([r1], r2)

On the other hand, for

    MATCH (n)-[r1]-(m) MATCH (m)-[r2]-(l)-[r3]-(i)

we would reset the uniqueness filtering at the start of the second `MATCH`.
This means that we output the following:

    ScanAll (n) > Expand (n, r1, m) > Expand (m, r2, l) > Expand (l, r3, i) >
    ExpandUniquenessFilter ([r2], r3)

There is a difference in how we handle edge uniqueness compared to Neo4j.
Neo4j does not allow searching for a single edge multiple times, but we've
decided to support that.

For example, the user can write the following.

    MATCH (n)-[r]-(m)-[r]-(l)

We would ensure that both `r` variables match the same edge. In our
terminology, we call this an *edge cycle*. For the above example, we would
generate this plan.

    ScanAll (n) > Expand (n, r, m) > Expand (m, r, l)

We do not put an `ExpandUniquenessFilter` operator between the 2 `Expand`
operators; instead, we tell the 2nd `Expand` that it is an edge cycle. This
2nd `Expand` will then ensure that both `r` variables matched the same edge.
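The check performed by `ExpandUniquenessFilter` is, in essence, the
following, with a hypothetical `EdgeRef` standing in for the edge value on
the frame:

    #include <vector>

    struct EdgeRef {  // hypothetical stand-in for an edge value on the frame
      long id;
    };
    bool operator==(EdgeRef a, EdgeRef b) { return a.id == b.id; }

    // The freshly expanded edge passes the filter only if it differs from
    // every previously matched edge of the same MATCH clause.
    bool EdgeIsUnique(EdgeRef candidate,
                      const std::vector<EdgeRef> &previous) {
      for (auto edge : previous)
        if (edge == candidate) return false;
      return true;
    }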
##### Filtering

To narrow the search down, the patterns in `MATCH` can have filtered labels
and properties. A more general filtering is done using the accompanying
`WHERE` clause. During the preprocessing step, all filters are collected and
extracted into expressions. Additional information on which symbols they use
is also stored. This way, each time we generate a `ScanAll` or `Expand`, we
look at all the filters to see if any of them can be used, i.e. whether the
symbols they use have been bound by a newly produced operator. If a filter
expression can be used, we immediately add a `Filter` operator with that
expression.

For example:

    MATCH (n)-[r]-(m :label) WHERE n.prop = 42

We would produce:

    ScanAll (n) > Filter (n.prop) > Expand (n, r, m) > Filter (m :label)

This means that the same plan is generated for the query:

    MATCH (n {prop: 42})-[r]-(m :label)

#### OPTIONAL

If a `MATCH` clause is preceded by `OPTIONAL`, then we need to generate a plan
such that we produce results even if we fail to match anything. This is
accomplished by generating an `Optional` operator, which takes 2 operator
trees:

  * the input operation and
  * the optional operation.

The input is the operation we generated for the part of the query before
`OPTIONAL MATCH`. For the optional operation, we simply generate the `OPTIONAL
MATCH` part just like we would for a regular `MATCH`. In addition to the
operations, we need to send the symbols which are set during optional
matching to the `Optional` operator. The operator will reset the values of
those symbols to `null` when the optional part fails to match.

#### RETURN & WITH

`RETURN` and `WITH` clauses are very similar to each other. The only
difference is that `WITH` separates parts of the query and can be paired with
a `WHERE` clause.

The common part is generating operators for the body of the clause. Separation
of query parts is mostly done in semantic analysis, which checks that only the
symbols exposed through `WITH` are visible in the query parts after the
clause. The minor part is done in planning.

##### Named Results

Both clauses contain multiple named expressions (`expr AS name`) which are
used to generate the `Produce` operator.

##### Aggregations

If an expression contains an aggregation operator (`sum`, `avg`, ...) we need
to plan the `Aggregate` operator as input to `Produce`. This case is more
complex, because aggregation in openCypher can perform implicit grouping of
results used for aggregation.

For example, `WITH/RETURN sum(n.x) AS s, n.y AS group` will implicitly group
by the `n.y` expression.

Another, more obscure grouping can be achieved with `RETURN sum(n.a) + n.b AS
s`. Here, `n.b` will be used for grouping, even though both the `sum` and
`n.b` are in the same named expression.

Therefore, we need to collect all expressions which do not contain
aggregations and use them for grouping (a toy model of this grouping is
sketched below). You may have noticed that in the last example `sum` is
actually a sub-expression of `+`. The `Aggregate` operator does not see that
(nor should it), so the responsibility of evaluating that falls on `Produce`.
One way would be for `Aggregate` to store the results of grouping expressions
on the frame in addition to the aggregation results. Unfortunately, this
would require rewiring named expressions in `Produce` to reference already
evaluated expressions. In the current implementation, we opted for
`Aggregate` to store only aggregation results on the frame, while `Produce`
will re-evaluate all the other (grouping) expressions. To handle that,
symbols which are used in expressions are passed to `Aggregate`, so that they
can be remembered. `Produce` will read those symbols from the frame and use
them to re-evaluate the needed expressions.
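As a toy model of the implicit grouping, consider keeping per-group
aggregation state keyed by the evaluated grouping expressions. This is an
illustration only; the real `TypedValue` and the various aggregation kinds
are simplified away:

    #include <map>
    #include <vector>

    using TypedValue = double;                 // stand-in for TypedValue
    using GroupKey = std::vector<TypedValue>;  // evaluated grouping exprs

    struct SumState {
      TypedValue sum = 0;
    };

    // For sum(n.x) grouped by n.y: key = {value of n.y}, input = value of
    // n.x. Each pulled row folds its input into the state of its group.
    void Accumulate(std::map<GroupKey, SumState> &groups, const GroupKey &key,
                    TypedValue aggregated_input) {
      groups[key].sum += aggregated_input;
    }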
##### Accumulation

After we have `Produce` and potentially `Aggregate`, we need to handle a
special case when the part of the query before `RETURN` or `WITH` performs
updates. For that, we want to run that part of the query fully, so that we get
the latest results. This is accomplished by adding the `Accumulate` operator
as input to `Aggregate` or `Produce` (if there is no aggregation).
Accumulation will store all the values of all the symbols used inside
`RETURN` and `WITH`, so that they can be used in the operator which follows.
This way, only parts of the frame are copied, instead of the whole frame.
There is a minor difference between planning `WITH` and `RETURN`. Since
`WITH` can separate writing from reading, we need to advance the transaction
command. This enables the later, read parts of the query to obtain the newest
changes. This is supported by passing the `advance_command` flag to the
`Accumulate` operator.

In the simplest case, common to both clauses, we have `Accumulate > Aggregate
> Produce` operators, where `Accumulate` and `Aggregate` may be left out.

##### Ordering

Planning `ORDER BY` is simple enough. Since it may see new symbols (filled in
`Produce`), we add the `OrderBy` operator at the end. The operator will change
the order of produced results, so we pass it the ordering expressions and the
output symbols of named expressions.

##### Filtering

A final difference in `WITH` is when it contains a `WHERE` clause. For that,
we simply generate the `Filter` operator, appended after `Produce` or
`OrderBy` (depending on which operator comes last).

##### Skipping and Limiting

If we have `SKIP` or `LIMIT`, we generate `Skip` or `Limit` operators,
respectively. These operators are put at the end of the clause.

This placement may have some unexpected behaviour when combined with
operations that update the graph. For example:

    MATCH (n) SET n.x = n.x + 1 RETURN n LIMIT 1

The above query may be interpreted as if the `SET` will be done only once.
Since this is a write query, we need to accumulate results, so the part before
`RETURN` will execute completely. The accumulated results will be yielded up
to the given limit, and the user would get only the first `n` that was
updated. This may confuse the user because in reality every node in the
database has been updated.

Note that `Skip` always comes before `Limit`. In the current implementation,
they are generated directly one after the other.

#### CREATE

The `CREATE` clause is used to create nodes and edges (relationships).

For multiple `CREATE` clauses, or multiple creation patterns in a single
clause, we perform the following same steps for each.

##### Creating a Single Node

A node is created by simply specifying a node pattern.

For example, `CREATE (n :label {property: "value"}), ()` would create 2
nodes. The 1st one would be created with a label and a property. This node
could be referenced later in the query, by using the variable `n`. The 2nd
node cannot be referenced and it would be created without any labels nor
properties. For node creation, we generate a `CreateNode` operator and pass
it all the details of node creation: the variable symbol, labels and
properties. In the mentioned example, we would have `CreateNode >
CreateNode`.

##### Creating a Relationship

To create a relationship, the `CREATE` clause must contain a pattern with a
directed edge. Compared to creating a single node, this case is a bit more
complicated, because either endpoint of the edge may not yet exist. By exist,
we mean that the endpoint is a variable which already references a node.

For example, `MATCH (n) CREATE (n)-[r]->(m)` would create an edge `r` and a
node `m` for each matched node `n`. If we focus on the `CREATE` part, we
generate `CreateExpand (n, r, m)` where `n` already exists (refers to a
matched node) and `m` would be newly created along with the edge `r`.
If we had only `CREATE (n)-[r]->(m)`, then we would need to create both nodes
of the edge `r`. This is done by generating `CreateNode (n) > CreateExpand
(n, r, m)`. The final case is when both endpoints refer to an existing node.
For example, when adding a node with a cyclical connection `CREATE
(n)-[r]->(n)`. In this case, we would generate `CreateNode (n) > CreateExpand
(n, r, n)`. We would tell `CreateExpand` to only create the edge `r` between
the already created `n`.

#### MERGE

Although the merge operation is complex, planning turns out to be relatively
simple. The pattern inside the `MERGE` clause is used for both matching and
creating. Therefore, we create 2 operator trees, one for each action.

For example:

    MERGE (n)-[r:r]-(m)

We would generate a single `Merge` operator which has the following.

  * No input operation (since it is not preceded by any other clause).

  * On match operation

    `ScanAll (n) > Expand (n, r, m) > Filter (r)`

  * On create operation

    `CreateNode (n) > CreateExpand (n, r, m)`

In cases when `MERGE` contains `ON MATCH` and `ON CREATE` parts, we simply
append their operations to the respective operator trees.

Observe the following example.

    MERGE (n)-[r:r]-(m) ON MATCH SET n.x = 42 ON CREATE SET m :label

The `Merge` would be generated with the following.

  * No input operation (again, since there is no clause preceding it).

  * On match operation

    `ScanAll (n) > Expand (n, r, m) > Filter (r) > SetProperty (n.x, 42)`

  * On create operation

    `CreateNode (n) > CreateExpand (n, r, m) > SetLabels (m, :label)`

When we have preceding clauses, we simply put their operator as input to
`Merge`.

    MATCH (n) MERGE (n)-[r:r]-(m)

The above would be generated as

    ScanAll (n) > Merge (on_match_operation, on_create_operation)

Here we need to be careful to recognize which symbols are already declared.
But, since the `on_match_operation` uses the same algorithm for generating a
`Match`, that problem is handled there. The same holds for
`on_create_operation`, which uses the process of generating a `Create`. So,
finally, for this example the `Merge` would have:

  * Input operation

    `ScanAll (n)`

  * On match operation

    `Expand (n, r, m) > Filter (r)`

    Note that `ScanAll` is not needed since we get the nodes from the input.

  * On create operation

    `CreateExpand (n, r, m)`

    Note that `CreateNode` is dropped, since we want to expand the existing
    one.

## Logical Plan Postprocessing

NOTE: TODO

## Cost Estimation

NOTE: TODO

## Distributed Planning

NOTE: TODO

diff --git a/docs/dev/query/semantic.md b/docs/dev/query/semantic.md
new file mode 100644
index 000000000..ff10cbb5e
--- /dev/null
+++ b/docs/dev/query/semantic.md
@@ -0,0 +1,134 @@
# Semantic Analysis and Symbol Generation

In this phase, various semantic and variable type checks are performed.
Additionally, we generate symbols which map AST nodes to stored values
computed from evaluated expressions.

## Symbol Generation

The implementation can be found in
`query/frontend/semantic/symbol_generator.cpp`.

Symbols are generated for each AST node that represents data that needs to
have storage. Currently, these are:

  * `NamedExpression`
  * `CypherUnion`
  * `Identifier`
  * `Aggregation`

You may notice that the above AST nodes may not correspond to something named
by a user. For example, `Aggregation` can be a part of a larger expression
and thus remain unnamed.
The reason we still generate symbols is to have a uniform behaviour when
executing a query, as well as to allow caching the results of expression
evaluation.

AST nodes do not actually store a `Symbol` instance; instead they have an
`int32_t` index identifying the symbol in the `SymbolTable` class. This is
done to minimize the size of AST types as well as to allow easier sharing of
the same symbols between multiple instances of AST nodes.

The storage for evaluated data is represented by the `Frame` class. Each
symbol determines a unique position in the frame. During interpretation,
evaluation of expressions which have a symbol will either read or store
values in the frame. For example, an instance of `Identifier` will use the
symbol to find and read the value from the `Frame`. On the other hand,
`NamedExpression` will take the result of evaluating its own expression and
store it in the `Frame`.

When a symbol is created, the context of creation is used to assign a type to
that symbol. This type is used for simple type checking operations. For
example, `MATCH (n)` will create a symbol for the variable `n`. Since `MATCH
(n)` represents finding a vertex in the graph, we can set
`Symbol::Type::Vertex` for that symbol. Later, for example in
`MATCH ()-[n]-()`, we may see that the variable `n` is used as an edge. Since
we already have a symbol for that variable, we detect this type mismatch and
raise a `SemanticException`.

The basic rule of symbol generation is that variables inside `MATCH`,
`CREATE`, `MERGE`, `WITH ... AS` and `RETURN ... AS` clauses establish new
symbols.

### Symbols in Patterns

Inside `MATCH`, symbols are created only if they didn't exist before. For
example, the patterns in `MATCH (n {a: 5})--(m {b: 5}) RETURN n, m` will
create 2 symbols: one for `n` and one for `m`. The `RETURN` clause will, in
turn, reference those symbols. Symbols established in a part of a pattern are
immediately bound and visible in later parts. For example, `MATCH (n)--(n)`
will create a symbol for the variable `n` in the 1st `(n)`. That symbol is
then referenced by the 2nd `(n)`. Note that the symbol is not bound inside
the 1st `(n)` itself. What this means is that, for example, `MATCH (n {a:
n.b})` should raise an error, because `n` is not yet bound when encountering
`n.b`. On the other hand, `MATCH (n)--(n {a: n.b})` is fine.

`CREATE` is similar to `MATCH`, but it *always* establishes symbols for
variables which create graph elements. What this means is that, for example,
`MATCH (n) CREATE (n)` is not allowed. `CREATE` wants to create a new node,
for which we already have a symbol. In such a case, we need to throw an error
that the variable `n` is being redeclared. On the other hand, `MATCH (n)
CREATE (n)-[r :r]->(n)` is fine, because `CREATE` will only create the edge
`r`, connecting the already existing node `n`. The remaining behaviour is the
same as in `MATCH`. This means that we can simplify `CREATE` to be like
`MATCH` with 2 special cases.

 1. Are we creating a node, i.e. `CREATE (n)`? If yes, then the symbol for
    `n` must not have been created before. Otherwise, we reference the
    existing symbol.
 2. Are we creating an edge, i.e. do we encounter a variable for an edge
    inside `CREATE`? If yes, then that variable must not reference a symbol.

The `MERGE` clause is treated the same as `CREATE` with regards to symbol
generation. The only difference is that we allow bidirectional edges in the
pattern. When creating such a pattern, the direction of the created edge is
arbitrarily determined.
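The relationship between symbols, the symbol table and the frame can be
sketched as follows. These are hypothetical, condensed versions of the real
classes:

    #include <string>
    #include <vector>

    // Condensed sketch: an AST node stores only `position`, which
    // identifies the symbol in the SymbolTable and doubles as the node's
    // slot in the Frame.
    struct Symbol {
      std::string name;
      int position;
    };

    class SymbolTable {
     public:
      const Symbol &CreateSymbol(const std::string &name) {
        symbols_.push_back({name, static_cast<int>(symbols_.size())});
        return symbols_.back();
      }
      const Symbol &At(int position) const { return symbols_[position]; }

     private:
      std::vector<Symbol> symbols_;
    };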
### Symbols in WITH and RETURN

In addition to patterns, new symbols are established in the `WITH` clause.
This clause makes the new symbols visible *only* to the rest of the query.
For example, `MATCH (old) WITH old AS new RETURN new, old` should raise an
error that `old` is unbound inside `RETURN`.

There is a special case with symbol visibility in `WHERE` and `ORDER BY`. They
need to see both the old and the new symbols. Therefore, `MATCH (old) RETURN
old AS new ORDER BY old.prop` needs to work. On the other hand, if we perform
aggregations inside `WITH` or `RETURN`, then the old symbols should not be
visible in either `WHERE` or `ORDER BY`. Since the aggregation has to go
through all the results in order to generate the final value, it makes no
sense to store the old symbols and their values. A query like `MATCH (old)
WITH SUM(old.prop) AS sum WHERE old.prop = 42 RETURN sum` needs to raise an
error that `old` is unbound inside `WHERE`.

For cases when `SKIP` and `LIMIT` appear, we disallow any identifiers from
appearing in their expressions. Basically, `SKIP` and `LIMIT` can only be
constant expressions[^1]. For example, `MATCH (old) RETURN old AS new SKIP
new.prop` needs to raise an error that variables are not allowed in `SKIP`.
It makes no sense to allow variables, since their values may vary on each
iteration. On the other hand, we could support variables bound to constant
expressions, but for simplicity we do not. For example, `MATCH (old) RETURN
old, 2 AS limit_var LIMIT limit_var` would still throw an error.

Finally, we generate symbols for names created in the `RETURN` clause. These
symbols are used for the final results of a query.

NOTE: New symbols in `WITH` and `RETURN` must be unique. This means that
`WITH a AS same, b AS same` is not allowed, and neither is a construct like
`RETURN 2, 2`.

### Symbols in Functions which Establish New Scope

Symbols can also be created in some functions. These functions usually take an
expression, bind a single variable and run the expression inside the newly
established scope.

The `all` function takes a list, creates a variable for the list element and
runs the predicate expression. For example:

    MATCH (n) RETURN n, all(n IN n.prop_list WHERE n < 42)

We create a new symbol for use inside `all`. This means that in `WHERE n <
42` the `n` takes its values from the elements of `n.prop_list`. The original
`n` bound by `MATCH` is not visible inside the `all` function, but it is
visible outside. Therefore, the `RETURN n` and `n.prop_list` reference the
`n` from `MATCH`.

[^1]: Constant expressions are expressions for which the result can be
      computed at compile time.