Update docs/dev/query/planning.md

Reviewers: buda

Reviewed By: buda

Differential Revision: https://phabricator.memgraph.io/D2691
2020-02-27 10:33:13 +01:00
commit 565927631f
parent 64da28ca83
1 changed files with 51 additions and 12 deletions
--- a/docs/dev/query/planning.md
+++ b/docs/dev/query/planning.md
@@ -31,14 +31,6 @@ the following steps.
     After the generation, the execution cost of each plan is estimated. This
     estimation is used to select the best plan which will be executed.

-  5. [Distributed Planning](#distributed-planning)
-
-     In case we are running distributed Memgraph, the final plan is adapted
-     for distributed execution. NOTE: This appears to be an error in the
-     workflow. Distributed planning should be moved before step 3. or
-     integrated with it. With the workflow ordered as is now, cost estimation
-     doesn't consider the distributed plan.
-
 The implementation can be found in the `query/plan` directory, with the public
 entry point being `query/plan/planner.hpp`.

@@ -476,12 +468,59 @@ finally for this example, the `Merge` would have:

 ## Logical Plan Postprocessing

-NOTE: TODO
+Postprocessing of a logical plan is done by rewriting the original plan into
+a more efficient one while preserving the original semantics of operations.
+The rewriters are found in the `query/plan/rewrite` directory, and currently
+we only have one -- `IndexLookupRewriter`.
+
+### IndexLookupRewriter
+
+The job of this rewriter is to merge `Filter` and `ScanAll` operations into
+equivalent `ScanAllBy<Index>` operations. In almost all cases an indexed
+lookup will be faster than a regular lookup, so `IndexLookupRewriter` simply
+does the transformation whenever possible. The simplest case is the
+following, assuming we have an index over `id`.
+
+  * Original Plan
+
+    `ScanAll (n) > Filter (id(n) == 42) > Produce (n)`
+
+  * Rewritten Plan
+
+    `ScanAllById (n, id=42) > Produce (n)`
+
+Naturally, there are some cases we need to be careful about.
+
+  1. Operators with Multiple Branches
+
+     Here we may not carry `Filter` operations outside of the operator into
+     its branches, so the branches are rewritten as standalone plans with a
+     brand new `IndexLookupRewriter`. Some of the operators with multiple
+     branches are `Merge`, `Optional`, `Cartesian` and `Union`.
+
+  2. Expand Operators
+
+     Expand operations aren't that tricky to handle, but they have a special
+     case where we want to use an indexed lookup of the destination so that the
+     expansion is performed between known nodes. This decision may depend on
+     various parameters which may need further tweaking as we encounter more
+     use-cases of Cypher queries.

 ## Cost Estimation

-NOTE: TODO
+Cost estimation is the final step of processing a logical plan. The
+implementation can be found in `query/plan/cost_estimator.hpp`. We give each
+operator a cost based on the estimated cardinality of results of that operator
+and on the preset coefficient of the runtime performance of that operator.

-## Distributed Planning
+This scheme is rather simple and works quite well, but there are a couple of
+improvements we may want to make at some point.

-NOTE: TODO
+  * Track more information about the stored graph and use that to improve the
+    estimates.
+  * Do a quick, partial run of the plan and tweak the estimation based on how
+    many results each operator produced. This may require us to have some kind
+    of representative subset of the stored graph.
+  * Write micro benchmarks for each operator and based on the results create
+    sensible preset coefficients. This would replace the current coefficients
+    which are just assumptions on how each operator implementation performs.
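
Note on the change above: the rewrite rule and the cost scheme described in the new text can be illustrated together with a small sketch. This is not the code in `query/plan/rewrite` or `query/plan/cost_estimator.hpp`; the `Op` type, the coefficient values, the filter selectivity of 0.25, and the choice to charge each operator on the cardinality it processes are all assumptions made up for the illustration. A plan is modelled as a flat list of operators, which ignores branching operators entirely.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Op:
    """Hypothetical, simplified logical operator (not Memgraph's class)."""
    name: str                  # e.g. "ScanAll", "Filter", "Produce"
    arg: Optional[int] = None  # id literal used by Filter / ScanAllById


def rewrite_index_lookup(plan: List[Op]) -> List[Op]:
    """Merge each adjacent `ScanAll > Filter(id == x)` pair into ScanAllById."""
    out: List[Op] = []
    i = 0
    while i < len(plan):
        nxt = plan[i + 1] if i + 1 < len(plan) else None
        if plan[i].name == "ScanAll" and nxt is not None and nxt.name == "Filter":
            out.append(Op("ScanAllById", nxt.arg))
            i += 2  # both operators are consumed by the merged one
        else:
            out.append(plan[i])
            i += 1
    return out


# Made-up preset coefficients; real values would come from benchmarks.
COEFFICIENT = {"ScanAll": 1.0, "ScanAllById": 1.0, "Filter": 1.5, "Produce": 1.0}
FILTER_SELECTIVITY = 0.25  # assumed fraction of rows that pass a Filter


def estimate_cost(plan: List[Op], node_count: int) -> float:
    """Sum, per operator, processed cardinality times its preset coefficient."""
    cost = 0.0
    cardinality = 1.0
    for op in plan:
        if op.name == "ScanAll":
            cardinality *= node_count  # scan emits every node
        # ScanAllById emits at most one node, so cardinality is unchanged.
        cost += cardinality * COEFFICIENT[op.name]
        if op.name == "Filter":
            cardinality *= FILTER_SELECTIVITY  # rows surviving the filter
    return cost


original = [Op("ScanAll"), Op("Filter", 42), Op("Produce")]
rewritten = rewrite_index_lookup(original)
```

With these assumed numbers and a graph of 1000 nodes, `estimate_cost(original, 1000)` is far larger than `estimate_cost(rewritten, 1000)`, which matches the claim that the indexed lookup is almost always the better choice.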