Summary:
First iteration in implementing kafka.
Currently, memgraph streams won't use the transform script provided in the
`CREATE STREAM` query.
There is a manual test that serves a POC purpose which we'll use to fully wire
kafka in memgraph.
Since streams need to download the script, I moved curl init from
telemetry.
Reviewers: teon.banek, mferencevic
Reviewed By: mferencevic
Subscribers: ipaljak, pullbot, buda
Differential Revision: https://phabricator.memgraph.io/D1491
Summary:
This change should preclude the need to specify `:capnp-save` and
`:capnp-load` functions for regularly saved elements of `std::vector<T>`
and `std::optional<T>`. Regular saving in this context means saving
primitive types or compound types which have a `Save(capnp::Builder *)`
method.
Reviewers: mtomic, msantl
Reviewed By: mtomic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1497
Summary:
This change should completely support planning Optional for distributed
execution. Cartesian matching is handled, as well as dependencies
between optional branch and the main input branch.
Unit tests are expanded to cover the planning algorithm.
Reviewers: msantl, mtomic
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1480
Summary:
Added test stream functionality. When a stream is configured, it will try to
consume messages from a kafka topic and return them back to the user.
For now, the messages aren't transformed, so it just returns the payload string.
Depends on D1466
Next steps are persisting stream metadata and transforming messages in order to
store them in the graph.
Reviewers: teon.banek, mtomic
Reviewed By: teon.banek
Subscribers: pullbot, buda
Differential Revision: https://phabricator.memgraph.io/D1474
Summary:
Integrated kafka library into memgraph. This version supports all opencypher
features and will only output messages consumed from kafka.
Depends on D1434
Next steps are persisting stream metadata and transforming messages in order to
store them in the graph.
Reviewers: teon.banek, mtomic, mferencevic, buda
Reviewed By: teon.banek
Subscribers: mferencevic, pullbot, buda
Differential Revision: https://phabricator.memgraph.io/D1466
Summary: When doing bfs with given endpoint, we can stop the traversal on first successful pull.
Reviewers: teon.banek, msantl, mculinovic, buda
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1469
Summary:
Added basic functionality for kafka streams. The `CREATE STREAM` clause is a
simplified version from the one mentioned in D1415 so we can start testing
end-to-end sooner.
This diff also includes a bug fix in `lcp.list ` for operators that have no
members.
Reviewers: teon.banek, mtomic, buda
Reviewed By: mtomic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1434
Summary:
Hopefully, the mechanism of generating Cartesian is general enough, so
this simple change should work correctly in all cases.
Planner tests have been modified to use a FakeDbAccessor in order to
speed them up and potentially allow extracting planning into a library.
Reviewers: msantl, mtomic, buda
Reviewed By: mtomic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1431
Summary:
This change should correctly plan Cartesian which have dependent Filter
or Expand operators. Tests have been added for those cases. Other cases
are not yet supported and should throw an exception.
Reviewers: msantl, mtomic, buda
Reviewed By: mtomic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1426
Summary:
This is the initial step to getting a correct version of distributed
planning of Cartesian operator. Functions and structs have been added
which should collect enough information to correctly order the execution
with regards to dependencies among Cartesian branches. The support
functionality should be the same as was before, but unsupported cases
should now raise an exception instead of leading to undefined behaviour.
Reviewers: msantl, mtomic, buda
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1418
Summary:
Converts the RPC stack to use Cap'n Proto for serialization instead of
boost. There are still some traces of boost in other places in the code,
but most of it is removed. A future diff should cleanup boost for good.
The RPC API is now changed to be more flexible with regards to how
serialize data. This makes the simplest cases a bit more verbose, but
allows complex serialization code to be correctly written instead of
relying on hacks. (For reference, look for the old serialization of
`PullRpc` which had a nasty pointer hacks to inject accessors in
`TypedValue`.)
Since RPC messages were uselessly modeled via inheritance of Message
base class, that class is now removed. Furthermore, that approach
doesn't really work with Cap'n Proto. Instead, each message type is
required to have some type information. This can be automated, so
`define-rpc` has been added to LCP, which hopefully simplifies defining
new RPC request and response messages.
Specify Cap'n Proto schema ID in cmake
This preserves Cap'n Proto generated typeIds across multiple generations
of capnp schemas through LCP. It is imperative that typeId stays the
same to ensure that different compilations of Memgraph may communicate
via RPC in a distributed cluster.
Use CLOS for meta information on C++ types in LCP
Since some structure slots and functions have started to repeat
themselves, it makes sense to model C++ meta information via Common Lisp
Object System.
Depends on D1391
Reviewers: buda, dgleich, mferencevic, mtomic, mculinovic, msantl
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1407
Summary:
Add additional structs and functions for handling C++ meta information.
Add capnp-file and capnp-id arguments to lcp:process-file.
Generate cpp along with hpp and capnp in lcp.
Wrap LogicalOperator base class in lcp:define-class.
Modify logical operators for capnp serialization.
Add query/common.capnp.
Reviewers: mculinovic, buda, mtomic, msantl, ipaljak, dgleich, mferencevic
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1391
Summary:
Command id is necessary in remote produce to identify an ongoing pull
because a transaction can have multiple commands that all belong under
the same plan and tx id.
Reviewers: teon.banek, mtomic, buda
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1386
Summary:
Removing `AS_IS` from GraphView because it doesn't seem like it is necessary for query execution and it also has weird semantics (you might get a mix of old and new records). `Unwind`, `OrderBy` and `PullRemoteOrderBy` now use `OLD` graph view.
Remove AS_IS from GraphView
Fix query_cost_estimator tests
Fix query_expression_evaluator tests
Fix query_plan_match_filter_return tests
Fix query_plan_create_set_remove_delete tests
Fix query_plan_accumulate_aggregate tests
Reviewers: teon.banek, buda
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1390
Summary:
Since we are moving from boost to Capnp for serialization, it makes
sense to keep all of the LogicalOperator classes in LCP format. This
will make it easier to generate Capnp code.
Depends on D1361
Reviewers: buda, mferencevic, msantl, dgleich, ipaljak, mculinovic, mtomic
Reviewed By: mtomic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1362
Summary:
Added a new markdown file for concepts and a new test to test the edge
case with upper bound.
Reviewers: teon.banek, mtomic, dgleich, buda, ipaljak
Reviewed By: teon.banek, dgleich
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1366
Summary:
When commiting/aborting a transaction in tx master engine, make a two
phase commit to all workers so they can stop all futures and clear
transactional cache.
Reviewers: dgleich, florijan
Reviewed By: dgleich
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1320
Summary:
Before we used `utils::Future` only where it's created by our `ThreadPool`.
I suggest in this diff that we use it everywhere, it's a bit more defensive and
should not have any downsides.
Reviewers: msantl, teon.banek
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1316
Summary:
Add different priority VLOGs for distributed memgraph.
For level 3 you'll get logs for dispatching/consuming plans.
For level 4 you'll get logs for tx start/commit/abort, remote produce, remote
pull, remote result consume,
For level 5 there will be a log for each request/response made by the RPC
client.
Master log snippet P9
Worker log snippet P10
Reviewers: florijan, teon.banek
Reviewed By: florijan
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1296
Summary:
Remove "produce_" and "Produce" as prefix from all distributed stuff.
It's not removed in src/query/ stuff (operators).
Reviewers: dgleich
Reviewed By: dgleich
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1315
Summary:
This fixes a bug where the planning would raise NotYetImplemented error,
due to preventing plan splitting while the operator is already on
master. For example, `MATCH (a), (b) CREATE (a)-[e:r]->(b) RETURN e`
would split the plan after Cartesian and before Create. Thus, the rest
of the plan would be on master, including the Accumulate before Produce.
We still can (and must) replace Accumulate with Synchronize but this
would fail due to unneeded check.
Reviewers: florijan, msantl
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1272
Summary:
I have not thoroughly thought this through, especially the worker
destruction (is it legit to abort all running tx?), but it's tested to
abort during remote pull, what we need.
Also I improved error handling for vertex deletion failure during
remote pull (@dgleich).
Reviewers: teon.banek, msantl, dgleich
Reviewed By: dgleich
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1263
Summary:
If the `db` says we need to abort we should abort. There are two
places (in both `PullRemote` and `PullRemoteOrderBy`) where the check is made.
The first one is on the very beginning of the `Pull` method and the second one
is in the loop that checks/waits for remote results.
Reviewers: teon.banek, florijan
Reviewed By: florijan
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1265
Summary:
Usually a single RPC call takes 50us, which is a lot lower than our
millisecond sleep. Calling sleep with less than a ms duration may not
be really supported (due to timer granularity), but this change should
cause some sub-ms sleep.
Reviewers: florijan, msantl, mferencevic
Reviewed By: florijan
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1258
Summary:
During distributed execution, OrderBy is split across workers and the
master gets to merge those results via PullRemoteOrderBy. Since this
operator may be an input to almost any other operator, virtual accessors
to `input` have been added in LogicalOperator.
Depends on D1221
Reviewers: florijan, msantl, buda
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1232
Summary:
Extracted `TypedValueVectorCompare` and `RemotePuller` from operators
so it can be reused.
The new `PullRemoteOrerBy` operator pulls one result from each worker and one
from master, relies on the fact that workers/master returned sorted results,
returns the next one in order, and pulls the source of that result to get the
next one.
Depends on D1215 that (at the moment) is still in review.
Reviewers: florijan, teon.banek
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1221
Summary:
- The expansion vertex gets created on the origin's worker
- The edge automatically gets created wherever necessary
- Vertex creation logic reuse
- End to end test
Reviewers: teon.banek, msantl
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1227
Summary:
This may improve the query execution workload, because the Expand will
yield local results while it waits for remote ones. Note that we rely on
the fact that walking the graph produces results without any predetermined
order. Therefore, we can yield paths as we see fit. Enforcing the order
is done through OrderBy operator, and this change shouldn't affect that.
Reviewers: florijan, msantl, buda
Reviewed By: florijan
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1215
Summary:
- Add `database::GraphDb::GetWorkerIds()`
- Change `CreateNode` constructor API
- Make `CreateNode` distribute nodes uniformly over workers
Did not yet modify `CreateExpand`, coming in the next diff.
Reviewers: teon.banek
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1216
Summary:
This is still very much in progress. No advanced checks are done to
prevent planning unimplemented things. Basic Cartesian product should
work, for example `MATCH (a), (b) CREATE (a)-[:r]->(c)-[:r]->(b)`. But
anything more advanced may lead to undefined behaviour of the planner
and therefore execution. Use at your own risk!
Add ModifiedSymbols method to LogicalOperator
For planning Cartesian, we need information on which symbols are filled
by operator sub-trees. Currently, this is used to set symbols which
should be transferred over network. Later, they should be used to detect
whether filter expressions use symbols modified from Cartesian branches.
Then we will be able to ensure correct dependency of filters and their
behaviour.
Prepare DistributedPlan for multiple worker plans
Since Cartesian branches need to be split and handled by each worker, we
now dispatch multiple plans to workers.
Reviewers: florijan, msantl, buda
Reviewed By: florijan
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1208