Summary:
This diff splits single-node and distributed storage from each other.
Currently, all of the storage code is copied into two directories (one for
single node, one for distributed). The logic used in the storage
implementation isn't touched; it will be refactored in the following diffs.
To clean the working directory after this diff, execute:
```
rm database/state_delta.capnp
rm database/state_delta.hpp
rm storage/concurrent_id_mapper_rpc_messages.capnp
rm storage/concurrent_id_mapper_rpc_messages.hpp
```
Reviewers: teon.banek, buda, msantl
Reviewed By: teon.banek, msantl
Subscribers: teon.banek, pullbot
Differential Revision: https://phabricator.memgraph.io/D1625
Summary:
This diff changes the RPC layer to directly return `TResponse` to the user when
issuing a `Call<...>` RPC call. The call throws an exception on failure
(instead of returning `nullopt` as before).
All servers (network, RPC and distributed) are set to have explicit `Shutdown`
methods so that a controlled shutdown can always be performed. The object
destructors now have `CHECK`s to enforce that the `AwaitShutdown` methods were
called.
The distributed memgraph is changed so that none of the binaries (master/workers)
crash when there is a communication failure. Instead, the whole cluster starts
a graceful shutdown when a persistent communication error is detected.
Transient errors are allowed during execution. The transaction that errored out
will be aborted on the whole cluster. The cluster state is managed using a new
Heartbeat RPC call.
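A toy, self-contained sketch of the new calling convention (the types and names
below are stand-ins, not the actual RPC API):
```
#include <iostream>
#include <stdexcept>

// Toy stand-ins for the real RPC types (assumptions, not the actual API).
struct HeartbeatRes { bool alive; };
struct RpcFailedException : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Before: a call returned std::optional<Response> and the caller checked for
// nullopt. After this diff the response is returned directly and a failure
// surfaces as an exception.
HeartbeatRes CallHeartbeat(bool network_ok) {
  if (!network_ok) throw RpcFailedException("heartbeat RPC failed");
  return HeartbeatRes{true};
}

int main() {
  try {
    HeartbeatRes res = CallHeartbeat(true);
    std::cout << "worker alive: " << res.alive << "\n";
  } catch (const RpcFailedException &e) {
    // In the servers, a persistent failure like this now triggers a graceful
    // cluster-wide shutdown instead of crashing the binary.
    std::cerr << e.what() << "\n";
  }
}
```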
Reviewers: buda, teon.banek, msantl
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1604
Summary:
This diff introduces a new flag:
* `--synchronous-commit`
The `--synchronous-commit` flag tells the WAL when the deltas should be flushed
to the disk drive. By default this is off and the WAL flushes deltas every `N`
milliseconds. If it's turned on, then on every transaction end (commit or
abort) the WAL will first flush the deltas and only then return from ending the
transaction.
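A rough, self-contained sketch of that policy (class and member names are
illustrative, not the actual WAL code):
```
#include <vector>

struct Delta { /* encoded state change */ };

class Wal {
 public:
  explicit Wal(bool synchronous_commit)
      : synchronous_commit_(synchronous_commit) {}

  void Emplace(Delta delta) { buffer_.push_back(delta); }

  // Called on every transaction end (commit or abort).
  void TxEnd() {
    if (synchronous_commit_) Flush();  // return only after deltas are on disk
    // Otherwise a background thread calls Flush() every N milliseconds.
  }

 private:
  void Flush() { /* serialize buffer_ to the WAL file and clear it */ }

  bool synchronous_commit_;
  std::vector<Delta> buffer_;
};
```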
Reviewers: buda, vkasljevic, mferencevic, teon.banek, ipaljak
Reviewed By: mferencevic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1542
Summary:
GraphDbAccessor is now constructed only through GraphDb. This allows the
concrete GraphDb to instantiate a concrete GraphDbAccessor and lets us use
virtual calls, so that the implementation may be kept separate. The major
downside of doing things this way is the heap allocation of GraphDbAccessor.
In case it turns out to be a real performance issue, another solution with a
pointer to a static implementation may be used.
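A hedged sketch of the construction scheme (the `Access()` name and the classes
below are illustrative, not necessarily the actual API):
```
#include <memory>

class GraphDbAccessor {
 public:
  virtual ~GraphDbAccessor() = default;
  // ... transaction-scoped operations ...
};

class GraphDb {
 public:
  virtual ~GraphDb() = default;
  // The concrete GraphDb decides which accessor to construct,
  // at the cost of a heap allocation per accessor.
  virtual std::unique_ptr<GraphDbAccessor> Access() = 0;
};

class SingleNodeAccessor final : public GraphDbAccessor {};

class SingleNode final : public GraphDb {
 public:
  std::unique_ptr<GraphDbAccessor> Access() override {
    return std::make_unique<SingleNodeAccessor>();
  }
};
```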
InsertVertexIntoRemote is now a non-member function, which reduces
coupling. It made no sense for it to be a member function because it used
only the public parts of GraphDbAccessor.
Reviewers: msantl, mtomic, mferencevic
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1504
Summary:
This change, hopefully, simplifies the implementation of different kinds
of GraphDb. The pimpl idiom is now simplified by removing all of the
crazy inheritance. Implementation classes are just plain data stores,
without any methods. The interface classes now have a flatter
hierarchy:
```
        GraphDb (pure interface)
           |
   +-------+------------ DistributedGraphDb (pure interface)
   |                              |
Single Node                +------+------+
                           |             |
                         Master        Worker
```
DistributedGraphDb is used as an intermediate interface for all the
things that should work only in distributed. Therefore, virtual calls
for distributed stuff have been removed from GraphDb. Some are exposed
via DistributedGraphDb, others are only in the concrete Master and Worker
classes. The code which relied on those virtual calls has been
refactored to either use DistributedGraphDb, take a pointer to what is
actually needed or use dynamic_cast. Obviously, dynamic_cast is a
temporary solution and should be replaced with another mechanism (e.g.
virtual call, or some other function pointer style).
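A compilable skeleton of the hierarchy above (the method sets and comments are
illustrative only):
```
class GraphDb {
 public:
  virtual ~GraphDb() = default;
  // Only functionality common to single node and distributed lives here.
};

class SingleNode final : public GraphDb {};

// Intermediate pure interface: everything that makes sense only in
// distributed (remote data access, worker ids, ...) is declared here or
// further down in Master/Worker.
class DistributedGraphDb : public GraphDb {};

class Master final : public DistributedGraphDb {};
class Worker final : public DistributedGraphDb {};

// Code that needs distributed-only features either takes a
// DistributedGraphDb (or Master/Worker) directly, or temporarily downcasts:
//   auto *distributed = dynamic_cast<DistributedGraphDb *>(&db);
```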
The cost of the above change is some code duplication in constructors
and destructors of classes. This duplication has a lot of little tweaks
that make it hard to generalize, not to mention that virtual calls do
not work in constructors and destructors. If we really care about
generalizing this, we should think about abandoning RAII in favor of a
constructor + Init method.
The next steps for splitting the dependencies that seem logical are:
1) Split GraphDbAccessor implementation, either via inheritance or
passing in an implementation pointer. GraphDbAccessor should then
only be created by a virtual call on GraphDb.
2) Split Interpreter implementation. Besides allowing single node
interpreter to exist without depending on distributed, this will
enable the planner and operators to be correctly separated.
Reviewers: msantl, mferencevic, ipaljak
Reviewed By: msantl
Subscribers: dgleich, pullbot
Differential Revision: https://phabricator.memgraph.io/D1493
Summary:
First iteration in implementing kafka.
Currently, memgraph streams won't use the transform script provided in the
`CREATE STREAM` query.
There is a manual test that serves as a POC, which we'll use to fully wire
kafka into memgraph.
Since streams need to download the script, I moved curl init from
telemetry.
Reviewers: teon.banek, mferencevic
Reviewed By: mferencevic
Subscribers: ipaljak, pullbot, buda
Differential Revision: https://phabricator.memgraph.io/D1491
Summary:
Integrated the kafka library into memgraph. This version supports all openCypher
features and will only output messages consumed from kafka.
Depends on D1434
Next steps are persisting stream metadata and transforming messages in order to
store them in the graph.
Reviewers: teon.banek, mtomic, mferencevic, buda
Reviewed By: teon.banek
Subscribers: mferencevic, pullbot, buda
Differential Revision: https://phabricator.memgraph.io/D1466
Summary:
- A simplified end-to-end implementation
- POD interface set-up, still have bugs with HDDkeys
- Version bug fix and first iterator implementation
- Fixed out-of-scope reference in PVS iterator
- Added PVS unit tests
Reviewers: buda, mferencevic, dgleich, teon.banek
Reviewed By: buda, dgleich
Subscribers: mferencevic, pullbot
Differential Revision: https://phabricator.memgraph.io/D1369
Summary:
This binary is installed and packaged for release. This is just a quick
solution for releasing the Community 0.10 version. We still need to
set up the installation and packaging for both the Enterprise and
Community versions. Additionally, the automated build system needs to
test both binaries for correct behaviour. Obviously, some tests can only
be run on one of the two versions.
Reviewers: mferencevic, buda
Reviewed By: buda
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1363
Summary: Make the snapshot synchronized. This invokes the snapshooter on the workers at the master's snapshot scheduler interval.
Reviewers: msantl, mtomic
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1334
Summary: Adds a commit log garbage collector, which clears old transactions from the commit log.
Reviewers: florijan
Reviewed By: florijan
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1310
Summary:
Remove "produce_" and "Produce" as prefixes from all distributed stuff.
They are not removed from the src/query/ code (operators).
Reviewers: dgleich
Reviewed By: dgleich
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1315
Summary:
I have not thoroughly thought this through, especially the worker
destruction (is it legit to abort all running tx?), but it's tested to
abort during a remote pull, which is what we need.
I also improved error handling for a vertex deletion failure during a
remote pull (@dgleich).
Reviewers: teon.banek, msantl, dgleich
Reviewed By: dgleich
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1263
Summary:
Instead of passing `coordination`, pass `rpc_worker_clients`, which
holds a map of worker_id->clientPool. By having only one instance of
`RpcWorkerClients` that is owned by `GraphDb` and passing it by reference,
we'll share the same client pools for RPC clients.
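A hedged, self-contained sketch of the sharing scheme (type and member names
are assumptions, not the exact API):
```
#include <map>
#include <mutex>
#include <string>

struct ClientPool { std::string endpoint; /* pooled RPC connections */ };

class RpcWorkerClients {
 public:
  // Lazily creates and then reuses one ClientPool per worker id, so every
  // RPC user that holds a reference to this object shares the same pools.
  ClientPool &GetClientPool(int worker_id) {
    std::lock_guard<std::mutex> guard(lock_);
    auto it = pools_.find(worker_id);
    if (it == pools_.end())
      it = pools_.emplace(worker_id, ClientPool{EndpointFor(worker_id)}).first;
    return it->second;
  }

 private:
  std::string EndpointFor(int worker_id) {
    // In reality this would be resolved through coordination.
    return "worker-" + std::to_string(worker_id);
  }

  std::mutex lock_;
  std::map<int, ClientPool> pools_;  // worker_id -> shared pool
};
```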
Reviewers: teon.banek, florijan, dgleich, mferencevic
Reviewed By: florijan
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1261
Summary:
Previously, the RPC stack used the network stack only to receive messages. The
messages were then added to a separate queue that was processed by different
thread pools. This design was inefficient because there was a lock when
inserting and getting messages from the common queue.
This diff removes the need for separate thread pools by utilising the new
network stack design. This is possible because the new network stack allows
full processing of the network request without blocking the whole queue.
Reviewers: buda, florijan, teon.banek, dgleich, mislav.bradac
Reviewed By: buda
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1229
Summary:
- Add `database::GraphDb::GetWorkerIds()`
- Change `CreateNode` constructor API
- Make `CreateNode` distribute nodes uniformly over workers (a rough sketch of the idea is below)
Did not yet modify `CreateExpand`; that is coming in the next diff.
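A hedged sketch of the uniform distribution (the helper and the db/accessor
calls in the usage comment are assumptions, not the actual operator code):
```
#include <cstdint>
#include <vector>

// Pick the target worker round-robin so created nodes spread uniformly.
int ChooseWorker(const std::vector<int> &worker_ids, uint64_t &counter) {
  return worker_ids[counter++ % worker_ids.size()];
}

// Usage (pseudo-wiring):
//   int target = ChooseWorker(db.GetWorkerIds(), created_count_);
//   if (target == db.WorkerId())
//     dba.InsertVertex();                         // create locally
//   else
//     InsertVertexIntoRemote(dba, target, ...);   // create on a remote worker
```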
Reviewers: teon.banek
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1216
Summary:
Remote caches used to be owned by `GraphDbAccessor`. An advantage of
that was immediate cleanup when destructing. A disadvantage was sharing
the remote cache between multiple program-flows in the same transaction
in distributed (one would have to share the accessor).
We will have to do post-transactional global cleanup anyway, since we
leak, which reduces the above stated advantage. And the stated
disadvantage is becoming more and more pronounced as additional
components need access to the remote cache.
Hence the refactor.
Reviewers: buda, teon.banek, msantl
Reviewed By: buda
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1186
Summary:
Updates are supported; insertions and removals are not in this diff. The
test is a bit overdesigned; it happens.
Reviewers: teon.banek, dgleich, msantl
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1176
Summary:
Change `GraphDb` so it exposes index clients in the same
convention as other members.
Reviewers: dgleich, mculinovic
Reviewed By: mculinovic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1159
Summary:
NOTE: This diff is still in progress. Many TODOs, lacking documentation,
etc. But the main logic is there (some of it could be done differently), and it tests
out OK.
Reviewers: teon.banek, msantl, buda
Reviewed By: teon.banek
Subscribers: mferencevic, pullbot
Differential Revision: https://phabricator.memgraph.io/D1138
Summary:
- End to end distributed GraphDb testing
- Refactors as necessary
- Basic RemoteCache for storing remote data
- RemoteDataRpc
As we are on a tight schedule, let's please focus on the essentials:
functionality and proper testing.
Reviewers: dgleich, teon.banek, buda
Reviewed By: dgleich
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1121
Summary:
A slight insanity here... I realized I will need to create a
`GraphDbAccessor` instance (which needs a `&GraphDb`) within some members
of the `::impl` classes. Within those classes I can pass `this` to those
members, if `this` is a valid `GraphDb`. Semantically it really is (at
the moment), but hierarchically it wasn't. This diff changes that.
`GraphDb` is now only an interface. `PublicBase` is the base for all
the public classes, `PrivateBase` for the `::impl` classes. Seems to
work.
Oh yes, another thing to keep in mind when doing this is that I should avoid
calling virtual functions in public classes (the motivation for the double
hierarchy). Before this diff the getters weren't virtual; now they are, so
I should have made all the appropriate changes in code as well.
Buda, was this a task I could have delegated to you or Cula?
Reviewers: teon.banek, dgleich, buda
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1120
Summary:
Adds the worker id to the snapshot and WAL filenames.
Adds a new worker_id flag to be used for recovering a worker with a distributed snapshot.
Adds worker_id field to snapshot to check for consistency.
Reviewers: florijan
Reviewed By: florijan
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1096
Summary:
GraphDb is refactored to become an API exposing the different parts
necessary for the database to function. These parts can have different
implementations in the SingleNode or the distributed Master/Worker
GraphDb implementations.
Internally, GraphDb is implemented using two class hierarchies. One
contains all the members and the correct wiring for each situation. The
other takes care of initialization and shutdown. This architecture is
practical because it can guarantee that the initialization of the
object structure is complete before initializing state.
Reviewers: buda, mislav.bradac, dgleich, teon.banek
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1093
Summary:
No logic changes; just split `tx::MasterEngine` into
`tx::SingleNodeEngine` and `tx::MasterEngine`. This gives better
responsibility separation and is more appropriate now that there is no
Start/Shutdown.
Reviewers: dgleich, teon.banek, buda
Reviewed By: dgleich, teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1099
Summary:
The distributed ID mapper is not yet utilised in GraphDb as those
changes are in D1060. Depending on the landing order, it will be added.
Reviewers: dgleich, mislav.bradac
Reviewed By: dgleich
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1064
Summary:
The current idea is that the same MG binary can be used for single-node,
distributed master and distributed worker. The transactional engine in
the single-node and distributed master is the same: it determines the
transactional time and exposes all the "global" functionalities. In the
distributed worker the "global" functions must contact the master.
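A rough, compilable sketch of that split (class and method names are
illustrative only, not the actual engine API):
```
#include <cstdint>

class Engine {
 public:
  virtual ~Engine() = default;
  virtual uint64_t Begin() = 0;  // start a transaction, return its id
};

// Single-node / distributed master: assigns transactional time itself.
class MasterEngine final : public Engine {
 public:
  uint64_t Begin() override { return ++local_counter_; }
 private:
  uint64_t local_counter_ = 0;
};

// Distributed worker: "global" decisions must come from the master.
class WorkerEngine final : public Engine {
 public:
  uint64_t Begin() override { return RpcBeginOnMaster(); }
 private:
  uint64_t RpcBeginOnMaster() { return 0; /* placeholder for the RPC call */ }
};
```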
Reviewers: dgleich, mislav.bradac, buda
Reviewed By: dgleich
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1013
Summary: It will never be used; we already have replacements for it.
Reviewers: buda
Reviewed By: buda
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1016
Summary:
In preparation for distributed storage, we need to have labels/properties/edgetypes uniquely identifiable by their ids, which will be global in the near future.
The old design has to be abandoned because it's not possible to keep track of global labels/properties/edgetypes while they are local pointers.
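A self-contained sketch of the idea (not the actual mapper API; the real one
also has to be concurrent and, later, distributed):
```
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Every label/property/edge-type name gets a small integer id. Storage then
// refers to these ids instead of local pointers, which keeps working once
// the ids become global across workers.
class IdMapper {
 public:
  uint16_t ValueToId(const std::string &name) {
    std::lock_guard<std::mutex> guard(lock_);
    auto it = value_to_id_.find(name);
    if (it != value_to_id_.end()) return it->second;
    uint16_t id = static_cast<uint16_t>(id_to_value_.size());
    value_to_id_.emplace(name, id);
    id_to_value_.push_back(name);
    return id;
  }

  std::string IdToValue(uint16_t id) {
    std::lock_guard<std::mutex> guard(lock_);
    return id_to_value_.at(id);
  }

 private:
  std::mutex lock_;
  std::unordered_map<std::string, uint16_t> value_to_id_;
  std::vector<std::string> id_to_value_;
};
```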
Reviewers: mislav.bradac, florijan
Reviewed By: florijan
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D993
Summary:
My dear fellow Memgraphians. It's Friday afternoon, and I am as ready to pop as WAL is to get reviewed...
What's done:
- Vertices and Edges have global IDs, stored in `VersionList`. Main storage is now a concurrent map ID->vlist_ptr.
- WriteAheadLog class added. It's based around buffering WAL::Op objects (elementary DB changes) and periodically serializing and flushing them to disk (a rough sketch is below, after these lists).
- Snapshot recovery refactored, WAL recovery added. Snapshot format changed again to include necessary info.
- Durability testing completely reworked.
What's not done (and should be when we decide how):
- Old WAL file purging.
- Config refactor (naming and organization). Will do when we discuss what we want.
- Changelog and new feature documentation (both depending on the point above).
- Better error handling and recovery feedback. Currently it's all returning bools, which is not fine-grained enough (neither for errors nor partial successes; also, EOF is reported as a failure at the moment).
- Moving the implementation of WAL stuff to .cpp where possible.
- Not sure if there are transactions being created outside of `GraphDbAccessor` and its `BuildIndex`. Need to look into it.
- True write-ahead logic (flag-controlled): not committing a DB transaction if the WAL has not flushed its data. We can discuss the gain/effort ratio for this feature.
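A rough, self-contained sketch of the WAL buffering mentioned under "What's
done" (names are illustrative, not the actual WriteAheadLog API):
```
#include <cstdint>
#include <fstream>
#include <mutex>
#include <vector>

struct Op {
  enum class Type { CreateVertex, RemoveVertex, SetProperty, TransactionEnd };
  Type type;
  uint64_t tx_id;
  // ... encoded arguments of the change ...
};

class WriteAheadLog {
 public:
  // Elementary DB changes are buffered as Op objects.
  void Emplace(const Op &op) {
    std::lock_guard<std::mutex> guard(lock_);
    buffer_.push_back(op);
  }

  // Invoked periodically by a background thread.
  void Flush(std::ofstream &wal_file) {
    std::lock_guard<std::mutex> guard(lock_);
    for (const auto &op : buffer_) {
      // Serialize the op; the real encoding is more involved.
      wal_file << static_cast<int>(op.type) << ' ' << op.tx_id << '\n';
    }
    wal_file.flush();
    buffer_.clear();
  }

 private:
  std::mutex lock_;
  std::vector<Op> buffer_;
};
```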
Reviewers: buda, mislav.bradac, teon.banek, dgleich
Reviewed By: dgleich
Subscribers: mtomic, pullbot
Differential Revision: https://phabricator.memgraph.io/D958