Summary:
This should fix the issue of having a write followed by a read during
distributed execution. Any kind of merger of plans should behave like a
Cartesian with regards to planning the following ScanAll.
Reviewers: mtomic, mferencevic
Reviewed By: mtomic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1624
Summary:
This diff splits single node and distributed storage from each other.
Currently all of the storage code is copied into two directories (one single
node, one distributed). The logic used in the storage implementation isn't
touched, it will be refactored in following diffs.
To clean the working directory after this diff you should execute:
```
rm database/state_delta.capnp
rm database/state_delta.hpp
rm storage/concurrent_id_mapper_rpc_messages.capnp
rm storage/concurrent_id_mapper_rpc_messages.hpp
```
Reviewers: teon.banek, buda, msantl
Reviewed By: teon.banek, msantl
Subscribers: teon.banek, pullbot
Differential Revision: https://phabricator.memgraph.io/D1625
Summary:
TODO:
~~1. Figure out how to propagate exceptions during lambda evaluation to master.~~
~~2. Make some more complicated test cases to see if everything is~~
~~sent over the network properly (lambdas depending on frame, evaluation context).~~
~~3. Support only `GraphView::OLD`.~~
4. [MAYBE] Send only parts of the frame necessary for lambda evaluation.
~~5. Fix EdgeType handling~~
--------------------
Serialize frame and send it in PrepareForExpand RPC
Move Lambda out of ExpandVariable
Send symbol table and filter lambda in CreateBfsSubcursor RPC
Evaluate filter lambda during the expansion
Send evaluation context in CreateBfsSubcursor RPC
Reviewers: teon.banek, msantl
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1600
Summary:
This should allow us to more easily decouple the code which should be
open sourced. Unfortunately, the downside of this approach is that we
cannot rely on virtual calls to dispatch the serialization to correct
type. Another downside is that members need to be publicly accessible
for serialization.
Reviewers: mtomic, msantl
Reviewed By: mtomic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1596
Summary:
This diff changes the RPC layer to directly return `TResponse` to the user when
issuing a `Call<...>` RPC call. The call throws an exception on failure
(instead of the previous return `nullopt`).
All servers (network, RPC and distributed) are set to have explicit `Shutdown`
methods so that a controlled shutdown can always be performed. The object
destructors now have `CHECK`s to enforce that the `AwaitShutdown` methods were
called.
The distributed memgraph is changed that none of the binaries (master/workers)
crash when there is a communication failure. Instead, the whole cluster starts
a graceful shutdown when a persistent communication error is detected.
Transient errors are allowed during execution. The transaction that errored out
will be aborted on the whole cluster. The cluster state is managed using a new
Heartbeat RPC call.
Reviewers: buda, teon.banek, msantl
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1604
Summary:
This change should allow for multiple `add_lcp` functions to exist in a
single CMakeLists.txt, each adding LCP files to a different target.
Supporting this feature allows us to separate LCP files for single node
and distributed.
Reviewers: mferencevic
Reviewed By: mferencevic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1613
Summary: This diff undoes the changes made in D925
Reviewers: teon.banek
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1614
Summary:
Even though we may not need the results of a RPC, we should check that
it completed without error.
Reviewers: mferencevic, mtomic
Reviewed By: mtomic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1610
Summary:
Knowledge of distributed operators is now removed from PlanPrinter
class. The DistributedOperatorVisitor and PlanPrinter are modified so
that they may be multiple inherited. This is done using virtual
inheritance of HierarchicalLogicalOperatorVisitor. Multiple inheritance
is used to derived a DistributedPlanPrinter which knows how to print
distributed in operators. This removes the dependency of single node
pretty printing on distributed.
Reviewers: msantl, mtomic
Reviewed By: mtomic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1606
Summary:
With this diff we should be able to support dynamic worker addition.
This is ofcourse a minimal effort maximal impact approach.
This diff introduces new RPC calls when a worker registers.
The `DynamicWorkerAddition` doesn't use `GraphDbAccessor` to get indices because
they create WAL entries.
Reviewers: vkasljevic, ipaljak, buda
Reviewed By: ipaljak
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1594
Summary:
Previous version of code was wrong in some cases. Example:
There's an edge e = (a)->(b). Node (a) belongs to worker 1, node (b) belongs to worker 0. Therefore, edge e belongs to worker 1.
When edge e was serialized on worker 0, address of node b was local and it would be globalized, but the worker id of edge (e) would be used, which is wrong.
Reviewers: teon.banek
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1599
Summary:
In a bunch of places `TypedValue` was used where `PropertyValue` should be. A lot of times it was only because `TypedValue` serialization code could be reused for `PropertyValue`, only without providing callbacks for `VERTEX`, `EDGE` and `PATH`. So first I wrote separate serialization code for `PropertyValue` and put it into storage folder. Then I fixed all the places where `TypedValue` was incorrectly used instead of `PropertyValue`. I also disabled implicit `TypedValue` to `PropertyValue` conversion in hopes of preventing misuse in the future.
After that, I wrote code for `VertexAccessor` and `EdgeAccessor` serialization and put it into `storage` folder because it was almost duplicated in distributed BFS and pull produce RPC messages. On the sender side, some subset of records (old or new or both) is serialized, and on the reciever side, records are deserialized and immediately put into transaction cache.
Then I rewrote the `TypedValue` serialization functions (`SaveCapnpTypedValue` and `LoadCapnpTypedValue`) to not take callbacks for `VERTEX`, `EDGE` and `PATH`, but use accessor serialization functions instead. That means that any code that wants to use `TypedValue` serialization must hold a reference to `GraphDbAccessor` and `DataManager`, so that should make clients reconsider if they really want to use `TypedValue` instead of `PropertyValue`.
Reviewers: teon.banek, msantl
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1598
Summary:
This change introduces a pure virtual initial implementation of the transaction
engine which is then implemented in two versions: single node and distributed.
The interface classes now have the following hierarchy:
```
Engine (pure interface)
|
+----+---------- EngineDistributed (common logic)
| |
EngineSingleNode +-------+--------+
| |
EngineMaster EngineWorker
```
In addition to this layout the `EngineMaster` uses `EngineSingleNode` as its
underlying storage engine and only changes the necessary functions to make
them work with the `EngineWorker`.
After this change I recommend that you delete the following leftover files:
```
rm src/distributed/transactional_cache_cleaner_rpc_messages.*
rm src/transactions/common.*
rm src/transactions/engine_rpc_messages.*
```
Reviewers: teon.banek, msantl, buda
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1589
Summary:
This change improves detection of errorneous situations when starting a
distributed cluster on a single machine. It asserts that the user hasn't
started more memgraph nodes on the same machine with the same durability
directory. Also, this diff improves worker registration. Now workers don't have
to have explicitly set IP addresses. The master will deduce them from the
connecting IP when the worker registers.
Reviewers: teon.banek, buda, msantl
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1582
Summary:
This is a simple change which modifies interface of
awesome_memgraph_functions to accept C-style pointer to array with
count. Doing things this way, allows us to easily try out different
allocation schemes for function arguments. In this diff, we are now
using stack allocation of arguments in a plain fixed size array. This is
done when the number of arguments is small. According to heaptrack, this
small change should yield noticeable improvements to heap usage.
Obviously, this doesn't solve the problem of heap allocations inside
TypedValue arguments themselves. These allocations appear when
std::string and std::vector is used inside TypedValue.
Micro benchmarks show that there is some performance improvement,
mostly around the limits of using array vs std::vector. The improvement is
more noticeable with multiple threads, due to primary gain being in avoiding
calls to memory allocation.
Reviewers: mtomic, msantl, mferencevic
Reviewed By: mferencevic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1581
Summary:
These functions were defined in multiple places. They are moved to
cmake/functions.cmake to keep only one source of truth.
Reviewers: mferencevic, msantl, mculinovic
Reviewed By: mferencevic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1578
Summary: Recovery file is version inconsistent iff it starts with a correct magic number, has at least one integer written after that and that integer differs from kVersion.
Reviewers: mferencevic, ipaljak, vkasljevic, buda
Reviewed By: mferencevic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1565
Summary:
In order to add kafka benchmark, `memgraph_bolt.cpp` has been split.
Now we have `memgraph_init.cpp/hpp` files with common memgraph startup code.
Kafka benchmark implements a new `main` function that doesn't start a bolt
server, it just creates and starts a stream. Then it waits for the stream to
start consuming and measures the time it took to import the given number of
entries.
This benchmark is in a new folder, `feature_benchmark`, and so should any new
bechmark that measures performance of memgraphs features.
Reviewers: mferencevic, teon.banek, ipaljak, vkasljevic
Reviewed By: mferencevic, teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1552
Summary:
This diff introduces a new flags
* `--synchronous-commit`
The `--synchronous-commit` tells the WAL when should the deltas be flushed to
the disk drive. By default this is off and the WAL flushes deltas every `N`
milliseconds. If it's turned on, on every transaction end, commit or abort, the
WAL will first flush the deltas and only after that will return from ending a
transaction.
Reviewers: buda, vkasljevic, mferencevic, teon.banek, ipaljak
Reviewed By: mferencevic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1542
Summary: A quick clean-up of user visible error messages. Tried to make them gramatically correct by capitalizing the first word in the sentence and putting a dot at the end.
Reviewers: teon.banek, buda, ipaljak
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1571
Summary:
Changed GRANT ROLE to SET ROLE. Now it is `SET ROLE FOR user TO ROLE` instead of `GRANT ROLE role TO user`. It makes more sense because our users can only have 1 role.
Changed REVOKE ROLE to CLEAR ROLE. Now it is `CLEAR ROLE FOR user` instead of `REVOKE ROLE role FOR user`. REVOKE ROLE would throw exception if user was not a member of role. CLEAR ROLE clears the role whatever it is. I find that the latter makes more sense combined with SET ROLE.
Changed `SHOW ROLE FOR USER user` to `SHOW ROLE FOR user`.
Changed `SHOW USERS FOR ROLE role` to `SHOW USERS FOR role`.
Reviewers: mferencevic, teon.banek, buda
Reviewed By: mferencevic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1572
Summary:
Visitor pattern's main issue is cyclical dependency between classes that
are visited and the visitor instance itself. We need to decouple this
dependency if we want to open source part of the code, namely
non-distributed part. This decoupling is achieved through the use of
`dynamic_cast` in distributed operators. Hopefully the solution is good
enough and doesn't cause performance issues. An alternative solution is
to build our own custom double dispatch solution, but that will
basically boil down to our implementation of runtime type information
and casts.
Note, this only decouples the distributed operators. If and when we
decide that other operators shouldn't be open sourced, the same
`dynamic_cast` pattern should be applied in them also.
Depends on D1563
Reviewers: mtomic, msantl, buda
Reviewed By: mtomic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1566
Summary:
This is the first step in separating the implementation of distributed
features in operators. Following steps are:
* decoupling distributed visitors
* injecting distributed details in operator state
* minor cleanup or anything else that was overlooked
Reviewers: mtomic, msantl
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1563
Summary:
Instead of DataManager returning Cache which can fetch data when needed,
I refactored the code so that the Cache is simple wrapper around
unordered_map and the DataManager is one that is fetching data. Also Cache
is not visible from outside of the DataManager so we can add LRU policy
without changing anything else.
Reviewers: msantl, ipaljak, teon.banek, buda
Reviewed By: msantl, teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1545
Summary:
Added WAL and Snapshot explorer utility executables that read a wal or a
snapshot file and print the content of it, but the main purpose is that they
check it.
This is useful when debugging durability and want to know where it gets stuck.
Reviewers: dgleich, buda
Reviewed By: buda
Subscribers: ipaljak, vkasljevic, teon.banek, pullbot
Differential Revision: https://phabricator.memgraph.io/D1422
Summary:
Updated the feature specs, the changelog and added a new section in
user technical.
Reviewers: mferencevic, mculinovic, buda, ipaljak
Reviewed By: ipaljak
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1534
Summary:
The Kafka Python transform functionality uses a Python script to transform
incoming Kafka data into queries and parameters that are executed against the
database. When starting the Python transform script it is started in a
sandboxed environment so that it can't do harm to the host system or the
database.
Reviewers: msantl, teon.banek
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1509
Summary:
Since RecordAccessor is often instantiated and copied, using a
factory-like function to allocate concrete types on the heap would be
too costly. The approach in this diff uses a Strategy pattern (see
"Design Patterns" by Gamma et al.), where the Strategy interface is
given as RecordAccessor::Impl. Concrete implementations are then created
for each GraphDb. This allows us to instantiate the concrete
RecordAccessors::Impl *once* and *share* it among all RecordAccessors.
Reviewers: msantl, vkasljevic
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1510
Summary:
Our implementation of the Bolt protocol now correctly handles INIT message
metadata used for authentification. The related description from the Bolt
standard is: 'The token must contain either just the entry {"scheme": "none"}
or the keys scheme, principal, and credentials. Example {"scheme": "basic",
"principal": "user", "credentials": "secret"}". If no scheme is provided, it
defaults to "none".'
Reviewers: msantl
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1518
Summary:
Since we switched to Cap'n Proto serialization there's no need for
keeping boost around anymore.
Reviewers: mtomic, mferencevic, buda
Reviewed By: mtomic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1515
Summary:
The raft implementation has been stale for a while now. It doesn't
compile, nor uses Cap'n Proto for serialization. In the future we would
probably rewrite it, so it doesn't need to be part of the repo at this
moment.
Reviewers: mferencevic, mtomic, buda
Reviewed By: mferencevic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1513
Summary: first step for a cleaner query front end that might be easier to open source
Reviewers: teon.banek
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1511
Summary:
GraphDbAccessor is now constructed only through GraphDb. This allows the
concrete GraphDb to instantiate a concrete GraphDbAccessor. This allows
us to use virtual calls, so that the implementation may be kept
separate. The major downside of doing things this way is heap allocation
of GraphDbAccessor. In case it turns out to be a real performance
issues, another solution with pointer to static implementation may be
used.
InsertVertexIntoRemote is now a non-member function, which reduces
coupling. It made no sense for it to be member function because it used
only the public parts of GraphDbAccessor.
Reviewers: msantl, mtomic, mferencevic
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1504
Summary:
This change, hopefully, simplifies the implementation of different kinds
of GraphDb. The pimpl idiom is now simplified by removing all of the
crazy inheritance. Implementations classes are just plain data stores,
without any methods. The interface classes now have a more flat
hierarchy:
```
GraphDb (pure interface)
|
+----+---------- DistributedGraphDb (pure interface)
| |
Single Node +-----+------+
| |
Master Worker
```
DistributedGraphDb is used as an intermediate interface for all the
things that should work only in distributed. Therefore, virtual calls
for distributed stuff have been removed from GraphDb. Some are exposed
via DistributedGraphDb, other's are only in concrete Master and Worker
classes. The code which relied on those virtual calls has been
refactored to either use DistributedGraphDb, take a pointer to what is
actually needed or use dynamic_cast. Obviously, dynamic_cast is a
temporary solution and should be replaced with another mechanism (e.g.
virtual call, or some other function pointer style).
The cost of the above change is some code duplication in constructors
and destructors of classes. This duplication has a lot of little tweaks
that make it hard to generalize, not to mention that virtual calls do
not work in constructor and destructor. If we really care about
generalizing this, we should think about abandoning RAII in favor of
constructor + Init method.
The next steps for splitting the dependencies that seem logical are:
1) Split GraphDbAccessor implementation, either via inheritance or
passing in an implementation pointer. GraphDbAccessor should then
only be created by a virtual call on GraphDb.
2) Split Interpreter implementation. Besides allowing single node
interpreter to exist without depending on distributed, this will
enable the planner and operators to be correctly separated.
Reviewers: msantl, mferencevic, ipaljak
Reviewed By: msantl
Subscribers: dgleich, pullbot
Differential Revision: https://phabricator.memgraph.io/D1493
Summary: This is a simple thing that enables us to use keywords as symbolic names. A bad thing is that planning will be done for queries which differ only in keyword case (e.g. "MATCH (x) RETURN x" and "match (x) return x"), but that should not be a significant performance impact as queries are done programmaticaly when high throughput is expected and we don't expect letter case change in that case.
Reviewers: buda, mferencevic, teon.banek
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1496
Summary:
Run each query from a batch in a separate transaction. This will allow us to
abort a transaction if the interpreter throws on a query.
We should keep that in mind in the future for possible speedups.
Reviewers: teon.banek
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1501
Summary:
First iteration in implementing kafka.
Currently, memgraph streams won't use the transform script provided in the
`CREATE STREAM` query.
There is a manual test that serves a POC purpose which we'll use to fully wire
kafka in memgraph.
Since streams need to download the script, I moved curl init from
telemetry.
Reviewers: teon.banek, mferencevic
Reviewed By: mferencevic
Subscribers: ipaljak, pullbot, buda
Differential Revision: https://phabricator.memgraph.io/D1491
Summary:
This change should preclude the need to specify `:capnp-save` and
`:capnp-load` functions for regularly saved elements of `std::vector<T>`
and `std::optional<T>`. Regular saving in this context means saving
primitive types or compound types which have a `Save(capnp::Builder *)`
method.
Reviewers: mtomic, msantl
Reviewed By: mtomic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1497
Summary:
Previously, our implementation of the Bolt protocol buffered all results in
memory before sending them out to the client. This implementation immediately
streams the results to the client to avoid any memory allocations. Also, this
implementation splits the interpretation and pulling logic into two.
Reviewers: teon.banek
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1495
Summary:
This change should completely support planning Optional for distributed
execution. Cartesian matching is handled, as well as dependencies
between optional branch and the main input branch.
Unit tests are expanded to cover the planning algorithm.
Reviewers: msantl, mtomic
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1480
Summary:
During the creation of indexes there could be a case in which a vertex contains a label/property but is not a part of index after
index building completes.
This happens if vertices are being inserted while the index is being built.
Reviewers: buda, msantl
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1484
Summary:
Session specifics have been move out of the Bolt `executing` state, and
are accessed via pure virtual Session type. Our server is templated on
the session and we are setting the concrete type, so there should be no
virtual call overhead. Abstract Session is used to indicate the
interface, this could have also been templated, but the explicit
interface definition makes it clearer.
Specific session implementation for running Memgraph is now implemented
in memgraph_bolt, which instantiates the concrete session type. This may
not be 100% appropriate place, but Memgraph specific session isn't
needed anywhere else.
Bolt/communication tests now use a dummy session and depend only on
communication, which significantly improves test run times.
All these changes make the communication a library which doesn't depend
on storage nor the database. Only shared connection points, which aren't
part of the base communication library are:
* glue/conversion -- which converts between storage and bolt types, and
* communication/result_stream_faker -- templated, but used in tests and query/repl
Depends on D1453
Reviewers: mferencevic, buda, mtomic, msantl
Reviewed By: mferencevic, mtomic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1456
Summary:
This is the first step in cutting the crazy dependencies of
communication module to the whole database. Includes have been
reorganized and conversion between DecodedValue and other Memgraph types
(TypedValue and PropertyValue) has been extracted to a higher level
component called `communication/conversion`. Encoder, like Decoder, now
relies only on DecodedValue. Hopefully the conversion operations will
not significantly slow down streaming Bolt data.
Additionally, Bolt ID is now wrapped in a class. Our storage model uses
*unsigned* int64, while Bolt expects *signed* int64. The implicit
conversions may lead to encode/decode errors, so the wrapper should
enforce some type safety to prevent such errors.
Reviewers: mferencevic, buda, msantl, mtomic
Reviewed By: mferencevic, mtomic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1453
Summary:
Added test stream functionality. When a stream is configured, it will try to
consume messages from a kafka topic and return them back to the user.
For now, the messages aren't transformed, so it just returns the payload string.
Depends on D1466
Next steps are persisting stream metadata and transforming messages in order to
store them in the graph.
Reviewers: teon.banek, mtomic
Reviewed By: teon.banek
Subscribers: pullbot, buda
Differential Revision: https://phabricator.memgraph.io/D1474
Summary:
Integrated kafka library into memgraph. This version supports all opencypher
features and will only output messages consumed from kafka.
Depends on D1434
Next steps are persisting stream metadata and transforming messages in order to
store them in the graph.
Reviewers: teon.banek, mtomic, mferencevic, buda
Reviewed By: teon.banek
Subscribers: mferencevic, pullbot, buda
Differential Revision: https://phabricator.memgraph.io/D1466
Summary: When doing bfs with given endpoint, we can stop the traversal on first successful pull.
Reviewers: teon.banek, msantl, mculinovic, buda
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1469
Summary:
Added basic functionality for kafka streams. The `CREATE STREAM` clause is a
simplified version from the one mentioned in D1415 so we can start testing
end-to-end sooner.
This diff also includes a bug fix in `lcp.list ` for operators that have no
members.
Reviewers: teon.banek, mtomic, buda
Reviewed By: mtomic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1434
Summary:
Wal on workers didn't contain committed transactions ids, this is needed for
distributed recovery so that the master may decide which transactions are
present on all the workers.
Reviewers: buda, msantl
Reviewed By: buda
Subscribers: pullbot, msantl, buda
Differential Revision: https://phabricator.memgraph.io/D1440
Summary:
For reasons unknown to man, antlr didn't want to parse calls
to functions whose name is a single hex letter properly. Now it should work.
Reviewers: teon.banek, buda
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1464
Summary:
If `capnp-id` isn't passed to LCP, then no C++ code for serialization
will be generated. Previously, only schema wasn't generated.
An error with serializing a derived class whose parent isn't serialized
is now reported with a suggestion on how to fix this.
Reviewers: mtomic, buda, msantl
Reviewed By: mtomic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1460
Summary:
Added missing string functions.
I also cleaned up error messages a bit in effort to make them uniform.
Reviewers: teon.banek, buda
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1458
Summary:
This diff implements OpenSSL support in the network stack.
Currently SSL support is only enabled for Bolt connections,
support for RPC connections will be added in another diff.
Reviewers: buda, teon.banek
Reviewed By: buda
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1328
Summary:
Add telemetry to main memgraph binary
Add resource usage collector
Add telemetry flag
Change telemetry collector logic
Fix utils compilation
Add timestamp
Add first version of interactive test
Started working on test runner
Implement all tests
Flake8 on runner.py
Integrate test with Apollo
Add TODO
Reviewers: buda, teon.banek
Reviewed By: buda, teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1419
Summary:
Hopefully, the mechanism of generating Cartesian is general enough, so
this simple change should work correctly in all cases.
Planner tests have been modified to use a FakeDbAccessor in order to
speed them up and potentially allow extracting planning into a library.
Reviewers: msantl, mtomic, buda
Reviewed By: mtomic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1431
Summary:
This change should correctly plan Cartesian which have dependent Filter
or Expand operators. Tests have been added for those cases. Other cases
are not yet supported and should throw an exception.
Reviewers: msantl, mtomic, buda
Reviewed By: mtomic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1426
Summary:
This is the initial step to getting a correct version of distributed
planning of Cartesian operator. Functions and structs have been added
which should collect enough information to correctly order the execution
with regards to dependencies among Cartesian branches. The support
functionality should be the same as was before, but unsupported cases
should now raise an exception instead of leading to undefined behaviour.
Reviewers: msantl, mtomic, buda
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1418
Summary:
Converts the RPC stack to use Cap'n Proto for serialization instead of
boost. There are still some traces of boost in other places in the code,
but most of it is removed. A future diff should cleanup boost for good.
The RPC API is now changed to be more flexible with regards to how
serialize data. This makes the simplest cases a bit more verbose, but
allows complex serialization code to be correctly written instead of
relying on hacks. (For reference, look for the old serialization of
`PullRpc` which had a nasty pointer hacks to inject accessors in
`TypedValue`.)
Since RPC messages were uselessly modeled via inheritance of Message
base class, that class is now removed. Furthermore, that approach
doesn't really work with Cap'n Proto. Instead, each message type is
required to have some type information. This can be automated, so
`define-rpc` has been added to LCP, which hopefully simplifies defining
new RPC request and response messages.
Specify Cap'n Proto schema ID in cmake
This preserves Cap'n Proto generated typeIds across multiple generations
of capnp schemas through LCP. It is imperative that typeId stays the
same to ensure that different compilations of Memgraph may communicate
via RPC in a distributed cluster.
Use CLOS for meta information on C++ types in LCP
Since some structure slots and functions have started to repeat
themselves, it makes sense to model C++ meta information via Common Lisp
Object System.
Depends on D1391
Reviewers: buda, dgleich, mferencevic, mtomic, mculinovic, msantl
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1407
Summary:
Add additional structs and functions for handling C++ meta information.
Add capnp-file and capnp-id arguments to lcp:process-file.
Generate cpp along with hpp and capnp in lcp.
Wrap LogicalOperator base class in lcp:define-class.
Modify logical operators for capnp serialization.
Add query/common.capnp.
Reviewers: mculinovic, buda, mtomic, msantl, ipaljak, dgleich, mferencevic
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1391
Summary:
Utils source files are now moved to a standalone mg-utils library.
Unit and manual tests are no longer collected using glob recursion in
cmake, but are explicitly listed. This allows us to set only required
dependencies of those tests.
Both of these changes should improve compilation and link times, as well
as lower the disk usage.
Additional improvement would be to cleanup utils header files to be
split in .hpp and .cpp as well as merging threading into utils. Other
potential library extractions that shouldn't be difficult are:
* data_structures
* io/network
* communication
Reviewers: buda, mferencevic, dgleich, ipaljak, mculinovic, mtomic, msantl
Reviewed By: buda
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1408
Summary:
It seems that streamoff is not in std::iostream namespace, but in std::
namespace. This caused problems with newer versions of libraries.
Reviewers: teon.banek
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1404
Summary:
Command id is necessary in remote produce to identify an ongoing pull
because a transaction can have multiple commands that all belong under
the same plan and tx id.
Reviewers: teon.banek, mtomic, buda
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1386
Summary:
Removing `AS_IS` from GraphView because it doesn't seem like it is necessary for query execution and it also has weird semantics (you might get a mix of old and new records). `Unwind`, `OrderBy` and `PullRemoteOrderBy` now use `OLD` graph view.
Remove AS_IS from GraphView
Fix query_cost_estimator tests
Fix query_expression_evaluator tests
Fix query_plan_match_filter_return tests
Fix query_plan_create_set_remove_delete tests
Fix query_plan_accumulate_aggregate tests
Reviewers: teon.banek, buda
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1390
Summary:
A simplified end-to-end implementation
POD interface set-up, still have bugs with HDDkeys
Version bug fix and first iterator implementation
Fixed out-of-scope reference in PVS iterator
Added PVS unit tests
Reviewers: buda, mferencevic, dgleich, teon.banek
Reviewed By: buda, dgleich
Subscribers: mferencevic, pullbot
Differential Revision: https://phabricator.memgraph.io/D1369
Summary:
Since we are moving from boost to Capnp for serialization, it makes
sense to keep all of the LogicalOperator classes in LCP format. This
will make it easier to generate Capnp code.
Depends on D1361
Reviewers: buda, mferencevic, msantl, dgleich, ipaljak, mculinovic, mtomic
Reviewed By: mtomic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1362
Summary: Global address of `from` was stored in Edge record when created via CreateEdge RPC.
Reviewers: buda, msantl, dgleich
Reviewed By: dgleich
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1387
Summary:
Pointers in AstTreeStorage can point to same nodes. These nodes
should be serialized(saved) only once when serializing Ast tree.
Same goes for deserializing(loading).
When deserializing nodes, one should not use storage Create method,
because it will add node which isn't completely loaded to storage vector.
Therefore, one should use new Deserialize method which doesn't
add node to storage vector. Inserting node into storage vector
is defered until all node data is loaded.
Pass pointer to saved uids set, instead of storage
Delete unused set
Reviewers: teon.banek, msantl
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1344
Summary:
In order to enhance C++ metaprogramming capabilities, a custom
preprocessing step is added before compilation. C++ code may be mixed
with Lisp code in order to generate a complete C++ source code. The
mechanism is hooked into cmake. To notify cmake of .lcp files, `add_lcp`
function in src/CMakeLists.txt needs to be invoked.
The main executable entry point is in tools/lcp, while the source code
is in src/lisp/lcp.lisp
The main goal of LCP is to auto generate class serialization code and
member variable getter functions. This should now be significantly less
error prone, since you cannot forget to serialize a member variable
through this mechanism. Future uses should be generating other repeating
code, such as `Clone` methods or perhaps some debug information.
.lcp files may contain mixed C++ code (enclosed in #>cpp ... cpp<#
blocks) with Common Lisp code.
NOTE: With great power comes great responsibility. Lisp metaprogramming
capabilities are incredibly powerful. To keep the sanity of the team
intact, use Lisp preprocessing only when *really* necessary.
Reviewers: buda, mferencevic, msantl, dgleich, ipaljak, mculinovic, mtomic
Reviewed By: mtomic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1361
Summary:
GC called clog info on transactions which were not completed even though it
could deduce that they are not finished. By avoiding calling clog info on them
we can lower the number of rpc calls and remove a strange behaviour of gc
appearing to be hanging while recovering a worker.
Reviewers: mculinovic, mferencevic, buda
Reviewed By: buda
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1377
Summary:
This binary is installed and packaged for release. This is just a quick
solution for releasing the Community 0.10 version. We still need to
setup the installation and packaging for both the Enterprise and
Community versions. Additionally, the automated build system needs to
test both binaries for correct behaviour. Obviously, some tests can only
be run on one of the 2 versions.
Reviewers: mferencevic, buda
Reviewed By: buda
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1363
Summary:
Added a new markdown file for concepts and a new test to test the edge
case with upper bound.
Reviewers: teon.banek, mtomic, dgleich, buda, ipaljak
Reviewed By: teon.banek, dgleich
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1366
Summary:
Wal retention test was flaky, this kinda fixes it.
It's hard to fix it completely since there are separate threads happening which
are not synchronized, and forcing the synchronization from the outside would
not agree to the specifications i.e. single-threadness of WAL.
Reviewers: buda
Reviewed By: buda
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1358
Summary: Locks should be released as early as possible
Reviewers: msantl, florijan
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1357
Summary:
Use /var/lib/memgraph as home dir for memgraph user.
Merge post and pre RPM scripts in RPM spec file.
Properly handle memgraph.service and directory permissions.
Reviewers: mferencevic
Reviewed By: mferencevic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1350
Summary:
- Removed a lot of stuff that was incorrect and/or unnecessary
- Fixed const-correctness in the skiplist family
Reviewers: dgleich, teon.banek, buda
Reviewed By: dgleich
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1351
Summary: Do it so properties-on-disk can be implemented.
Reviewers: buda, dgleich
Reviewed By: dgleich
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1343
Summary:
If someone was to call `EnsureDir` concurrently it might fail because
the `fs::create_directories` returns false when the directory already exists
with the `error_code` value being `0`.
Reviewers: dgleich
Reviewed By: dgleich
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1349
Summary:
After the commit log was cleared there was some transaction that tried to acquire
a lock on an object that was taken by a transaction that was not longer active on
the worker. Inquring the commit log about that transaction causes a crash since
the commit log is cleared of that transaction.
Solution is to clear the transaction cache before clearing the commit log, which
forces the transactions to release their locks and as such their ids will never be
queried through the commit log in the future.
Reviewers: florijan, msantl
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1342
Summary:
Implemented cluster discovery in distributed memgraph.
When a worker registers, it sends a RPC request to master.
The master assigns that worker an id and sends the information about other
workers (pairs of <worker_id, endpoint>) to the new worker.
Master also sends the information about the new worker to all existing workers
in the process of worker registration.
After the last worker registers, all memgraph instances in the clusters should
know about every other.
Reviewers: mtomic, buda, florijan
Reviewed By: mtomic
Subscribers: teon.banek, dgleich, pullbot
Differential Revision: https://phabricator.memgraph.io/D1339
Summary: Make synchronized snapshot. This invokese the snapshooter on workers on the master snapshot scheduler interval.
Reviewers: msantl, mtomic
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1334
Summary:
Snapshots should have the transaction from which they were created because we need this info for recovery later on.
Otherwise we wouldn't be able to tell the workers from which snapshots to recover. The whole cluster should be recovered
from the same transaction snapshot.
Reviewers: msantl
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1338
Summary: Adds a commit log garbage collector, which clears old transactions from the commit log
Reviewers: florijan
Reviewed By: florijan
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1310
Summary:
When commiting/aborting a transaction in tx master engine, make a two
phase commit to all workers so they can stop all futures and clear
transactional cache.
Reviewers: dgleich, florijan
Reviewed By: dgleich
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1320
Summary:
Snapshot scheduler object was released from unique ptr and not actually freed, which
caused the snapshooter to access the tx_engine after it was destructed.
Reviewers: mferencevic
Reviewed By: mferencevic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1325
Summary:
Master shouldn't stop processing rpc calls immediately on shutdown. It
should wait for all workers to stop, and then destroy itself.
Reviewers: dgleich, mferencevic
Reviewed By: dgleich
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1330
Summary:
Remove semicolon.
Semicolon shouldn't be used within macros and should
be explicitly provided by the user of the said macro.
Reviewers: teon.banek
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1321
Summary:
The network layer now has a `Session` that handles all things that should be
done before the `Execute` method is called on sessions. Also, all sessions
now communicate using streams instead of holding the input buffer and writing
to the `Socket`. This design will allow implementation of a SSL middleware.
Reviewers: buda, dgleich
Reviewed By: buda
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1314
Summary:
Before we used `utils::Future` only where it's created by our `ThreadPool`.
I suggest in this diff that we use it everywhere, it's a bit more defensive and
should not have any downsides.
Reviewers: msantl, teon.banek
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1316
Summary:
Add different priority VLOGs for distributed memgraph.
For level 3 you'll get logs for dispatching/consuming plans.
For level 4 you'll get logs for tx start/commit/abort, remote produce, remote
pull, remote result consume,
For level 5 there will be a log for each request/response made by the RPC
client.
Master log snippet P9
Worker log snippet P10
Reviewers: florijan, teon.banek
Reviewed By: florijan
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1296
Summary:
Remove "produce_" and "Produce" as prefix from all distributed stuff.
It's not removed in src/query/ stuff (operators).
Reviewers: dgleich
Reviewed By: dgleich
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1315
Summary:
When performing recovery, ensure that the transaction ID in engine is
bumped to one after the max tx id seen in recovery.
Reviewers: dgleich
Reviewed By: dgleich
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1312
Summary:
Find[Vertex|Edge] -> Find[Vertex|Edge]Optional
Find[Vertex|Edge]Checked -> Find[Vertex|Edge]
In some places change old code that finds-optional and immediately checks
to use the checked functionality.
It seems that in all the src/ stuff optional finds are no loger used,
only in tests, but there they are used extensively so I don't feel those
functions should be removed.
Reviewers: dgleich
Reviewed By: dgleich
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1309
Summary: This is correct, and uses less memory.
Reviewers: teon.banek, buda, mislav.bradac
Reviewed By: buda
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1304
Summary:
Also removed some convenience code that became obsolete.
No logic changes.
Reviewers: teon.banek
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1303
Summary:
Writers preference is not guaranteed by the standard and as such
there are variations between implementations in different versions
of libphthread.
Reviewers: dgleich
Reviewed By: dgleich
Differential Revision: https://phabricator.memgraph.io/D1305
Summary:
Ensures that query plans are invalidated on the remote only when it's
guaranteed they will never be used on the master. This is done by
invalidating remote caches in the `CachedPlan` destructor.
There are two unplesant side-effects. First, an RPC call is made in an
object destructor. This is somewhat ugly, but not that different then
making an RPC call that must succeed in any other function. Note that
this does NOT slow down any query execution because the relevant
destructor is called by the skiplist garbage collector. The second ugly
side-effect is that in the unit test now we need to sleep to ensure the
skiplist GC destructs a cached plan before checking that it's
invalidated on the remote worker.
We might want to redesign this at some point.
Reviewers: teon.banek
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1302
Summary:
Recursion in Version destructor was causing a SEGFAULT for
long version lists because of a stack overflow.
Reviewers: florijan
Reviewed By: florijan
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1301
Summary:
- Remove caches on workers as a result of plan expiration or race
during insertion.
- Extract caching functionality into a class.
- Minor refactor of Interpreter::operator()
- New RPC and test for it.
- Rename ConsumePlanRes to DispatchPlanRes for consistency, remove
return value as it's always true and never used.
- Interpreter is now constructed with a `GraphDb` reference. At the
moment only for reaching the `distributed::PlanDispatcher`, but in
the future we should probably use that primarily for planning.
I added a function to `PlanConsumer` that is only used for testing.
I prefer not doing this, but I felt this needed testing. I can remove
it now if you like.
Reviewers: teon.banek, msantl
Reviewed By: teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1292
Summary:
Added a class that is supposed to be used as a base class on classes
where we want to see the stacktrace of destructor call.
Example of the output is P7 where classes `GraphDb` and `Client` extend
`LogDestructor` base.
Reviewers: florijan, teon.banek
Reviewed By: florijan
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1281
Summary:
This fixes a bug where the planning would raise NotYetImplemented error,
due to preventing plan splitting while the operator is already on
master. For example, `MATCH (a), (b) CREATE (a)-[e:r]->(b) RETURN e`
would split the plan after Cartesian and before Create. Thus, the rest
of the plan would be on master, including the Accumulate before Produce.
We still can (and must) replace Accumulate with Synchronize but this
would fail due to unneeded check.
Reviewers: florijan, msantl
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1272
Summary:
I have not thoroughly thought this through, especially the worker
destruction (is it legit to abort all running tx?), but it's tested to
abort during remote pull, what we need.
Also I improved error handling for vertex deletion failure during
remote pull (@dgleich).
Reviewers: teon.banek, msantl, dgleich
Reviewed By: dgleich
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1263
Summary:
If the `db` says we need to abort we should abort. There are two
places (in both `PullRemote` and `PullRemoteOrderBy`) where the check is made.
The first one is on the very beginning of the `Pull` method and the second one
is in the loop that checks/waits for remote results.
Reviewers: teon.banek, florijan
Reviewed By: florijan
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1265
Summary:
Instead of passing `coordination`, pass `rpc_worker_clients` that
holds a map of worker_id->clientPool. By having only one instance of
`RpcWorkerClients` that is owned by `GraphDB` and passing it by refference
we'll share the same client pools for rpc clients.
Reviewers: teon.banek, florijan, dgleich, mferencevic
Reviewed By: florijan
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1261
Summary:
Usually a single RPC call takes 50us, which is a lot lower than our
millisecond sleep. Calling sleep with less than a ms duration may not
be really supported (due to timer granularity), but this change should
cause some sub-ms sleep.
Reviewers: florijan, msantl, mferencevic
Reviewed By: florijan
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1258
Summary:
There's no need to heap allocate the request which is used only to be
serialized and sent over the network. Unfortunately, this change
complicates the reading a bit, because we need to deserialize the raw
pointer and put it in std::unique_ptr instead of simply deserializing
the unique_ptr.
Heap allocation shows up visibly in perf during benchmarks, so this
change may yield some improvement.
Reviewers: mferencevic, mtomic
Reviewed By: mferencevic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1254
Summary: It was possible for transaction to be double freed because both the AllClear and SingleClear of transactions could delete the pointer
Reviewers: florijan, mferencevic
Reviewed By: mferencevic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1250
Summary:
Release of per-transaction data in distributed Memgraph refactored. The
master node no longer releases each time a transaction is done, thus
offloading some work from the engine.
Reviewers: dgleich
Reviewed By: dgleich
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1235
Summary:
Remove a method in tx::Engine whose results can be obtained from commit
log info (also guaranteed to be globally correct in distributed).
Reviewers: dgleich
Reviewed By: dgleich
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1240
Summary:
Difficult to test properly as the problem was an implicit conversion of
`gid::Gid` to `mvcc::VersionList<> *`. The only proper way to defend
against this would be to make `gid::Gid` a type.
The `CHECK` added in this diff does not help, but should be there, so...
Reviewers: dgleich
Reviewed By: dgleich
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1244