114 Commits

Author SHA1 Message Date
Ivan Paljak
deee3b8ab7 Remove Raft's dependency on transaction id
Reviewers: buda, mferencevic

Reviewed By: mferencevic

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D2338
2019-09-10 18:02:05 +02:00
Teon Banek
1a20c557b8 Update storage API docs with thrown exceptions
Summary:
The documentation includes `std` exceptions like `std::bad_alloc` or
`std::system_error`, for which there's probably nothing we can do. This
may seem unnecessary, but it will be really helpful when writing the C
API for interfacing with custom modules and plugins, as well as when
switching to storage v2 API.

In general, we should start updating the documentation of functions
which may throw exceptions. This ought to be enforced in code review, so
that the implementation and documentation are kept in sync.
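
As an illustration of the intended style, a function documented this way might look like the following. `GetProperty` is a hypothetical helper, not part of the storage API, and the Doxygen `@throw` tag is just one possible convention:

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper, used only to show documenting thrown exceptions.

/// Returns the value stored under `key` in `props`.
///
/// @throw std::out_of_range if `key` is not present.
/// @throw std::bad_alloc if copying the returned string fails to allocate.
std::string GetProperty(const std::map<std::string, std::string> &props,
                        const std::string &key) {
  return props.at(key);  // `at` throws std::out_of_range on a missing key
}
```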

Reviewers: mferencevic, mtomic, msantl

Reviewed By: mferencevic

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D2288
2019-08-07 15:58:38 +02:00
Matej Ferencevic
111dd8bf19 Remove distributed
Reviewers: teon.banek

Reviewed By: teon.banek

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D2213
2019-07-17 15:23:42 +02:00
Ivan Paljak
4acad5795b Expose the status of transaction within Raft
Summary:
For proper client interaction, we need to expose the (term_id, log_index)
pair for the transaction that's about to be replicated and we need to be able
to retrieve the status of a transaction defined by that pair. Transaction
status can be one of the following:

  1) REPLICATED (self-explanatory)
  2) WAITING (waiting for replication)
  3) ABORTED (self-explanatory)
  4) INVALID (received request with either invalid term_id or invalid log_index)
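
A minimal C++ sketch of the status lookup follows. It assumes statuses are kept in a map keyed by the (term_id, log_index) pair. `TxStatus` and `ReplicationTracker` are hypothetical names, not the actual Raft interface:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical names; only the four statuses come from the commit message.
enum class TxStatus { REPLICATED, WAITING, ABORTED, INVALID };

class ReplicationTracker {
 public:
  // Leader appended the transaction's entry to the Raft log.
  void Append(uint64_t term_id, uint64_t log_index) {
    status_[{term_id, log_index}] = TxStatus::WAITING;
  }
  // A majority acknowledged the entry.
  void MarkReplicated(uint64_t term_id, uint64_t log_index) {
    Finish(term_id, log_index, TxStatus::REPLICATED);
  }
  void MarkAborted(uint64_t term_id, uint64_t log_index) {
    Finish(term_id, log_index, TxStatus::ABORTED);
  }
  // Unknown (term_id, log_index) pairs are reported as INVALID.
  TxStatus Status(uint64_t term_id, uint64_t log_index) const {
    auto it = status_.find({term_id, log_index});
    return it == status_.end() ? TxStatus::INVALID : it->second;
  }

 private:
  // Only a WAITING entry can move to a final state.
  void Finish(uint64_t term_id, uint64_t log_index, TxStatus final_status) {
    auto it = status_.find({term_id, log_index});
    if (it != status_.end() && it->second == TxStatus::WAITING) {
      it->second = final_status;
    }
  }

  std::map<std::pair<uint64_t, uint64_t>, TxStatus> status_;
};
```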

Reviewers: mferencevic

Reviewed By: mferencevic

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D2201
2019-07-16 13:53:16 +02:00
Lovro Lugovic
59af45f94e LCP: Fix up LCP warnings
Reviewers: mtomic, teon.banek

Reviewed By: teon.banek

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D2090
2019-06-12 09:37:47 +02:00
Matija Santl
5d5dfbb6f7 Fix how HA handles leader change during commit
Summary:
During its leadership, a peer can receive RPC messages from other peers telling it that its reign is over.
The problem arises when this happens during a transaction commit.

This is handled in the following way.
If we're the current leader and we want to commit a transaction, we need to make sure the Raft Log is replicated before we can tell the client that the transaction is committed.
During that wait, we can only notice that the replication takes too long, and we report that with `LOG(WARNING)` messages.

If we change the Raft mode during the wait, our Raft implementation will internally commit this transaction, but won't be able to acquire the Raft lock because `db.Reset` has been called.
This is why the lock is acquired manually. If we detect that `db.Reset` has been called, we throw an `UnexpectedLeaderChangeException` to the client.

Another issue arises with long-running transactions: if someone kills a `memgraph_ha` instance during the commit, the transaction will have the `abort` hint set. This causes `src/query/operator.cpp` to throw a `HintedAbortError`. We need to catch this during shutdown, because from the user's perspective `memgraph_ha` isn't dead yet and the transaction wasn't aborted for taking too long, but we can differentiate between the two cases.
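
A minimal sketch of the wait-then-check pattern described above is shown below. Only `UnexpectedLeaderChangeException` comes from the commit. `CommitWaiter` is hypothetical, and `std::cerr` stands in for `LOG(WARNING)`:

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <stdexcept>

class UnexpectedLeaderChangeException : public std::runtime_error {
 public:
  UnexpectedLeaderChangeException()
      : std::runtime_error("leader changed during commit") {}
};

// Hypothetical helper illustrating the commit wait.
class CommitWaiter {
 public:
  // Blocks until the entry is replicated, warning after every `warn_after`
  // interval without progress. Throws if the database was reset (leadership
  // lost) while waiting.
  void WaitForReplication(std::chrono::milliseconds warn_after) {
    std::unique_lock<std::mutex> lock(mutex_);
    while (!replicated_ && !reset_) {
      if (!cv_.wait_for(lock, warn_after, [this] { return replicated_ || reset_; })) {
        std::cerr << "WARNING: replication is taking too long\n";
      }
    }
    if (reset_) throw UnexpectedLeaderChangeException();
  }

  void OnReplicated() { Set(replicated_); }
  void OnReset() { Set(reset_); }  // stands in for `db.Reset` being called

 private:
  void Set(bool &flag) {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      flag = true;
    }
    cv_.notify_all();
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  bool replicated_ = false;
  bool reset_ = false;
};
```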

Reviewers: mferencevic, ipaljak

Reviewed By: mferencevic, ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1956
2019-05-20 16:39:44 +02:00
Matej Ferencevic
5c244c1ad4 Remove Cap'n Proto
Summary:
There will be a lot of leftover files; execute the following commands inside
`src/` to remove them:
```
git clean -xf
rm -r rpc/ storage/single_node_ha/rpc/
```

Reviewers: teon.banek

Reviewed By: teon.banek

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D2011
2019-05-08 10:51:10 +02:00
Matej Ferencevic
d678e45c10 Migrate RPC to SLK
Summary:
Migrate all RPCs
Simplify Raft InstallSnapshot RPC
Add missing Load and Save for `char`

Reviewers: teon.banek, msantl

Reviewed By: teon.banek

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D2001
2019-05-06 14:27:57 +02:00
Matej Ferencevic
129c6c0242 Finish SLK implementation
Reviewers: teon.banek, msantl

Reviewed By: teon.banek

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1999
2019-05-02 15:47:38 +02:00
Teon Banek
95f4d1c3fa Generate static Save and Load methods for RPCs
Reviewers: mtomic, mferencevic

Reviewed By: mferencevic

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1995
2019-04-29 13:42:17 +02:00
Matej Ferencevic
6182312e3d Remove TX info from HA snapshot
Reviewers: msantl, ipaljak

Reviewed By: msantl

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1984
2019-04-25 14:00:00 +02:00
Matej Ferencevic
9291a5fc4d Migrate to C++17
Reviewers: teon.banek, buda

Reviewed By: teon.banek

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1974
2019-04-23 14:46:44 +02:00
Matija Santl
54b23ba5b6 Add replication timeout in Raft
Summary:
Added a new config parameter, replication timeout. This parameter sets the
upper limit for the replication phase; once the timeout is exceeded, the
transaction engine stops accepting new transactions.

We could experience this timeout in two cases:
 1. a network partition
 2. the majority of the cluster stops working
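
A minimal sketch of the timeout check follows. `TxEngine`, `ReportReplication` and `Begin` are hypothetical names. Timestamps are passed in explicitly so the behaviour is easy to test:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <stdexcept>

// Hypothetical engine illustrating the replication timeout parameter.
class TxEngine {
 public:
  using Clock = std::chrono::steady_clock;

  explicit TxEngine(std::chrono::milliseconds replication_timeout)
      : replication_timeout_(replication_timeout) {}

  // Records how long one replication phase took; exceeding the upper limit
  // puts the engine into a state where it refuses new transactions.
  void ReportReplication(Clock::time_point started, Clock::time_point finished) {
    if (finished - started > replication_timeout_) accepting_ = false;
  }

  // Begins a new transaction unless a replication timeout was observed.
  uint64_t Begin() {
    if (!accepting_) throw std::runtime_error("replication timeout exceeded");
    return ++last_tx_id_;
  }

 private:
  std::chrono::milliseconds replication_timeout_;
  bool accepting_ = true;
  uint64_t last_tx_id_ = 0;
};
```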

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1893
2019-02-27 17:42:35 +01:00
Matija Santl
f85095c203 Fix Raft shutdown
Summary:
During the following scenario:
 - start a HA cluster with 3 machines
 - find the leader and start sending queries
 - SIGTERM the leader but leave other 2 machines untouched

The leader would be stuck in the shutdown phase.

This was happening because during the shutdown phase of the Bolt server, a
`graph_db_accessor` would try to commit a transaction after we had already shut
down the Raft server. Raft, although not running, still thinks it's in
Leader mode. The Tx Engine calls the `SafeToCommit` method to commit transactions,
and ends up in an infinite loop.

Since Raft was shut down, it won't handle any of the incoming RPCs and won't
change its mode.

The fix here is to shut down the Bolt server before Raft, so we don't have any
pending commits once Raft is shut down.

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1853
2019-02-12 15:12:39 +01:00
Matija Santl
62e06d4b70 Fix re-election in Raft
Summary:
Once a leader loses its leadership, in order to handle hanging
transactions, we reset the storage and the transaction engine.

This requires re-applying all the committed entries from the log.

Once we add snapshots (log compaction), we will need to do that as well.
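
A minimal sketch of the reset-and-replay step follows. It assumes log entries can be modelled as opaque deltas and that only the committed prefix is re-applied. `RaftLog`, `Storage` and `RebuildAfterLeadershipLoss` are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: entries are opaque string deltas.
struct RaftLog {
  std::vector<std::string> entries;
  std::size_t commit_index = 0;  // entries [0, commit_index) are committed
};

struct Storage {
  std::vector<std::string> applied;
  void Reset() { applied.clear(); }
  void Apply(const std::string &delta) { applied.push_back(delta); }
};

// After losing leadership: reset storage (dropping hanging transactions),
// then rebuild state by re-applying only the committed prefix of the log.
void RebuildAfterLeadershipLoss(const RaftLog &raft_log, Storage &storage) {
  storage.Reset();
  for (std::size_t i = 0; i < raft_log.commit_index; ++i) {
    storage.Apply(raft_log.entries[i]);
  }
}
```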

One thing to keep in mind is the `election_timeout_min` parameter. If it's set
too low, it could trigger leader re-election too often.

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1822
2019-01-22 14:51:24 +01:00
Matej Ferencevic
3209788cd4 Implement new spin lock
Reviewers: teon.banek, buda

Reviewed By: teon.banek

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1786
2019-01-08 09:15:07 +01:00
Matija Santl
8c51d2fa0b Add ReplicationLog to RaftServer
Summary:
* renamed `HasCommitted` to `SafeToCommit`
* implemented (c/p) `ReplicationLog`

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1776
2018-12-18 17:24:02 +01:00
Matija Santl
5e6cf0724a Implement StateDelta apply method for Raft
Summary:
`TransactionReplicator` replicates transactions on follower machines in
HA memgraph. Our DB accessor API doesn't provide us with the functionality to
begin transactions with non-increasing ids. This is why
`TransactionReplicator` uses an internal map that maps tx ids from the leader
node to transactions on the follower node (whose id doesn't have to match the
leader's tx id).

If the leader has the following transaction timeline:

```
    L
tx1
 |
 |   tx2
 |    |
 |    |
 |    |
 |    |
 |    |
 |   tx2
 |
 |
 |
 |
tx1
```

`tx2` will commit first and will be replicated. When applying `tx2`, follower
nodes will start a new transaction with tx id `1`. When `tx1` starts
replicating, followers will start a new transaction with tx id `2`. This is
where `TransactionReplicator` kicks in.
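
The mapping can be sketched as follows. `TransactionReplicator` is named in the commit, but this interface (`FollowerTx`, `Finish`) is hypothetical, and follower ids are assumed to be handed out by a simple local counter:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

using TxId = uint64_t;

// Hypothetical interface for the leader-to-follower tx id mapping.
class TransactionReplicator {
 public:
  // Returns the follower transaction for a leader tx id, beginning a new
  // follower transaction (with the next local id) the first time it is seen.
  TxId FollowerTx(TxId leader_tx) {
    auto it = map_.find(leader_tx);
    if (it != map_.end()) return it->second;
    TxId local = ++last_local_id_;
    map_.emplace(leader_tx, local);
    return local;
  }

  // Forgets the mapping once the follower transaction commits or aborts.
  void Finish(TxId leader_tx) { map_.erase(leader_tx); }

 private:
  std::unordered_map<TxId, TxId> map_;
  TxId last_local_id_ = 0;
};
```

With the timeline above, leader `tx2` maps to follower tx `1` and leader `tx1` maps to follower tx `2`.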

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1775
2018-12-14 14:26:40 +01:00
Teon Banek
68b2dcc490 Serialize RPC messages using SLK
Reviewers: msantl, mtomic, mferencevic

Reviewed By: mtomic

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1771
2018-12-14 14:24:39 +01:00
Matija Santl
f501980973 Wire raft into memgraph pt.1.
Summary:
This is just the first diff that tries to wire the raft protocol into
memgraph.

In this diff I'm introducing transaction engine reset functionality. I also
introduced `RaftInterface` which should be used wherever someone wants to access
Raft from Memgraph.

For design decisions see the feature spec.

Reviewers: ipaljak, teon.banek

Reviewed By: ipaljak

Subscribers: pullbot, teon.banek

Differential Revision: https://phabricator.memgraph.io/D1758
2018-12-10 17:08:36 +01:00
Teon Banek
7638b09867 Generate SLK serialization from LCP
Summary:
Classes marked with `:serialize (:slk)` will now generate SLK
serialization code. This diff also changes how the `:serialize` option
is parsed, so that multiple different serialization backends are
supported.

Reviewers: mtomic, llugovic, mferencevic

Reviewed By: mtomic

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1755
2018-12-05 14:58:39 +01:00
Matija Santl
b647e3f8b8 Prepare memgraph for HA
Summary:
Removed WAL and WAL recovery from single node ha binary.
Added `LogEntryBuffer` in `RaftServer`.

Reviewers: ipaljak, teon.banek

Reviewed By: ipaljak, teon.banek

Subscribers: teon.banek, pullbot

Differential Revision: https://phabricator.memgraph.io/D1739
2018-11-22 15:07:39 +01:00
Vinko Kasljevic
7ba8228c46 Refactor storage file structure
Summary:
- Create types folder in storage/common
- Move locking and kvstore to storage/common
- Add storage/distributed/rpc folder

Reviewers: teon.banek, ipaljak, msantl

Reviewed By: msantl

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1713
2018-11-06 18:17:31 +01:00
Matija Santl
43cb506f06 Prepare codebase for high availability
Summary:
Initial fork (in the form of a c/p) from the current single node
version. Once we finish HA, we plan to re-link the files to the single node
versions if they don't change.

Reviewers: ipaljak, buda, mferencevic

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1705
2018-10-30 10:58:50 +01:00
Matija Santl
800db5058e Add blocking transactions for index creation
Summary:
A blocking transaction can stop the transaction engine from
starting new transactions (regular or blocking) and wait for all other active
transactions to finish (become inactive by committing or aborting). Blocking
transactions also support defining a parent transaction, which
does not need to end in order for the blocking one to start. This is because of
a use case where we start nested transactions.

One could think we should build indexes inside those blocking transactions. This
is true and I wanted to implement it, but it would require some digging in
the interpreter, which I didn't want to do in this change.
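
A minimal sketch of the blocking behaviour follows. The method names are hypothetical. For simplicity, regular transactions are assumed to be refused rather than queued while a blocking transaction is pending:

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <optional>
#include <set>

using TxId = uint64_t;

// Hypothetical engine: method names and the refuse-instead-of-queue policy
// are assumptions made for illustration.
class Engine {
 public:
  // Starts a regular transaction, or returns nullopt while a blocking
  // transaction is pending or running.
  std::optional<TxId> TryBegin() {
    std::lock_guard<std::mutex> guard(mutex_);
    if (blocking_) return std::nullopt;
    active_.insert(++last_id_);
    return last_id_;
  }

  // Stops new transactions from starting, then waits until every active
  // transaction other than `parent` has ended. Returns nullopt if another
  // blocking transaction is already in progress.
  std::optional<TxId> TryBeginBlocking(std::optional<TxId> parent) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (blocking_) return std::nullopt;
    blocking_ = true;
    cv_.wait(lock, [&] {
      for (TxId id : active_) {
        if (!parent || id != *parent) return false;
      }
      return true;
    });
    active_.insert(++last_id_);
    blocking_id_ = last_id_;
    return last_id_;
  }

  // Commits or aborts `id`; ending the blocking transaction re-opens the engine.
  void End(TxId id) {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      active_.erase(id);
      if (blocking_id_ && *blocking_id_ == id) {
        blocking_ = false;
        blocking_id_.reset();
      }
    }
    cv_.notify_all();
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::set<TxId> active_;
  TxId last_id_ = 0;
  bool blocking_ = false;
  std::optional<TxId> blocking_id_;
};
```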

Reviewers: mferencevic, vkasljevic, teon.banek

Reviewed By: mferencevic, teon.banek

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1695
2018-10-24 16:31:50 +02:00
Matej Ferencevic
baae40fcc6 Move RPC server to Coordination
Reviewers: teon.banek, msantl

Reviewed By: teon.banek

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1658
2018-10-16 09:12:37 +02:00
Matej Ferencevic
4e5fe37dd6 Remove virtual and pimpl from single node
Summary:
This diff removes: `SingleNodeRecoveryTransactions`, `TypemapPack`
It also removes virtual and/or pimpl from: `SingleNodeCounters`,
`StorageGcSingleNode`, `SingleNodeConcurrentIdMapper`,
accessors (revert D1510), transaction engine, `GraphDbAccessor`, `GraphDb`

Reviewers: msantl, teon.banek

Reviewed By: msantl, teon.banek

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1639
2018-10-09 11:48:30 +02:00
Matej Ferencevic
5097c10ba8 Separate distributed from single node transaction engine
Reviewers: teon.banek, msantl

Reviewed By: teon.banek

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1637
2018-10-05 14:11:39 +02:00
Matej Ferencevic
75950664a7 Separate distributed from single node storage
Summary:
This diff splits single node and distributed storage from each other.
Currently all of the storage code is copied into two directories (one single
node, one distributed). The logic used in the storage implementation isn't
touched; it will be refactored in the following diffs.

To clean the working directory after this diff you should execute:
```
rm database/state_delta.capnp
rm database/state_delta.hpp
rm storage/concurrent_id_mapper_rpc_messages.capnp
rm storage/concurrent_id_mapper_rpc_messages.hpp
```

Reviewers: teon.banek, buda, msantl

Reviewed By: teon.banek, msantl

Subscribers: teon.banek, pullbot

Differential Revision: https://phabricator.memgraph.io/D1625
2018-10-05 09:19:33 +02:00
Teon Banek
1fd9a72e10 Generate Load functions from LCP as top level
Summary: Depends on D1596

Reviewers: mtomic, msantl

Reviewed By: msantl

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1601
2018-09-28 10:34:20 +02:00
Teon Banek
a5926b4e0f Generate Save functions from LCP as top level
Summary:
This should allow us to more easily decouple the code which should be
open sourced. Unfortunately, the downside of this approach is that we
cannot rely on virtual calls to dispatch the serialization to correct
type. Another downside is that members need to be publicly accessible
for serialization.

Reviewers: mtomic, msantl

Reviewed By: mtomic

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1596
2018-09-28 10:26:56 +02:00
Matej Ferencevic
53c405c699 Throw exceptions on RPC failure and Distributed error handling
Summary:
This diff changes the RPC layer to directly return `TResponse` to the user when
issuing a `Call<...>` RPC call. The call throws an exception on failure
(instead of returning `nullopt` as before).

All servers (network, RPC and distributed) are set to have explicit `Shutdown`
methods so that a controlled shutdown can always be performed. The object
destructors now have `CHECK`s to enforce that the `AwaitShutdown` methods were
called.

The distributed memgraph is changed so that none of the binaries (master/workers)
crash when there is a communication failure. Instead, the whole cluster starts
a graceful shutdown when a persistent communication error is detected.
Transient errors are allowed during execution. The transaction that errored out
will be aborted on the whole cluster. The cluster state is managed using a new
Heartbeat RPC call.
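
The change in the `Call<...>` contract can be sketched as follows. `RpcFailedException` and this `Call` signature (with a transport callback) are hypothetical stand-ins for the real RPC client:

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical exception type for a failed RPC.
class RpcFailedException : public std::runtime_error {
 public:
  explicit RpcFailedException(const std::string &what)
      : std::runtime_error(what) {}
};

// Previously a call returned an optional response that was empty on failure.
// Now the response is returned directly and a failure throws.
template <class TResponse>
TResponse Call(const std::function<bool(TResponse &)> &transport) {
  TResponse response{};
  if (!transport(response)) throw RpcFailedException("RPC call failed");
  return response;
}
```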

Reviewers: buda, teon.banek, msantl

Reviewed By: teon.banek

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1604
2018-09-27 16:27:40 +02:00
Matej Ferencevic
96ece11cdd Move distributed transaction engine logic
Summary:
This change introduces a pure virtual initial implementation of the transaction
engine which is then implemented in two versions: single node and distributed.
The interface classes now have the following hierarchy:

```
    Engine (pure interface)
         |
    +----+---------- EngineDistributed (common logic)
    |                         |
EngineSingleNode      +-------+--------+
                      |                |
                 EngineMaster     EngineWorker
```

In addition to this layout, the `EngineMaster` uses `EngineSingleNode` as its
underlying storage engine and only changes the functions necessary to make
them work with the `EngineWorker`.
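
The hierarchy in the diagram can be sketched in C++ as follows. Method names and bodies are hypothetical. In particular, it is an assumption that workers obtain transaction ids from the master; a direct call stands in for the RPC here:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

using TxId = uint64_t;

// Hypothetical method names; only the class layout follows the diagram.
class Engine {  // pure interface
 public:
  virtual ~Engine() = default;
  virtual TxId Begin() = 0;
  virtual std::string Kind() const = 0;
};

class EngineSingleNode : public Engine {
 public:
  TxId Begin() override { return ++counter_; }
  std::string Kind() const override { return "single node"; }

 private:
  TxId counter_ = 0;
};

class EngineDistributed : public Engine {};  // common distributed logic

class EngineMaster : public EngineDistributed {
 public:
  // Reuses an embedded single node engine for the actual bookkeeping.
  TxId Begin() override { return local_.Begin(); }
  std::string Kind() const override { return "master"; }

 private:
  EngineSingleNode local_;
};

class EngineWorker : public EngineDistributed {
 public:
  explicit EngineWorker(EngineMaster &master) : master_(master) {}
  // Assumption: ids come from the master (a direct call stands in for RPC).
  TxId Begin() override { return master_.Begin(); }
  std::string Kind() const override { return "worker"; }

 private:
  EngineMaster &master_;
};
```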

After this change I recommend that you delete the following leftover files:
```
rm src/distributed/transactional_cache_cleaner_rpc_messages.*
rm src/transactions/common.*
rm src/transactions/engine_rpc_messages.*
```

Reviewers: teon.banek, msantl, buda

Reviewed By: teon.banek

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1589
2018-09-07 11:43:57 +02:00
Dominik Gleich
d9f25cc668 Write committed/aborted op to wal
Summary:
The WAL on workers didn't contain committed transaction ids, which are needed
for distributed recovery so that the master can decide which transactions are
present on all the workers.
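
One way the master could use those ids is sketched below. It assumes each worker's WAL is reduced to the set of tx ids it recorded as committed, and keeps only the ids committed everywhere. `CommittedOnAllWorkers` is a hypothetical helper:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

using TxId = uint64_t;

// Hypothetical helper: intersects the committed-id sets of all workers.
std::set<TxId> CommittedOnAllWorkers(const std::vector<std::set<TxId>> &per_worker) {
  if (per_worker.empty()) return {};
  std::set<TxId> result = per_worker.front();
  for (std::size_t i = 1; i < per_worker.size(); ++i) {
    std::set<TxId> next;
    std::set_intersection(result.begin(), result.end(), per_worker[i].begin(),
                          per_worker[i].end(), std::inserter(next, next.end()));
    result = std::move(next);
  }
  return result;
}
```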

Reviewers: buda, msantl

Reviewed By: buda

Subscribers: pullbot, msantl, buda

Differential Revision: https://phabricator.memgraph.io/D1440
2018-07-05 12:43:18 +02:00
Teon Banek
e0474a8e92 Replace boost with capnp in RPC
Summary:
Converts the RPC stack to use Cap'n Proto for serialization instead of
boost. There are still some traces of boost in other places in the code,
but most of it is removed. A future diff should cleanup boost for good.

The RPC API is now changed to be more flexible with regard to how data is
serialized. This makes the simplest cases a bit more verbose, but
allows complex serialization code to be correctly written instead of
relying on hacks. (For reference, look for the old serialization of
`PullRpc`, which had nasty pointer hacks to inject accessors in
`TypedValue`.)

Since RPC messages were uselessly modeled via inheritance of the `Message`
base class, that class is now removed. Furthermore, that approach
doesn't really work with Cap'n Proto. Instead, each message type is
required to have some type information. This can be automated, so
`define-rpc` has been added to LCP, which hopefully simplifies defining
new RPC request and response messages.

Specify Cap'n Proto schema ID in cmake

This preserves Cap'n Proto generated typeIds across multiple generations
of capnp schemas through LCP. It is imperative that typeId stays the
same to ensure that different compilations of Memgraph may communicate
via RPC in a distributed cluster.

Use CLOS for meta information on C++ types in LCP

Since some structure slots and functions have started to repeat
themselves, it makes sense to model C++ meta information via Common Lisp
Object System.

Depends on D1391

Reviewers: buda, dgleich, mferencevic, mtomic, mculinovic, msantl

Reviewed By: msantl

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1407
2018-06-04 10:45:12 +02:00
Teon Banek
c7b6cae526 Extract io/network into mg-io library
Reviewers: buda, dgleich, mferencevic

Reviewed By: mferencevic

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1411
2018-05-30 14:58:41 +02:00
Matija Santl
f872c93ad1 Add command id to remote produce
Summary:
Command id is necessary in remote produce to identify an ongoing pull
because a transaction can have multiple commands that all belong under
the same plan and tx id.
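
The need for the extra field can be sketched with a key for an ongoing remote produce. `CommandId` and `TransactionId` appear as type names elsewhere in this log, but their underlying integer widths here, `PlanId`, and `ProduceKey` are assumptions:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>

// Assumed widths; PlanId and ProduceKey are hypothetical.
using TransactionId = uint64_t;
using CommandId = uint32_t;
using PlanId = int64_t;

// Tx id and plan id alone collide when one transaction runs the same plan
// in several commands, so the command id is part of the key.
struct ProduceKey {
  TransactionId tx_id;
  CommandId command_id;
  PlanId plan_id;

  bool operator<(const ProduceKey &other) const {
    return std::tie(tx_id, command_id, plan_id) <
           std::tie(other.tx_id, other.command_id, other.plan_id);
  }
};
```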

Reviewers: teon.banek, mtomic, buda

Reviewed By: teon.banek

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1386
2018-05-16 10:20:39 +02:00
Dominik Gleich
7af80ebb8d Replace command_id_t with CommandId and transaction_id_t with TransactionId.
Reviewers: buda

Reviewed By: buda

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1367
2018-04-20 13:55:14 +02:00
Dominik Gleich
eb30ecb6a0 Commit log gc
Summary: Adds a commit log garbage collector, which clears old transactions from the commit log
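
A minimal sketch of such a collector follows, using an ordered map from tx id to state. This data structure is an assumption, not necessarily the real commit log representation. The caller is also assumed to pass a bound below which the log is no longer queried; choosing that bound is outside this sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>

using TxId = uint64_t;
enum class TxState { ACTIVE, COMMITTED, ABORTED };

// Hypothetical commit log modelled as an ordered map.
class CommitLog {
 public:
  void Set(TxId id, TxState state) { log_[id] = state; }

  std::optional<TxState> Get(TxId id) const {
    auto it = log_.find(id);
    if (it == log_.end()) return std::nullopt;
    return it->second;
  }

  // Drops every entry with an id below `oldest_needed`.
  void GarbageCollectOlder(TxId oldest_needed) {
    log_.erase(log_.begin(), log_.lower_bound(oldest_needed));
  }

  std::size_t Size() const { return log_.size(); }

 private:
  std::map<TxId, TxState> log_;
};
```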

Reviewers: florijan

Reviewed By: florijan

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1310
2018-04-04 10:25:25 +02:00
Matija Santl
0bcf2edeae Two phase commit on cursor destruction
Summary:
When committing/aborting a transaction in the tx master engine, perform a two
phase commit to all workers so they can stop all futures and clear the
transactional cache.

Reviewers: dgleich, florijan

Reviewed By: dgleich

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1320
2018-04-03 16:20:00 +02:00
Marin Tomic
86072a993a Fix logging in worker_engine
Summary: log before deleting tx object

Reviewers: msantl

Reviewed By: msantl

Differential Revision: https://phabricator.memgraph.io/D1332
2018-03-30 10:41:26 +02:00
Dominik Gleich
a9dd98f8b9 Remove rpc message macro semicolon
Summary:
Remove semicolon.
Semicolon shouldn't be used within macros and should
be explicitly provided by the user of the said macro.
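
A short illustration of the convention follows. `DEFINE_MESSAGE` is a hypothetical macro, not the actual RPC message macro:

```cpp
#include <cassert>
#include <string>

// Hypothetical macro: it expands to a declaration but deliberately omits
// the trailing semicolon.
#define DEFINE_MESSAGE(name) \
  struct name {              \
    std::string payload;     \
  }

// The call site supplies the semicolon, so the use reads like a normal
// declaration and never expands to a stray `;;`.
DEFINE_MESSAGE(PingReq);
DEFINE_MESSAGE(PingRes);
```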

Reviewers: teon.banek

Reviewed By: teon.banek

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1321
2018-03-28 15:16:29 +02:00
Matija Santl
29ba055b64 Add custom VLOGs for distributed memgraph
Summary:
Add different priority VLOGs for distributed memgraph.

For level 3 you'll get logs for dispatching/consuming plans.
For level 4 you'll get logs for tx start/commit/abort, remote produce, remote
pull, and remote result consume.
For level 5 there will be a log for each request/response made by the RPC
client.

Master log snippet P9
Worker log snippet P10

Reviewers: florijan, teon.banek

Reviewed By: florijan

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1296
2018-03-26 09:24:39 +02:00
Dominik Gleich
645e3bbc23 Add GlobalLast
Reviewers: florijan

Reviewed By: florijan

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1313
2018-03-23 15:10:51 +01:00
florijan
c826fa6640 Fix recovery bug (transaction ID bumping)
Summary:
When performing recovery, ensure that the transaction ID in engine is
bumped to one after the max tx id seen in recovery.
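
A minimal sketch of the fix follows. `TxIdGenerator` and `BumpPast` are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using TxId = uint64_t;

// Hypothetical id generator illustrating the recovery bump.
class TxIdGenerator {
 public:
  // Hands out the next transaction id.
  TxId Next() { return ++last_; }

  // Called for every tx id seen during recovery, so the next id handed out
  // is one after the maximum recovered id.
  void BumpPast(TxId seen) { last_ = std::max(last_, seen); }

 private:
  TxId last_ = 0;
};
```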

Reviewers: dgleich

Reviewed By: dgleich

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1312
2018-03-23 10:09:04 +01:00
florijan
543f953ab5 Check all RPC call results
Summary: Also make error reporting consistent in style: "NameRpc failed"

Reviewers: teon.banek, msantl

Reviewed By: teon.banek

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1294
2018-03-13 15:16:50 +01:00
Matija Santl
1ca98826af Use the same ClientPools in distributed
Summary:
Instead of passing `coordination`, pass `rpc_worker_clients`, which
holds a map of worker_id->clientPool. By having only one instance of
`RpcWorkerClients` that is owned by `GraphDB` and passing it by reference,
we'll share the same client pools for RPC clients.
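
The sharing can be sketched as follows. `ClientPool` is a hypothetical stand-in for a real pool of RPC clients, and this `RpcWorkerClients` interface is illustrative only (and not thread-safe):

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for a pool of RPC clients connected to one worker.
struct ClientPool {
  int worker_id;
};

// One instance is owned by the database object and handed out by reference,
// so every component asking for worker N gets the very same pool.
class RpcWorkerClients {
 public:
  ClientPool &GetClientPool(int worker_id) {
    return pools_.try_emplace(worker_id, ClientPool{worker_id}).first->second;
  }

 private:
  std::map<int, ClientPool> pools_;  // worker_id -> pool; not thread-safe here
};
```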

Reviewers: teon.banek, florijan, dgleich, mferencevic

Reviewed By: florijan

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1261
2018-03-01 17:14:59 +01:00
Dominik Gleich
79dc10960e Fix double free
Summary: It was possible for a transaction to be double freed because both the AllClear and SingleClear of transactions could delete the pointer.

Reviewers: florijan, mferencevic

Reviewed By: mferencevic

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1250
2018-02-27 14:31:18 +01:00
florijan
bb62f463f8 Refactor distributed transactional cache GC
Summary:
Release of per-transaction data in distributed Memgraph has been refactored. The
master node no longer releases it each time a transaction is done, thus
offloading some work from the engine.

Reviewers: dgleich

Reviewed By: dgleich

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1235
2018-02-27 10:47:58 +01:00
florijan
c8dc07ad0e Remove tx::Engine::GlobalIsActive
Summary:
Remove a method in tx::Engine whose results can be obtained from commit
log info (also guaranteed to be globally correct in distributed).

Reviewers: dgleich

Reviewed By: dgleich

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1240
2018-02-26 14:01:49 +01:00