Commit Graph

17 Commits

Author SHA1 Message Date
Matija Santl
f85095c203 Fix Raft shutdown
Summary:
During the following scenario:
 - start a HA cluster with 3 machines
 - find the leader and start sending queries
 - SIGTERM the leader but leave other 2 machines untouched

The leader would be stuck in the shutdown phase.

This was happening because during the shutdown phase of the Bolt server, a
`graph_db_accessor` would try to commit a transaction after we've already shut
down Raft server.  Raft, although not running, is still thinking it's in the
Leader mode. Tx Engine calls the `SafeToCommit` method to Commit transactions,
and ends up in an infinite loop.

Since Raft was shut down it won't handle any of the incoming RPCs and won't
change it's mode.

The fix here is to shut down the Bolt server before Raft, so we don't have any
pending commits once Raft is shut down.

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1853
2019-02-12 15:12:39 +01:00
Ivan Paljak
e3cf4d0df8 Make an in-memory copy of HA persistent storage pt. 1
Summary:
In Raft, we often need to access persistent state of the server
without modifying it. In order to speed up such operations, we
keep an in-memory copy of that state.

In this diff we make a copy of all persistent state except for
the log itself. Running our feature benchmark locally, we manage
to increase the throughput for cca 750 queries/s.

Reviewers: msantl

Reviewed By: msantl

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1843
2019-02-06 15:25:57 +01:00
Matija Santl
145c81376f Add log compaction for Raft, pt. 2
Summary: Implemented snapshot replication and log compaction.

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1840
2019-02-04 15:32:07 +01:00
Matija Santl
da95cbf4ec Add log compaction for Raft, pt. 1
Summary:
In this part of log compaction for raft, I've implemented snapshooting
and snapshot recovery. I've also refactored the code a bit, so `RaftServer` now
has a pointer to the `GraphDb` and it can do some things by itself.

Log compaction requires some further work. Since snapshooting isn't synchronous
between peers, and each peer can work at their own pace, once we've compacted
the log so that the next log to be sent to peer `x` isn't available anymore, we
need to send the snapshot over the wire. This means that the next part will
contain the `InstallSnapshotRPC` and then maybe one more that will implement the
logic of sending `LogEntry` or the whole snapshot.

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1834
2019-01-29 14:58:57 +01:00
Ivan Paljak
f09c1254f4 Optimize Raft log persistent storage
Summary:
Each `raft::LogEntry` is now persisted under its own key in our `KVStore`. Locally running our HA feature benchmark yields the following results:

```
duration 23.7
executed_writes: 15000
write_per_second: 632.888
```

This represents about 5x increase in throughput.

Reviewers: msantl

Reviewed By: msantl

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1799
2019-01-15 16:06:08 +01:00
Matija Santl
f53913e053 Add automated test for Raft
Summary:
Created a new integration test for Raft protocol.

The tests iterates through the Raft cluster and does the following:
* kill machine `X`
* execute a query
* bring `X` back to life

The first step is to insert a vertex in the cluster, and last step is to check
if the cluster has all the data.

I also edited some of the raft core files because this test surafaced some bugs.

The `tester` binary is a hacked version of the HA client and so are the parts in
the code that refuse to execute a query is the machine is not in `Leader` mode.o
Those parts will go away once we have a proper HA client.

I've run the `runner.py` for a while (215 times)
```
while ./runner.py &> log.txt; do echo -n "."; done
```
and it didn't break.

Reviewers: ipaljak, mferencevic

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1788
2019-01-14 13:41:36 +01:00
Ivan Paljak
cc3192cef7 Implement log replication in Raft
Reviewers: msantl

Reviewed By: msantl

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1782
2019-01-04 16:07:12 +01:00
Matija Santl
363cdb8b88 Issue NO_OP StateDeltas on leader change
Summary:
Creating Raft noop logs on leader change will trigger the whole
log replication procedure that ends up committing/applying state deltas on newly
elected leaders that didn't receive the last commit index from the previous
leader.

I also included a small tweak that won't trigger add logs when a transaction
contains only BEGIN and ABORT StateDeltas, because we don't want to replicate
read queries.

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1785
2019-01-04 10:12:32 +01:00
Matija Santl
8c51d2fa0b Add ReplicationLog to RaftServer
Summary:
* renamed `HasCommitted` to `SafeToCommit`
* implemented (c/p) `ReplicationLog`

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1776
2018-12-18 17:24:02 +01:00
Matija Santl
5e6cf0724a Implement StateDelta apply method for Raft
Summary:
TransactionReplicator replicates transactions on follower machines in
HA memgraph. Our DB accessor API doesn't provide us with the functionality to
begin transactions with non-increasing ids. This is why the
`TransactionReplicator` uses a internal map that maps tx ids from the leader
node to transactions on the follower node (whose id doesn't have to match the
leaders tx id).

If the leader has the following transaction timeline:

```
    L
tx1
 |
 |   tx2
 |    |
 |    |
 |    |
 |    |
 |    |
 |   tx2
 |
 |
 |
 |
tx1
```

`tx2` will commit first and will be replicated. When applying `tx2` on follower
nodes, they will start a new transaction with tx id `1`.  When `tx1` starts
replicating, followers will start a new transaction with tx id `2`. And this is
wehre `TransactionReplicator` kicks in.

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1775
2018-12-14 14:26:40 +01:00
Matija Santl
8e35d8afdc Add Start/Stop methods to RaftServer
Summary:
Explicitly start and stop raft server.
This way we can be sure that raft won't try to use coordination after it's
shutdown, and we can define the start of th raft protocol easier.

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1766
2018-12-11 14:24:04 +01:00
Ivan Paljak
8e796e9fd1 Fix infinite wait in leader election.
Reviewers: msantl

Reviewed By: msantl

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1763
2018-12-11 10:35:35 +01:00
Matija Santl
f501980973 Wire raft into memgraph pt.1.
Summary:
This is just the first diff that tries to wire the raft protocol into
memgraph.

In this diff I'm introducing transaction engine reset functionality. I also
introduced `RaftInterface` which should be used wherever someone wants to access
Raft from Memgraph.

For design decisions see the feature spec.

Reviewers: ipaljak, teon.banek

Reviewed By: ipaljak

Subscribers: pullbot, teon.banek

Differential Revision: https://phabricator.memgraph.io/D1758
2018-12-10 17:08:36 +01:00
Ivan Paljak
d637629078 Implement Raft RPC, log serialization for disk storage and leader election
Summary: This diff contains a rough implementation of the Raft protocol which ends at leader election.

Reviewers: msantl

Reviewed By: msantl

Subscribers: teon.banek, pullbot

Differential Revision: https://phabricator.memgraph.io/D1744
2018-12-10 12:49:22 +01:00
Matija Santl
b647e3f8b8 Prepare memgraph for HA
Summary:
Removed WAL and WAL recovery from single node ha binary.
Added `LogEntryBuffer` in `RaftServer`.

Reviewers: ipaljak, teon.banek

Reviewed By: ipaljak, teon.banek

Subscribers: teon.banek, pullbot

Differential Revision: https://phabricator.memgraph.io/D1739
2018-11-22 15:07:39 +01:00
Matija Santl
dd6fe013dc Parse raft config
Summary:
Added command line parameters to specify rpc flags, raft and
coordination config files and current server id.

Reviewers: ipaljak, teon.banek

Reviewed By: ipaljak, teon.banek

Subscribers: pullbot, teon.banek

Differential Revision: https://phabricator.memgraph.io/D1742
2018-11-21 17:34:13 +01:00
Ivan Paljak
73da1e4463 Add Raft skeleton
Reviewers: msantl

Reviewed By: msantl

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1732
2018-11-19 13:21:58 +01:00