Commit Graph

36 Commits

Author SHA1 Message Date
Matija Santl
e1ad5cd803 Prevent snapshot recovery when exiting
Summary:
Sometimes, when the leader resigns its leadership, a newly elected
leader would send the old leader an `AppendEntriesRPC` that would cause
snapshot recovery to happen. This diff prevents that.

Reviewers: mferencevic, ipaljak

Reviewed By: mferencevic

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1943
2019-04-04 14:41:33 +02:00
Matija Santl
6a9acb717d Refactor StateDeltaApplier for HA
Summary:
The whole `StateDeltaApplier` implementation was unnecessary. Fixing
this.

Reviewers: mferencevic, ipaljak

Reviewed By: mferencevic

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1942
2019-04-04 13:57:58 +02:00
Matija Santl
1124b8d371 Fix Raft election
Summary:
Based on the failure manifested in
https://apollo.memgraph.io/runs/512803/ it seems like machines give each other
votes for the same term.

Looking at the code, the `voted_for_` variable wasn't assigned at election
start, so the election starter could grant its vote to someone else while still
counting its own vote for itself.
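
A minimal Python sketch of the rule involved, not the actual Memgraph code: a candidate records a vote for itself when it starts an election, so a later vote request for the same term is refused. The `Server` class and its method names are hypothetical, and the log up-to-date check that Raft also requires is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    """Hypothetical stand-in for a Raft server's election state."""
    server_id: int
    current_term: int = 0
    voted_for: Optional[int] = None  # plays the role of `voted_for_`

    def start_election(self) -> int:
        """Become a candidate: bump the term and vote for ourselves."""
        self.current_term += 1
        self.voted_for = self.server_id  # the assignment that was missing
        return 1  # votes collected so far (our own)

    def handle_request_vote(self, candidate_id: int, term: int) -> bool:
        """Grant at most one vote per term."""
        if term < self.current_term:
            return False
        if term > self.current_term:
            # A newer term resets the vote.
            self.current_term = term
            self.voted_for = None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


a = Server(server_id=1)
votes = a.start_election()  # term 1, voted for itself
granted = a.handle_request_vote(candidate_id=2, term=1)
print(votes, granted)
```

Without the assignment in `start_election`, the second call would grant the vote and the same term would carry two votes from one machine.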

Reviewers: ipaljak, mferencevic

Reviewed By: mferencevic

Subscribers: pullbot, vkasljevic

Differential Revision: https://phabricator.memgraph.io/D1941
2019-04-03 12:45:57 +02:00
Ivan Paljak
750115e8ff Display server state changes to user
Reviewers: msantl

Reviewed By: msantl

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1932
2019-03-27 14:24:21 +01:00
Matija Santl
d387bac544 Fail HA benchmark on non-zero exit status
Summary:
For HA benchmarks, if one of the executables exits with a status other
than zero, the benchmark should fail.

Also, removing `LOG(INFO)`, since failing benchmarks should flag where to look.

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1921
2019-03-14 16:53:58 +01:00
Matija Santl
f97872170a Add a lock around replication timeout map in Raft
Summary:
Concurrent access to the map that contains replication log timeouts
caused the HA version to often report replication log timeout errors.

Adding locks around the access prevents them from happening.

Performance on Apollo reports write speed around 8k/s.
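
A rough Python sketch of the idea, assuming a map from transaction id to deadline. The `ReplicationTimeouts` class and its methods are hypothetical names for illustration and do not mirror the C++ implementation.

```python
import threading
import time


class ReplicationTimeouts:
    """Hypothetical map of tx id -> deadline, guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._deadlines = {}  # tx id -> monotonic deadline in seconds

    def insert(self, tx_id, timeout_s):
        with self._lock:
            self._deadlines[tx_id] = time.monotonic() + timeout_s

    def clear(self, tx_id):
        with self._lock:
            self._deadlines.pop(tx_id, None)

    def timed_out(self, tx_id):
        # Read under the lock, compare outside of it.
        with self._lock:
            deadline = self._deadlines.get(tx_id)
        return deadline is not None and time.monotonic() > deadline


timeouts = ReplicationTimeouts()
workers = [
    threading.Thread(target=timeouts.insert, args=(tx, 60.0)) for tx in range(8)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
print(timeouts.timed_out(3))
```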

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1920
2019-03-14 10:52:03 +01:00
Matija Santl
1de34d8b92 Add proper storage stats for HA
Summary:
`SHOW STORAGE STATS` when executed in a Raft cluster should return
stats for each member of the cluster.

`StorageStats` starts an RPC server on each member of the cluster that answers
about its local storage stats.

The query can be invoked only on the current leader. The leader sends a request
to each peer and shows the results it gets. If some peers don't answer within 1
second, stats for those peers won't be shown.

The new output can be seen here: P27
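
The fan-out with a deadline described above could look roughly like this in Python. This is only a sketch: `collect_stats`, `fake_fetch` and the `vertex_count` field are made up for illustration, and the real code uses Memgraph's own RPC layer.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def collect_stats(peers, fetch, timeout_s=1.0):
    """Ask every peer for its stats; keep only answers that arrive in time.

    `fetch(peer)` stands in for a blocking RPC call (hypothetical).
    """
    executor = ThreadPoolExecutor(max_workers=max(1, len(peers)))
    futures = {executor.submit(fetch, peer): peer for peer in peers}
    done, _not_done = wait(futures, timeout=timeout_s)
    executor.shutdown(wait=False)  # do not block on slow peers
    return {futures[f]: f.result() for f in done if f.exception() is None}


def fake_fetch(peer):
    # Peer 3 is deliberately too slow to answer before the deadline.
    time.sleep(1.0 if peer == 3 else 0.0)
    return {"vertex_count": 10 * peer}


stats = collect_stats([1, 2, 3], fake_fetch, timeout_s=0.2)
print(sorted(stats))
```

Peers that miss the deadline are simply absent from the result, which matches the behaviour of leaving their stats out of the output.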

Reviewers: ipaljak, mferencevic

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1907
2019-03-07 15:00:40 +01:00
Matija Santl
54b23ba5b6 Add replication timeout in Raft
Summary:
Added a new config parameter, replication timeout. This parameter sets the
upper limit on the duration of the replication phase. Once the timeout is
exceeded, the transaction engine stops accepting new transactions.

We could experience this timeout in two cases:
 1. a network partition
 2. majority of the cluster stops working
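
A small Python sketch of the behaviour, under the assumption that the engine tracks when each transaction started replicating. `TransactionEngine` here is a hypothetical toy, and the clock is injected so the example does not have to sleep.

```python
import time


class TransactionEngine:
    """Hypothetical engine that refuses new transactions after a timeout."""

    def __init__(self, replication_timeout_s, clock=time.monotonic):
        self.replication_timeout_s = replication_timeout_s
        self.clock = clock
        self.accepting = True
        self.pending = {}  # tx id -> time at which replication started

    def begin_replication(self, tx_id):
        self.pending[tx_id] = self.clock()

    def replicated(self, tx_id):
        self.pending.pop(tx_id, None)

    def begin_transaction(self):
        now = self.clock()
        if any(now - start > self.replication_timeout_s
               for start in self.pending.values()):
            self.accepting = False
        if not self.accepting:
            raise RuntimeError("replication timed out")
        return "tx"


now = [0.0]  # fake clock, advanced by hand
engine = TransactionEngine(replication_timeout_s=5.0, clock=lambda: now[0])
engine.begin_replication(tx_id=1)
now[0] = 6.0  # e.g. a partition kept tx 1 from reaching a majority

try:
    engine.begin_transaction()
    accepted = True
except RuntimeError:
    accepted = False
print(accepted)
```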

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1893
2019-02-27 17:42:35 +01:00
Matija Santl
4790d6458e Add index support in HA
Summary:
Added index creation and deletion handling in StateDelta.
Also included an integration test that creates an index and makes sure that it
gets replicated by killing each peer, eventually causing a leader re-election.

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1886
2019-02-26 12:55:57 +01:00
Matija Santl
ed75e45541 Fix Raft failure discovered in apollo run 479391
Summary:
We noticed a Raft test failure https://apollo.memgraph.io/runs/479391/

This diff should fix it.

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: mferencevic, pullbot

Differential Revision: https://phabricator.memgraph.io/D1865
2019-02-15 10:22:52 +01:00
Matej Ferencevic
bb052be002 Remove serialization from utils
Reviewers: teon.banek

Reviewed By: teon.banek

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1852
2019-02-13 15:41:40 +01:00
Matija Santl
68c910a083 Simplify log compaction
Summary: Teon found this nit, so we might as well fix it.

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1860
2019-02-13 15:35:14 +01:00
Matija Santl
f85095c203 Fix Raft shutdown
Summary:
During the following scenario:
 - start an HA cluster with 3 machines
 - find the leader and start sending queries
 - SIGTERM the leader but leave other 2 machines untouched

The leader would be stuck in the shutdown phase.

This was happening because during the shutdown phase of the Bolt server, a
`graph_db_accessor` would try to commit a transaction after we've already shut
down the Raft server. Raft, although not running, still thinks it's in
Leader mode. The Tx Engine calls the `SafeToCommit` method to commit
transactions and ends up in an infinite loop.

Since Raft was shut down, it won't handle any of the incoming RPCs and won't
change its mode.

The fix here is to shut down the Bolt server before Raft, so we don't have any
pending commits once Raft is shut down.

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1853
2019-02-12 15:12:39 +01:00
Ivan Paljak
e3cf4d0df8 Make an in-memory copy of HA persistent storage pt. 1
Summary:
In Raft, we often need to access persistent state of the server
without modifying it. In order to speed up such operations, we
keep an in-memory copy of that state.

In this diff we make a copy of all persistent state except for
the log itself. Running our feature benchmark locally, we manage
to increase the throughput by approximately 750 queries/s.
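
The pattern is essentially a write-through cache. A minimal Python sketch, with a plain `dict` standing in for the persistent store and `CachedState` as a hypothetical name:

```python
class CachedState:
    """Write-through copy of a small persistent key-value state."""

    def __init__(self, store):
        self._store = store        # dict-like stand-in for disk storage
        self._cache = dict(store)  # loaded once at startup

    def get(self, key):
        return self._cache[key]    # reads never touch the store

    def put(self, key, value):
        self._store[key] = value   # persist first
        self._cache[key] = value   # then update the in-memory copy


disk = {"current_term": 3, "voted_for": None}
state = CachedState(disk)
state.put("current_term", 4)
print(state.get("current_term"), disk["current_term"])
```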

Reviewers: msantl

Reviewed By: msantl

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1843
2019-02-06 15:25:57 +01:00
Matija Santl
145c81376f Add log compaction for Raft, pt. 2
Summary: Implemented snapshot replication and log compaction.

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1840
2019-02-04 15:32:07 +01:00
Matija Santl
da95cbf4ec Add log compaction for Raft, pt. 1
Summary:
In this part of log compaction for Raft, I've implemented snapshotting
and snapshot recovery. I've also refactored the code a bit, so `RaftServer` now
has a pointer to the `GraphDb` and it can do some things by itself.

Log compaction requires some further work. Since snapshotting isn't synchronous
between peers, and each peer can work at their own pace, once we've compacted
the log so that the next log to be sent to peer `x` isn't available anymore, we
need to send the snapshot over the wire. This means that the next part will
contain the `InstallSnapshotRPC` and then maybe one more that will implement the
logic of sending `LogEntry` or the whole snapshot.
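
The choice between sending log entries and sending the snapshot can be sketched as follows. This is an illustration in Python, not the planned implementation; `next_message`, `snapshot_last_index` and the tuple return values are hypothetical.

```python
def next_message(next_index, snapshot_last_index, log):
    """Pick what to send to a peer.

    `log` maps index -> entry for the entries still kept after compaction;
    `snapshot_last_index` is the last index covered by the snapshot.
    """
    if next_index <= snapshot_last_index:
        # The entry the peer needs was compacted away.
        return ("InstallSnapshot", snapshot_last_index)
    entries = [log[i] for i in sorted(log) if i >= next_index]
    return ("AppendEntries", entries)


log = {6: "e6", 7: "e7"}  # entries 1..5 live only in the snapshot
print(next_message(4, 5, log))
print(next_message(7, 5, log))
```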

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1834
2019-01-29 14:58:57 +01:00
Matija Santl
4fa44c3edd Fix Raft's ReplicationLog
Summary:
`ReplicationLog` had a classic off-by-one bug. The `valid_prefix`
variable wasn't set properly.

This diff also includes a poor man's version of an HA client. This client
assumes that all the HA instances run on a single machine and that the
corresponding Bolt endpoints have open ports ranging from `7687` to
`7687 + num_machines - 1`.

This should make it easier to test certain things, e.g. disk usage, P25.

This test revealed the bug with `ReplicationLog`.
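
The port convention the client assumes can be written down in a couple of lines of Python. The function name and the default host are assumptions made for this sketch.

```python
BASE_BOLT_PORT = 7687


def bolt_endpoints(num_machines, host="127.0.0.1"):
    """Endpoints on ports 7687 .. 7687 + num_machines - 1 on one host."""
    return [(host, BASE_BOLT_PORT + i) for i in range(num_machines)]


endpoints = bolt_endpoints(3)
print(endpoints)
```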

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1813
2019-01-23 16:27:51 +01:00
Matija Santl
62e06d4b70 Fix re-election in Raft
Summary:
Once a leader loses its leadership, in order to handle hanging
transactions, we reset the storage and the transaction engine.

This requires re-applying all the committed entries from the log.

Once we add snapshots (log compaction), we will need to handle those as well.

One thing to keep in mind is the `election_timeout_min` parameter. If it's set
too low, it could trigger leader re-election too often.

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1822
2019-01-22 14:51:24 +01:00
Ivan Paljak
16752af614 Force issuing heartbeats when appending to Raft log
Summary:
Locally run HA feature benchmark:

```
duration: 20.66
executed_writes: 150007
write_per_second: 7527.89
```

Reviewers: msantl

Reviewed By: msantl

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1809
2019-01-16 16:38:45 +01:00
Ivan Paljak
f09c1254f4 Optimize Raft log persistent storage
Summary:
Each `raft::LogEntry` is now persisted under its own key in our `KVStore`. Locally running our HA feature benchmark yields the following results:

```
duration: 23.7
executed_writes: 15000
write_per_second: 632.888
```

This represents about 5x increase in throughput.
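
A toy Python sketch of per-entry keys, with a `dict` standing in for `KVStore`. The key scheme, the JSON encoding and the `log_size` key are all assumptions made for illustration. With this layout an append writes only the new entry; whether that is the source of the speedup measured here is not stated in this diff.

```python
import json


class RaftLog:
    """Log where each entry is stored under its own key."""

    def __init__(self, kv):
        self.kv = kv  # dict-like stand-in for a key-value store

    @staticmethod
    def _key(index):
        return f"log_entry_{index}"  # hypothetical key scheme

    def append(self, index, term, deltas):
        # Only the new entry is serialized and written.
        self.kv[self._key(index)] = json.dumps({"term": term, "deltas": deltas})
        self.kv["log_size"] = str(index)

    def get(self, index):
        return json.loads(self.kv[self._key(index)])


store = {}
raft_log = RaftLog(store)
raft_log.append(1, term=1, deltas=["BEGIN", "COMMIT"])
raft_log.append(2, term=1, deltas=["BEGIN", "COMMIT"])
print(sorted(store))
```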

Reviewers: msantl

Reviewed By: msantl

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1799
2019-01-15 16:06:08 +01:00
Matija Santl
f53913e053 Add automated test for Raft
Summary:
Created a new integration test for Raft protocol.

The test iterates through the Raft cluster and does the following:
* kill machine `X`
* execute a query
* bring `X` back to life

The first step is to insert a vertex in the cluster, and last step is to check
if the cluster has all the data.
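
The control flow of the test can be sketched in Python like this. The callables and the query strings are placeholders, not what `runner.py` or the `tester` binary actually use.

```python
def run_raft_test(cluster_size, kill, start, execute, verify):
    execute("CREATE (:Node)")  # first step: insert a vertex
    for machine in range(cluster_size):
        kill(machine)
        execute("MATCH (n) RETURN count(n)")  # placeholder query
        start(machine)
    return verify()  # last step: check that the cluster has all the data


events = []
ok = run_raft_test(
    cluster_size=3,
    kill=lambda m: events.append(f"kill {m}"),
    start=lambda m: events.append(f"start {m}"),
    execute=lambda q: events.append("query"),
    verify=lambda: True,
)
print(events)
```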

I also edited some of the Raft core files because this test surfaced some bugs.

The `tester` binary is a hacked version of the HA client and so are the parts in
the code that refuse to execute a query if the machine is not in `Leader` mode.
Those parts will go away once we have a proper HA client.

I've run the `runner.py` for a while (215 times)
```
while ./runner.py &> log.txt; do echo -n "."; done
```
and it didn't break.

Reviewers: ipaljak, mferencevic

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1788
2019-01-14 13:41:36 +01:00
Ivan Paljak
cc3192cef7 Implement log replication in Raft
Reviewers: msantl

Reviewed By: msantl

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1782
2019-01-04 16:07:12 +01:00
Matija Santl
363cdb8b88 Issue NO_OP StateDeltas on leader change
Summary:
Creating Raft noop logs on leader change will trigger the whole
log replication procedure that ends up committing/applying state deltas on newly
elected leaders that didn't receive the last commit index from the previous
leader.

I also included a small tweak that skips appending log entries when a
transaction contains only BEGIN and ABORT StateDeltas, because we don't want to
replicate read queries.
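
The check amounts to something like the following Python sketch. Deltas are represented as plain strings here, and the names other than BEGIN and ABORT are hypothetical.

```python
def should_replicate(deltas):
    """Return False when a transaction holds only BEGIN and ABORT deltas."""
    return any(delta not in ("BEGIN", "ABORT") for delta in deltas)


print(should_replicate(["BEGIN", "ABORT"]))
print(should_replicate(["BEGIN", "CREATE_VERTEX", "COMMIT"]))
```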

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1785
2019-01-04 10:12:32 +01:00
Matija Santl
8c51d2fa0b Add ReplicationLog to RaftServer
Summary:
* renamed `HasCommitted` to `SafeToCommit`
* implemented (c/p) `ReplicationLog`

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1776
2018-12-18 17:24:02 +01:00
Matija Santl
5e6cf0724a Implement StateDelta apply method for Raft
Summary:
TransactionReplicator replicates transactions on follower machines in
HA memgraph. Our DB accessor API doesn't provide us with the functionality to
begin transactions with non-increasing ids. This is why the
`TransactionReplicator` uses an internal map that maps tx ids from the leader
node to transactions on the follower node (whose id doesn't have to match the
leader's tx id).

If the leader has the following transaction timeline:

```
    L
tx1
 |
 |   tx2
 |    |
 |    |
 |    |
 |    |
 |    |
 |   tx2
 |
 |
 |
 |
tx1
```

`tx2` will commit first and will be replicated. When applying `tx2` on follower
nodes, they will start a new transaction with tx id `1`.  When `tx1` starts
replicating, followers will start a new transaction with tx id `2`. And this is
where `TransactionReplicator` kicks in.
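
A stripped-down Python sketch of the mapping, assuming follower transaction ids are handed out in increasing order starting from 1. The method names are invented for the example and do not mirror the C++ class.

```python
import itertools


class TransactionReplicator:
    """Maps leader tx ids to the ids of follower-local transactions."""

    def __init__(self):
        self._next_local_id = itertools.count(1)
        self._leader_to_local = {}

    def begin(self, leader_tx_id):
        # The follower can only start transactions with increasing ids.
        local_id = next(self._next_local_id)
        self._leader_to_local[leader_tx_id] = local_id
        return local_id

    def local_id(self, leader_tx_id):
        return self._leader_to_local[leader_tx_id]

    def finish(self, leader_tx_id):
        # Called on commit or abort; the mapping is no longer needed.
        return self._leader_to_local.pop(leader_tx_id)


replicator = TransactionReplicator()
replicator.begin(leader_tx_id=2)  # tx2 commits first on the leader
replicator.begin(leader_tx_id=1)  # tx1 is replicated afterwards
print(replicator.local_id(2), replicator.local_id(1))
```

Deltas that arrive tagged with the leader's tx id are then applied through the follower transaction the map points to.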

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1775
2018-12-14 14:26:40 +01:00
Ivan Paljak
00506b9962 Fix bug in leader election (missing HB)
Reviewers: msantl

Reviewed By: msantl

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1767
2018-12-11 15:43:24 +01:00
Matija Santl
8e35d8afdc Add Start/Stop methods to RaftServer
Summary:
Explicitly start and stop the Raft server.
This way we can be sure that Raft won't try to use coordination after it has
shut down, and we can define the start of the Raft protocol more easily.

Reviewers: ipaljak

Reviewed By: ipaljak

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1766
2018-12-11 14:24:04 +01:00
Ivan Paljak
8e796e9fd1 Fix infinite wait in leader election.
Reviewers: msantl

Reviewed By: msantl

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1763
2018-12-11 10:35:35 +01:00
Matija Santl
f501980973 Wire raft into memgraph pt.1.
Summary:
This is just the first diff that tries to wire the raft protocol into
memgraph.

In this diff I'm introducing transaction engine reset functionality. I also
introduced `RaftInterface` which should be used wherever someone wants to access
Raft from Memgraph.

For design decisions see the feature spec.

Reviewers: ipaljak, teon.banek

Reviewed By: ipaljak

Subscribers: pullbot, teon.banek

Differential Revision: https://phabricator.memgraph.io/D1758
2018-12-10 17:08:36 +01:00
Ivan Paljak
d637629078 Implement Raft RPC, log serialization for disk storage and leader election
Summary: This diff contains a rough implementation of the Raft protocol which ends at leader election.

Reviewers: msantl

Reviewed By: msantl

Subscribers: teon.banek, pullbot

Differential Revision: https://phabricator.memgraph.io/D1744
2018-12-10 12:49:22 +01:00
Teon Banek
f5b39cfc41 Serialize storage and durability via SLK
Reviewers: mferencevic, msantl, ipaljak

Reviewed By: mferencevic

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1759
2018-12-07 14:26:12 +01:00
Teon Banek
7638b09867 Generate SLK serialization from LCP
Summary:
Classes marked with `:serialize (:slk)` will now generate SLK
serialization code. This diff also changes how the `:serialize` option
is parsed, so that multiple different serialization backends are
supported.

Reviewers: mtomic, llugovic, mferencevic

Reviewed By: mtomic

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1755
2018-12-05 14:58:39 +01:00
Matija Santl
b647e3f8b8 Prepare memgraph for HA
Summary:
Removed WAL and WAL recovery from the single-node HA binary.
Added `LogEntryBuffer` in `RaftServer`.

Reviewers: ipaljak, teon.banek

Reviewed By: ipaljak, teon.banek

Subscribers: teon.banek, pullbot

Differential Revision: https://phabricator.memgraph.io/D1739
2018-11-22 15:07:39 +01:00
Matija Santl
dd6fe013dc Parse raft config
Summary:
Added command line parameters to specify rpc flags, raft and
coordination config files and current server id.

Reviewers: ipaljak, teon.banek

Reviewed By: ipaljak, teon.banek

Subscribers: pullbot, teon.banek

Differential Revision: https://phabricator.memgraph.io/D1742
2018-11-21 17:34:13 +01:00
Ivan Paljak
73da1e4463 Add Raft skeleton
Reviewers: msantl

Reviewed By: msantl

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1732
2018-11-19 13:21:58 +01:00
Ivan Paljak
7f44b895b4 Add Raft RPC messages
Summary:
Basic RPC messages for Raft protocol. They will most likely be updated as we
move along with the implementation.

Reviewers: msantl, teon.banek, mferencevic

Reviewed By: msantl, teon.banek

Subscribers: pullbot

Differential Revision: https://phabricator.memgraph.io/D1726
2018-11-12 14:24:23 +01:00