Summary:
Sometimes, when a leader resigns its leadership, the newly elected
leader would send the old leader an `AppendEntriesRPC` that caused
snapshot recovery to happen. This diff prevents that.
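A minimal sketch of the kind of guard involved, with hypothetical names and a
hypothetical condition (not the actual code): snapshot recovery should be a
fallback for a compacted log, not a side effect of every `AppendEntriesRPC`
that arrives after a leadership change.
```
#include <cstdint>

struct AppendEntriesReq {
  uint64_t term;
  uint64_t prev_log_index;
};

class RaftServer {
 public:
  bool OnAppendEntries(const AppendEntriesReq &req) {
    if (req.term < current_term_) return false;  // stale leader, reject
    // Recover from a snapshot only if the entries the leader wants to send
    // were already compacted out of our log; a routine heartbeat after a
    // leadership change must not trigger recovery.
    if (req.prev_log_index < snapshot_last_index_) RecoverFromSnapshot();
    return true;
  }

 private:
  uint64_t current_term_ = 0;
  uint64_t snapshot_last_index_ = 0;
  void RecoverFromSnapshot() { /* load snapshot, rebuild storage */ }
};
```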
Reviewers: mferencevic, ipaljak
Reviewed By: mferencevic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1943
Summary:
Based on the failure manifested in
https://apollo.memgraph.io/runs/512803/, it seems that machines grant each
other votes for the same term.
Looking at the code, the `voted_for_` variable wasn't assigned on election
start, so the election starter could grant its vote to someone else while
still counting a vote for itself.
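A minimal sketch of the fix, with hypothetical names: the election starter
records its own vote in `voted_for_` the moment the election starts.
```
#include <cstdint>
#include <optional>

class RaftServer {
 public:
  void StartElection() {
    ++current_term_;
    voted_for_ = my_id_;  // previously left unassigned -- the bug
    granted_votes_ = 1;   // the vote we just cast for ourselves
    // ... send RequestVote RPCs to all peers ...
  }

  bool OnRequestVote(uint64_t term, uint16_t candidate_id) {
    if (term < current_term_) return false;  // stale candidate
    // With voted_for_ assigned above, a server that started an election can
    // no longer hand out a second vote in its own term.
    if (term == current_term_ && voted_for_ && *voted_for_ != candidate_id)
      return false;
    current_term_ = term;
    voted_for_ = candidate_id;
    return true;
  }

 private:
  uint64_t current_term_ = 0;
  uint16_t my_id_ = 0;
  int granted_votes_ = 0;
  std::optional<uint16_t> voted_for_;
};
```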
Reviewers: ipaljak, mferencevic
Reviewed By: mferencevic
Subscribers: pullbot, vkasljevic
Differential Revision: https://phabricator.memgraph.io/D1941
Summary:
For HA benchmarks, if any of the executables exits with a non-zero
status, the benchmark should fail.
This diff also removes `LOG(INFO)` calls, since a failing benchmark should
already flag where to look.
Reviewers: ipaljak
Reviewed By: ipaljak
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1921
Summary:
Concurrent access to the map that contains replication log timeouts
caused the HA version to frequently report replication log timeout errors.
Adding locks around each access prevents them from happening.
Performance on Apollo reports write speed around 8k/s.
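The shape of the fix, as a minimal sketch with invented names (the real map
lives inside the replication machinery):
```
#include <chrono>
#include <cstdint>
#include <map>
#include <mutex>

class ReplicationTimeoutMap {
 public:
  using Clock = std::chrono::steady_clock;

  void Insert(uint64_t tx_id, Clock::time_point deadline) {
    std::lock_guard<std::mutex> guard(lock_);
    deadlines_[tx_id] = deadline;
  }

  void Remove(uint64_t tx_id) {
    std::lock_guard<std::mutex> guard(lock_);
    deadlines_.erase(tx_id);
  }

  bool TimedOut(uint64_t tx_id) const {
    std::lock_guard<std::mutex> guard(lock_);
    auto it = deadlines_.find(tx_id);
    return it != deadlines_.end() && Clock::now() > it->second;
  }

 private:
  mutable std::mutex lock_;  // serializes all access to deadlines_
  std::map<uint64_t, Clock::time_point> deadlines_;
};
```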
Reviewers: ipaljak
Reviewed By: ipaljak
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1920
Summary:
`SHOW STORAGE STATS`, when executed in a Raft cluster, should return
stats for each member of the cluster.
`StorageStats` starts an RPC server on each member of the cluster that
answers queries about its local storage stats.
The query can be invoked only on the current leader: the leader sends a
request to each peer and shows the results it receives. If some peers don't
answer within 1 second, stats for those peers won't be shown.
The new output can be seen here: P27
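A sketch of the fan-out, assuming a hypothetical `RequestStats` RPC call
(note that `std::async` futures still join stragglers on destruction; the
real RPC layer's own timeout handles that):
```
#include <chrono>
#include <future>
#include <map>
#include <vector>

struct StorageStatsRes { /* vertex/edge counts, ... */ };

// Hypothetical synchronous RPC to one peer.
StorageStatsRes RequestStats(int peer_id) { return StorageStatsRes{}; }

std::map<int, StorageStatsRes> CollectClusterStats(
    const std::vector<int> &peer_ids) {
  std::map<int, std::future<StorageStatsRes>> pending;
  for (int id : peer_ids)
    pending[id] = std::async(std::launch::async, RequestStats, id);

  auto deadline = std::chrono::steady_clock::now() + std::chrono::seconds(1);
  std::map<int, StorageStatsRes> results;
  for (auto &[id, fut] : pending) {
    // Peers that don't answer within 1 second are simply left out.
    if (fut.wait_until(deadline) == std::future_status::ready)
      results[id] = fut.get();
  }
  return results;
}
```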
Reviewers: ipaljak, mferencevic
Reviewed By: ipaljak
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1907
Summary:
Added a new config parameter: the replication timeout. This parameter sets
an upper limit on the replication phase; once the timeout is exceeded, the
transaction engine stops accepting new transactions.
We can hit this timeout in two cases:
1. a network partition
2. a majority of the cluster stops working
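A rough sketch of the mechanism, with invented names:
```
#include <chrono>
#include <stdexcept>

class TransactionEngine {
 public:
  explicit TransactionEngine(std::chrono::milliseconds replication_timeout)
      : replication_timeout_(replication_timeout) {}

  void OnReplicationStarted() {
    replication_started_ = std::chrono::steady_clock::now();
    replicating_ = true;
  }

  void OnReplicationFinished() { replicating_ = false; }

  void Begin() {
    // Once the replication phase has outlived the configured timeout
    // (network partition, lost majority), refuse new transactions.
    if (replicating_ &&
        std::chrono::steady_clock::now() - replication_started_ >
            replication_timeout_)
      throw std::runtime_error("replication timed out");
    // ... start the transaction ...
  }

 private:
  std::chrono::milliseconds replication_timeout_;
  std::chrono::steady_clock::time_point replication_started_;
  bool replicating_ = false;
};
```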
Reviewers: ipaljak
Reviewed By: ipaljak
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1893
Summary:
Added index creation and deletion handling in StateDelta.
Also included an integration test that creates an index and makes sure it
gets replicated: the test kills each peer in turn, eventually causing a
leader re-election.
Reviewers: ipaljak
Reviewed By: ipaljak
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1886
Summary: Teon found this nit, so we might as well fix it.
Reviewers: ipaljak
Reviewed By: ipaljak
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1860
Summary:
During the following scenario:
- start a HA cluster with 3 machines
- find the leader and start sending queries
- SIGTERM the leader but leave other 2 machines untouched
The leader would get stuck in the shutdown phase.
This was happening because, during the shutdown phase of the Bolt server, a
`graph_db_accessor` would try to commit a transaction after we had already
shut down the Raft server. Raft, although not running, still thinks it's in
Leader mode. The transaction engine calls the `SafeToCommit` method to
commit transactions and ends up in an infinite loop.
Since Raft was shut down, it won't handle any of the incoming RPCs and won't
change its mode.
The fix here is to shut down the Bolt server before Raft, so we don't have
any pending commits once Raft is shut down.
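The ordering fix in a nutshell (names illustrative):
```
struct BoltServer { void Shutdown() { /* drain sessions, close sockets */ } };
struct RaftServer { void Shutdown() { /* stop timers and RPC handling */ } };

void ShutdownMemgraph(BoltServer &bolt, RaftServer &raft) {
  bolt.Shutdown();  // pending commits complete while Raft is still alive
  raft.Shutdown();  // safe: nothing can enter the SafeToCommit loop anymore
}
```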
Reviewers: ipaljak
Reviewed By: ipaljak
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1853
Summary:
In Raft, we often need to access persistent state of the server
without modifying it. In order to speed up such operations, we
keep an in-memory copy of that state.
In this diff we make a copy of all persistent state except for
the log itself. Running our feature benchmark locally, we managed
to increase throughput by roughly 750 queries/s.
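The idea in a small sketch, with hypothetical names: reads hit the in-memory
copy, while writes go to disk first and then refresh the copy, so the two
can't diverge.
```
#include <cstdint>
#include <string>

class KVStore {  // stand-in for the real durable store
 public:
  void Put(const std::string &key, uint64_t value) { /* write to disk */ }
};

class RaftPersistentState {
 public:
  uint64_t CurrentTerm() const { return current_term_; }  // no disk read

  void SetCurrentTerm(uint64_t term) {
    disk_.Put("current_term", term);  // durability first
    current_term_ = term;             // then refresh the cached copy
  }

 private:
  KVStore disk_;
  uint64_t current_term_ = 0;  // in-memory copy of the persisted value
};
```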
Reviewers: msantl
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1843
Summary:
In this part of log compaction for Raft, I've implemented snapshotting
and snapshot recovery. I've also refactored the code a bit, so `RaftServer`
now has a pointer to the `GraphDb` and can do some things by itself.
Log compaction requires some further work. Since snapshotting isn't
synchronous between peers, and each peer can work at its own pace, once
we've compacted the log so that the next log entry to be sent to peer `x`
isn't available anymore, we need to send the snapshot over the wire. This
means that the next part will contain `InstallSnapshotRPC`, and then maybe
one more that will implement the logic of sending either a `LogEntry` or
the whole snapshot.
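A sketch of the planned dispatch, with hypothetical names:
```
#include <cstdint>

class RaftServer {
 public:
  void ReplicateTo(int peer) {
    // next_index_[peer]: first log entry the peer still needs.
    if (next_index_[peer] <= snapshot_last_index_)
      SendInstallSnapshot(peer);  // those entries were compacted away
    else
      SendAppendEntries(peer);    // normal log shipping
  }

 private:
  static constexpr int kMaxPeers = 8;
  uint64_t next_index_[kMaxPeers] = {};
  uint64_t snapshot_last_index_ = 0;
  void SendInstallSnapshot(int peer) { /* stream the snapshot */ }
  void SendAppendEntries(int peer) { /* send a LogEntry batch */ }
};
```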
Reviewers: ipaljak
Reviewed By: ipaljak
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1834
Summary:
`ReplicationLog` had a classic off-by-one bug: the `valid_prefix`
variable wasn't set properly.
This diff also includes a poor man's version of an HA client. This client
assumes that all the HA instances run on a single machine and that the
corresponding Bolt endpoints have open ports ranging from `7687` to
`7687 + num_machines - 1`.
This should make it easier to test certain things, e.g. disk usage, P25.
This test revealed the bug with `ReplicationLog`.
Reviewers: ipaljak
Reviewed By: ipaljak
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1813
Summary:
Once a leader loses its leadership, in order to handle hanging
transactions, we reset the storage and the transaction engine.
This requires re-applying all the committed entries from the log.
Once we add snapshots (log compaction), we will need to restore from the
snapshot as well.
One thing to keep in mind is the `election_timeout_min` parameter: if it's
set too low, it can trigger leader re-election too often.
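Roughly, with invented names, the leadership-loss path looks like this:
```
#include <cstdint>
#include <vector>

struct LogEntry { /* state deltas of one transaction */ };

class RaftServer {
 public:
  void OnLeadershipLost() {
    ResetStorageAndTxEngine();  // drops hanging transactions
    // Replay everything known to be committed; with log compaction this
    // will have to start from a snapshot restore instead.
    for (uint64_t i = 0; i < commit_index_ && i < log_.size(); ++i)
      Apply(log_[i]);
  }

 private:
  std::vector<LogEntry> log_;
  uint64_t commit_index_ = 0;  // number of committed entries
  void ResetStorageAndTxEngine() { /* clear storage + tx engine */ }
  void Apply(const LogEntry &entry) { /* re-apply committed deltas */ }
};
```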
Reviewers: ipaljak
Reviewed By: ipaljak
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1822
Summary:
Each `raft::LogEntry` is now persisted under its own key in our `KVStore`. Locally running our HA feature benchmark yields the following results:
```
duration 23.7
executed_writes: 15000
write_per_second: 632.888
```
This represents about a 5x increase in throughput.
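The layout change, sketched with assumed key names: appending entry N now
touches one small key instead of rewriting a single monolithic blob.
```
#include <cstdint>
#include <string>

class KVStore {  // stand-in for the real durable store
 public:
  void Put(const std::string &key, const std::string &value) { /* disk */ }
};

void PersistLogEntry(KVStore &store, uint64_t index,
                     const std::string &serialized_entry) {
  store.Put("log_entry_" + std::to_string(index), serialized_entry);
}
```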
Reviewers: msantl
Reviewed By: msantl
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1799
Summary:
Created a new integration test for Raft protocol.
The test iterates through the Raft cluster and does the following:
* kill machine `X`
* execute a query
* bring `X` back to life
The first step is to insert a vertex in the cluster, and last step is to check
if the cluster has all the data.
I also edited some of the Raft core files because this test surfaced some
bugs.
The `tester` binary is a hacked version of the HA client, as are the parts
of the code that refuse to execute a query if the machine is not in
`Leader` mode. Those parts will go away once we have a proper HA client.
I ran `runner.py` for a while (215 times)
```
while ./runner.py &> log.txt; do echo -n "."; done
```
and it didn't break.
Reviewers: ipaljak, mferencevic
Reviewed By: ipaljak
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1788
Summary:
Creating a Raft no-op log entry on leader change triggers the whole
log replication procedure, which ends up committing/applying state deltas
on a newly elected leader that didn't receive the last commit index from
the previous leader. A sketch of the idea follows below.
I also included a small tweak that avoids appending log entries when a
transaction contains only BEGIN and ABORT StateDeltas, because we don't
want to replicate read queries.
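The no-op trick is the standard Raft technique; sketched with invented names:
```
#include <cstdint>
#include <vector>

enum class EntryKind { kNoop, kStateDeltas };

struct LogEntry {
  uint64_t term;
  EntryKind kind;
};

class RaftServer {
 public:
  void OnBecameLeader() {
    // Appending and replicating a no-op entry from the new term lets the
    // new leader commit everything left over from previous terms, even if
    // no client write ever arrives.
    log_.push_back({current_term_, EntryKind::kNoop});
  }

 private:
  uint64_t current_term_ = 0;
  std::vector<LogEntry> log_;
};
```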
Reviewers: ipaljak
Reviewed By: ipaljak
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1785
Summary:
TransactionReplicator replicates transactions on follower machines in
HA Memgraph. Our DB accessor API doesn't provide the functionality to
begin transactions with non-increasing ids. This is why the
`TransactionReplicator` uses an internal map that maps tx ids from the
leader node to transactions on the follower node (whose ids don't have to
match the leader's tx ids).
If the leader has the following transaction timeline:
```
L
tx1
|
| tx2
| |
| |
| |
| |
| |
| tx2
|
|
|
|
tx1
```
`tx2` will commit first and will be replicated. When applying `tx2` on follower
nodes, they will start a new transaction with tx id `1`. When `tx1` starts
replicating, followers will start a new transaction with tx id `2`. And
this is where `TransactionReplicator` kicks in.
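A minimal sketch of that mapping, with invented types:
```
#include <cstdint>
#include <unordered_map>

struct Transaction { uint64_t local_id; };
struct StateDelta { uint64_t leader_tx_id; /* ... the actual change ... */ };

class TransactionReplicator {
 public:
  void Apply(const StateDelta &delta) {
    auto it = live_.find(delta.leader_tx_id);
    if (it == live_.end()) {
      // First delta of this leader transaction: begin a local transaction,
      // whose id need not match the leader's.
      it = live_.emplace(delta.leader_tx_id, Begin()).first;
    }
    ApplyToTransaction(it->second, delta);
  }

  void Commit(uint64_t leader_tx_id) { live_.erase(leader_tx_id); }

 private:
  std::unordered_map<uint64_t, Transaction> live_;
  uint64_t next_local_id_ = 0;
  Transaction Begin() { return Transaction{++next_local_id_}; }
  void ApplyToTransaction(Transaction &, const StateDelta &) { /* ... */ }
};
```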
Reviewers: ipaljak
Reviewed By: ipaljak
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1775
Summary:
Explicitly start and stop the Raft server.
This way we can be sure that Raft won't try to use coordination after it
has been shut down, and we can pinpoint the start of the Raft protocol more
easily.
Reviewers: ipaljak
Reviewed By: ipaljak
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1766
Summary:
This is just the first diff that tries to wire the Raft protocol into
Memgraph.
This diff introduces transaction engine reset functionality. It also
introduces `RaftInterface`, which should be used wherever someone wants to
access Raft from Memgraph.
For design decisions see the feature spec.
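A hypothetical sketch of what such an interface could look like (the exact
methods live in the spec, not here):
```
#include <vector>

struct StateDelta { /* one logical change */ };

class RaftInterface {
 public:
  virtual ~RaftInterface() = default;
  // Hand a committed transaction's deltas to Raft for replication.
  virtual void Emplace(const std::vector<StateDelta> &deltas) = 0;
  // True only on the current leader; used to refuse writes elsewhere.
  virtual bool IsLeader() = 0;
};
```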
Reviewers: ipaljak, teon.banek
Reviewed By: ipaljak
Subscribers: pullbot, teon.banek
Differential Revision: https://phabricator.memgraph.io/D1758
Summary: This diff contains a rough implementation of the Raft protocol which ends at leader election.
Reviewers: msantl
Reviewed By: msantl
Subscribers: teon.banek, pullbot
Differential Revision: https://phabricator.memgraph.io/D1744
Summary:
Classes marked with `:serialize (:slk)` will now generate SLK
serialization code. This diff also changes how the `:serialize` option
is parsed, so that multiple different serialization backends are
supported.
Reviewers: mtomic, llugovic, mferencevic
Reviewed By: mtomic
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1755
Summary:
Basic RPC messages for the Raft protocol. They will most likely be updated
as the implementation moves along.
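For orientation, these are the two core message pairs from the Raft paper,
sketched as plain structs (the field layout here is illustrative, not the
exact serialized form):
```
#include <cstdint>
#include <vector>

struct LogEntry { uint64_t term; /* command payload */ };

struct RequestVoteReq {
  uint64_t term, last_log_index, last_log_term;
  uint16_t candidate_id;
};
struct RequestVoteRes {
  uint64_t term;
  bool vote_granted;
};

struct AppendEntriesReq {
  uint64_t term, prev_log_index, prev_log_term, leader_commit;
  uint16_t leader_id;
  std::vector<LogEntry> entries;  // empty for heartbeats
};
struct AppendEntriesRes {
  uint64_t term;
  bool success;
};
```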
Reviewers: msantl, teon.banek, mferencevic
Reviewed By: msantl, teon.banek
Subscribers: pullbot
Differential Revision: https://phabricator.memgraph.io/D1726