Write replication feature spec (#6)
Co-authored-by: Matej Ferencevic <matej.ferencevic@memgraph.io>
parent 367cb44983
commit 1580c46abc
docs/feature_spec/replication.md (normal file, 181 lines added)
@ -0,0 +1,181 @@
# Replication

## High Level Context

Replication is a method that ensures that multiple database instances are
storing the same data. To enable replication, there must be at least two
instances of Memgraph in a cluster. Each instance has one of two roles: master
or slave. The master is the instance that accepts writes to the database and
replicates its state to the slaves. A cluster can have only one master and one
or more slaves. Slaves never accept write queries, but they always accept read
queries (there is one exception to this rule, described below). A slave can
also be configured as a replica of another slave rather than of the master.
Each instance is always reachable using the standard supported communication
protocols. Replication transfers WAL (write-ahead log) data. All data is
transported through a custom binary protocol that will try to remain backward
compatible, so that replication immediately allows for zero-downtime upgrades.

Each slave can be configured to accept replicated data in one of the following
modes:

- synchronous
- asynchronous
- semi-synchronous

### Synchronous Replication

When the data is replicated to a slave synchronously, all of the data of a
currently pending transaction must be sent to the synchronous slave before the
transaction is able to commit its changes.

The upside of this mode is that all data committed to the master is always
replicated to the synchronous slave. The downside is performance: a
non-responsive slave could grind all query execution to a halt.

This mode is good when you absolutely need to be sure that all data is always
consistent between the master and the slave.

### Asynchronous Replication

When the data is replicated to a slave asynchronously, all pending transactions
are committed immediately and their data is replicated to the asynchronous
slave in the background.

The upside of this mode is performance: replication won't slow down query
execution. The downside is that, while data is being changed, the master and
the slave are almost never in a consistent state.

This mode is good when you care about performance and an eventually consistent
cluster is sufficient.

### Semi-synchronous Replication

When the data is replicated to a slave semi-synchronously, the data is
replicated using both the synchronous and the asynchronous methodology. The
data is first replicated synchronously, but if the slave for any reason doesn't
respond within a preset timeout, the pending transaction is committed and the
data is replicated to the slave asynchronously.

The upside of this mode is that all committed data is *mostly* replicated to
the semi-synchronous slave. The downside is the same performance cost as the
synchronous mode, although a non-responsive slave can stall a commit only
until the timeout expires.

This mode is useful when you want the replication to be synchronous to ensure
that the data within the cluster is consistent, but you don't want the master
to grind to a halt when you have a non-responsive slave.
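
The difference between the three modes can be illustrated with a small toy
model. This is only a sketch under stated assumptions, not Memgraph's
implementation: `Replica`, `ack_delay`, `backlog` and `commit` are hypothetical
names, and a slave's response time is modelled as a plain number instead of a
real network round trip.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    SYNC = "synchronous"
    ASYNC = "asynchronous"
    SEMI_SYNC = "semi-synchronous"


@dataclass
class Replica:
    """Hypothetical stand-in for a slave."""
    mode: Mode
    ack_delay: Optional[float]  # seconds until the slave responds; None = never
    timeout: float = 0.5        # only consulted in semi-synchronous mode
    backlog: List[str] = field(default_factory=list)  # data sent in the background


def commit(delta: str, replicas: List[Replica]) -> str:
    """Model one commit on the master; returns 'committed' or 'blocked'."""
    # A synchronous slave that never responds stalls the commit.
    if any(r.mode is Mode.SYNC and r.ack_delay is None for r in replicas):
        return "blocked"
    for r in replicas:
        if r.mode is Mode.ASYNC:
            # Commit immediately; the data follows in the background.
            r.backlog.append(delta)
        elif r.mode is Mode.SEMI_SYNC:
            # Wait up to the timeout, then fall back to background replication.
            timed_out = r.ack_delay is None or r.ack_delay > r.timeout
            if timed_out:
                r.backlog.append(delta)
    return "committed"
```

In this model only a non-responsive synchronous slave can block the master; a
slow semi-synchronous slave merely receives the data later.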

### Addition of a New Slave

Each slave, when added to the cluster (in any mode), will first start out as an
asynchronous slave. That allows slaves that have fallen behind to first catch
up to the current state of the database. Once the slave is no longer lagging
behind the master, it will be promoted (in a brief stop-the-world operation) to
a semi-synchronous or synchronous slave. Slaves that are added as asynchronous
slaves will remain asynchronous.
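
The lifecycle described above can be sketched as a tiny state machine. This is
a minimal illustration, assuming that "lagging behind" can be expressed by
comparing two positions in the WAL; `ReplicaState` and `on_progress` are
hypothetical names, and the real promotion would involve the stop-the-world
step that is only marked by a comment here.

```python
from dataclasses import dataclass


@dataclass
class ReplicaState:
    requested_mode: str          # "sync", "semi-sync" or "async"
    current_mode: str = "async"  # every new slave starts out asynchronous


def on_progress(state: ReplicaState, master_position: int,
                replica_position: int) -> str:
    """Promote the slave to its requested mode once it has caught up."""
    caught_up = replica_position >= master_position
    if caught_up and state.current_mode != state.requested_mode:
        # In a real system this is the brief stop-the-world promotion.
        state.current_mode = state.requested_mode
    return state.current_mode
```

A slave requested as `"async"` never changes mode, which matches the last
sentence of the section.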

## User Facing Setup

### How to Set Up a Memgraph Cluster with Replication?

Replication configuration is done primarily through openCypher commands. This
allows the cluster to be dynamically rearranged (promotion of a new master,
addition of a new slave, etc.).

Each Memgraph instance will be a master when first started. You have to change
the mode of all slave nodes using the following openCypher query before you can
enable replication on the master:

```plaintext
SET REPLICATION MODE TO (MASTER|SLAVE);
```

After you have set your slave instance to the correct operating mode, you can
enable replication in the master instance by issuing the following openCypher
command:

```plaintext
CREATE REPLICA name (SYNC|ASYNC) [WITH TIMEOUT 0.5] TO <ip_address>;
```
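
As a hedged illustration of the two command templates above, the following
helpers assemble the query strings. The function names are hypothetical and the
helpers only build text; they do not talk to a Memgraph instance. Whether the
address needs quoting is not stated in this spec, so it is emitted verbatim.
The example values are placeholders, and a semi-synchronous slave is assumed to
be a `SYNC` replica with a timeout, as in the advanced scenario in this
document.

```python
from typing import Optional


def set_replication_mode(role: str) -> str:
    """Build the query that switches an instance to MASTER or SLAVE."""
    role = role.upper()
    if role not in ("MASTER", "SLAVE"):
        raise ValueError(f"unknown role: {role}")
    return f"SET REPLICATION MODE TO {role};"


def create_replica(name: str, mode: str, address: str,
                   timeout: Optional[float] = None) -> str:
    """Build the query that registers a replica on its upstream instance."""
    mode = mode.upper()
    if mode not in ("SYNC", "ASYNC"):
        raise ValueError(f"unknown mode: {mode}")
    # The timeout clause is optional, mirroring the [WITH TIMEOUT ...] bracket.
    timeout_clause = f" WITH TIMEOUT {timeout}" if timeout is not None else ""
    return f"CREATE REPLICA {name} {mode}{timeout_clause} TO {address};"


if __name__ == "__main__":
    print(set_replication_mode("slave"))
    print(create_replica("replica1", "sync", "<ip_address>", timeout=0.5))
```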

Each Memgraph instance will remember what its configuration was set to and will
automatically resume its role when restarted.

### How to Set Up an Advanced Replication Scenario?

The configuration allows for a more advanced scenario like this:

```plaintext
master -[asynchronous]-> slave 1 -[semi-synchronous]-> slave 2
```

To configure the above scenario, issue the following commands:

```plaintext
SET REPLICATION MODE TO SLAVE;  # on slave 1
SET REPLICATION MODE TO SLAVE;  # on slave 2

CREATE REPLICA slave1 ASYNC TO <slave1_ip>;  # on master
CREATE REPLICA slave2 SYNC WITH TIMEOUT 0.5 TO <slave2_ip>;  # on slave 1
```
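
The same scenario can be derived mechanically from a description of the
topology. The sketch below is only an illustration: `Link` and `plan_commands`
are hypothetical names, and the output is a per-instance list of the commands
shown above, with every downstream instance switched to the slave role before
any replica is registered.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Link:
    """One replication edge: `upstream` replicates to `replica`."""
    upstream: str
    replica: str
    mode: str                 # "SYNC" or "ASYNC"
    address: str              # address of the replica, emitted verbatim
    timeout: Optional[float] = None


def plan_commands(links: List[Link]) -> Dict[str, List[str]]:
    """Return the commands to issue on each instance, in order."""
    plan: Dict[str, List[str]] = {}
    # Every instance that receives data must be switched to the slave role first.
    for link in links:
        plan.setdefault(link.replica, []).append("SET REPLICATION MODE TO SLAVE;")
    # Then each upstream instance registers its replica.
    for link in links:
        clause = f" WITH TIMEOUT {link.timeout}" if link.timeout is not None else ""
        plan.setdefault(link.upstream, []).append(
            f"CREATE REPLICA {link.replica} {link.mode}{clause} TO {link.address};"
        )
    return plan
```

For the chain above, the input would be two links: `master` to `slave1`
(`ASYNC`) and `slave1` to `slave2` (`SYNC` with a timeout of `0.5`).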

### How to See the Current Replication Status?

To see the replication mode of the current Memgraph instance, you can issue the
following query:

```plaintext
SHOW REPLICATION MODE;
```

To see the replicas of the current Memgraph instance, you can issue the
following query:

```plaintext
SHOW REPLICAS;
```

To delete a replica, issue the following query:

```plaintext
DELETE REPLICA 'name';
```

### How to Promote a New Master?

In an already set-up cluster, you promote a new master by setting the slave
that you want to become the master to the master role:

```plaintext
SET REPLICATION MODE TO MASTER;  # on desired slave
```

After the command is issued, if the original master is still alive, it won't be
able to replicate its data to the slave (the new master) anymore and will enter
an error state. You must ensure that at any given point in time there aren't
two masters in the cluster.
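
One way to respect the single-master rule during a planned switch is to demote
the old master before promoting the new one. The sketch below assumes that the
old master is still reachable and accepts the command; that ordering is an
assumption of this example, not something this spec prescribes, and
`promotion_steps` and `count_masters` are hypothetical helper names.

```python
from typing import Dict, List, Tuple


def count_masters(roles: Dict[str, str]) -> int:
    """Number of instances currently in the master role."""
    return sum(1 for role in roles.values() if role == "MASTER")


def promotion_steps(roles: Dict[str, str],
                    new_master: str) -> List[Tuple[str, str]]:
    """Return (instance, command) pairs that never leave two masters active."""
    if roles.get(new_master) != "SLAVE":
        raise ValueError(f"{new_master} is not a known slave")
    steps: List[Tuple[str, str]] = []
    # Assumption: demote any current master first, so two masters never coexist.
    for instance, role in roles.items():
        if role == "MASTER":
            steps.append((instance, "SET REPLICATION MODE TO SLAVE;"))
    steps.append((new_master, "SET REPLICATION MODE TO MASTER;"))
    return steps
```

If the old master is down, the first step cannot be executed, and it must be
kept from rejoining the cluster as a master by other means.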

## Integration with Memgraph

WAL `Delta`s are replicated between the replication master and slave. Along
with `Delta`s, all `StorageGlobalOperation`s are also replicated. Replication
is essentially the same as appending to the WAL.

Synchronous replication will occur in `Commit` and in each
`StorageGlobalOperation` handler. The storage itself guarantees that `Commit`
will be called by only one thread at a time and that no
`StorageGlobalOperation` will be executed during an active transaction.
Asynchronous replication will load its data from already written WAL files and
transmit the data to the slave. All data will be replicated using our RPC
protocol (SLK encoded).

For each replica, the replication master (or slave) will keep track of the
replica's state. That way, it will know which operations must be transmitted to
the replica and which operations can be skipped. When a replica is very stale,
a snapshot will be transmitted to it so that it can quickly synchronize with
the current state. All following operations will transmit WAL deltas.
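
The per-replica bookkeeping can be sketched as a decision between sending
nothing, sending WAL deltas, and sending a snapshot. This is a minimal sketch
under assumptions that are not in the spec: deltas are identified by increasing
sequence numbers, and a replica counts as "very stale" when it needs deltas
that the upstream instance no longer retains. `RetainedWal` and `plan_transfer`
are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RetainedWal:
    first_seq: int  # oldest delta still available on the upstream instance
    last_seq: int   # newest delta written so far


def plan_transfer(wal: RetainedWal,
                  replica_last_applied: int) -> Tuple[str, List[int]]:
    """Decide what must be transmitted to a replica, given its tracked state."""
    if replica_last_applied >= wal.last_seq:
        return ("up-to-date", [])  # nothing to send, everything can be skipped
    if replica_last_applied + 1 < wal.first_seq:
        # The replica is missing deltas that are no longer retained: send a
        # snapshot first; later operations are transmitted as deltas again.
        return ("snapshot", [])
    missing = list(range(replica_last_applied + 1, wal.last_seq + 1))
    return ("deltas", missing)
```

With deltas 50 to 100 retained, a replica that last applied delta 97 would be
sent deltas 98 to 100, while a replica that last applied delta 10 would receive
a snapshot.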

## Reading materials

1. [PostgreSQL comparison of different solutions](https://www.postgresql.org/docs/12/different-replication-solutions.html)
2. [PostgreSQL docs](https://www.postgresql.org/docs/12/runtime-config-replication.html)
3. [MySQL reference manual](https://dev.mysql.com/doc/refman/8.0/en/replication.html)
4. [MySQL docs](https://dev.mysql.com/doc/refman/8.0/en/replication-setup-slaves.html)
5. [MySQL master switch](https://dev.mysql.com/doc/refman/8.0/en/replication-solutions-switch.html)