diff --git a/docs/feature_spec/replication.md b/docs/feature_spec/replication.md new file mode 100644 index 000000000..1539b8ed0 --- /dev/null +++ b/docs/feature_spec/replication.md @@ -0,0 +1,181 @@ +# Replication + +## High Level Context + +Replication is a method that ensures that multiple database instances are +storing the same data. To enable replication, there must be at least two +instances of Memgraph in a cluster. Each instance has one of either two roles: +master or slave. The master instance is the instance that accepts writes to the +database and replicates its state to the slaves. In a cluster, there can only +be one master. There can be one or more slaves. None of the slaves will accept +write queries, but they will always accept read queries (there is an exception +to this rule and is described below). Slaves can also be configured to be +replicas of slaves, not necessarily replicas of the master. Each instance will +always be reachable using the standard supported communication protocols. The +replication will replicate WAL data. All data is transported through a custom +binary protocol that will try remain backward compatible, so that replication +immediately allows for zero downtime upgrades. + +Each slave can be configured to accept replicated data in one of the following +modes: + - synchronous + - asynchronous + - semi-synchronous + +### Synchronous Replication + +When the data is replicated to a slave synchronously, all of the data of a +currently pending transaction must be sent to the synchronous slave before the +transaction is able to commit its changes. + +This mode has a positive implication that all data that is committed to the +master will always be replicated to the synchronous slave. It also has a +negative performance implication because non-responsive slaves could grind all +query execution to a halt. + +This mode is good when you absolutely need to be sure that all data is always +consistent between the master and the slave. + +### Asynchronous Replication + +When the data is replicated to a slave asynchronously, all pending transactions +are immediately committed and their data is replicated to the asynchronous +slave in the background. + +This mode has a positive performance implication in which it won't slow down +query execution. It also has a negative implication that the data between the +master and the slave is almost never in a consistent state (when the data is +being changed). + +This mode is good when you don't care about consistency and only need an +eventually consistent cluster, but you care about performance. + +### Semi-synchronous Replication + +When the data is replicated to a slave semi-synchronously, the data is +replicated using both the synchronous and asynchronous methodology. The data is +always replicated synchronously, but, if the slave for any reason doesn't +respond within a preset timeout, the pending transaction is committed and the +data is replicated to the slave asynchronously. + +This mode has a positive implication that all data that is committed is +*mostly* replicated to the semi-synchronous slave. It also has a negative +performance implication as the synchronous replication mode. + +This mode is useful when you want the replication to be synchronous to ensure +that the data within the cluster is consistent, but you don't want the master +to grind to a halt when you have a non-responsive slave. + +### Addition of a New Slave + +Each slave when added to the cluster (in any mode) will first start out as an +asynchronous slave. That will allow slaves that have fallen behind to first +catch-up to the current state of the database. When the slave is in a state +that it isn't lagging behind the master it will then be promoted (in a brief +stop-the-world operation) to a semi-synchronous or synchronous slave. Slaves +that are added as asynchronous slaves will remain asynchronous. + +## User Facing Setup + +### How to Setup a Memgraph Cluster with Replication? + +Replication configuration is done primarily through openCypher commands. This +allows the cluster to be dynamically rearranged (new leader election, addition +of a new slave, etc.). + +Each Memgraph instance when first started will be a master. You have to change +the mode of all slave nodes using the following openCypher query before you can +enable replication on the master: + +```plaintext +SET REPLICATION MODE TO (MASTER|SLAVE); +``` + +After you have set your slave instance to the correct operating mode, you can +enable replication in the master instance by issuing the following openCypher +command: +```plaintext +CREATE REPLICA name (SYNC|ASYNC) [WITH TIMEOUT 0.5] TO ; +``` + +Each Memgraph instance will remember that the configuration was set to and will +automatically resume with its role when restarted. + +### How to Setup an Advanced Replication Scenario? + +The configuration allows for a more advanced scenario like this: +```plaintext +master -[asyncrhonous]-> slave 1 -[semi-synchronous]-> slave 2 +``` + +To configure the above scenario, issue the following commands: +```plaintext +SET REPLICATION MODE TO SLAVE; # on slave 1 +SET REPLICATION MODE TO SLAVE; # on slave 2 + +CREATE REPLICA slave1 ASYNC TO ; # on master +CREATE REPLICA slave2 SYNC WITH TIMEOUT 0.5 TO ; # on slave 1 +``` + +### How to See the Current Replication Status? + +To see the replication mode of the current Memgraph instance, you can issue the +following query: + +```plaintext +SHOW REPLICATION MODE; +``` + +To see the replicas of the current Memgraph instance, you can issue the +following query: + +```plaintext +SHOW REPLICAS; +``` + +To delete a replica, issue the following query: + +```plaintext +DELETE REPLICA 'name'; +``` + +### How to Promote a New Master? + +When you have an already set-up cluster, to promote a new master, just set the +slave that you want to be a master to the master role. + +```plaintext +SET REPLICATION MODE TO MASTER; # on desired slave +``` + +After the command is issued, if the original master is still alive, it won't be +able to replicate its data to the slave (the new master) anymore and will enter +an error state. You must ensure that at any given point in time there aren't +two masters in the cluster. + +## Integration with Memgraph + +WAL `Delta`s are replicated between the replication master and slave. With +`Delta`s, all `StorageGlobalOperation`s are also replicated. Replication is +essentially the same as appending to the WAL. + +Synchronous replication will occur in `Commit` and each +`StorageGlobalOperation` handler. The storage itself guarantees that `Commit` +will be called single-threadedly and that no `StorageGlobalOperation` will be +executed during an active transaction. Asynchronous replication will load its +data from already written WAL files and transmit the data to the slave. All +data will be replicated using our RPC protocol (SLK encoded). + +For each replica the replication master (or slave) will keep track of the +replica's state. That way, it will know which operations must be transmitted to +the replica and which operations can be skipped. When a replica is very stale, +a snapshot will be transmitted to it so that it can quickly synchronize with +the current state. All following operations will transmit WAL deltas. + +## Reading materials + +1. [PostgreSQL comparison of different solutions](https://www.postgresql.org/docs/12/different-replication-solutions.html) +2. [PostgreSQL docs](https://www.postgresql.org/docs/12/runtime-config-replication.html) +3. [MySQL reference manual](https://dev.mysql.com/doc/refman/8.0/en/replication.html) +4. [MySQL docs](https://dev.mysql.com/doc/refman/8.0/en/replication-setup-slaves.html) +5. [MySQL master switch](https://dev.mysql.com/doc/refman/8.0/en/replication-solutions-switch.html)