* Fix up REPLICA GetInfo and CreateSnapshot
Subtle bug where these actions were using the incorrect transactional
access while in REPLICA role. This casued timestamp to be incorrectly
bumped, breaking REPLICA from doing replication.
* Delay DNS resolution
Rather than resolve at endpoint creation, we will instread resolve only
on Socket connect. This allows k8s deployments to change their IP during
pod restarts.
* Minor sonarsource fixes
---------
Co-authored-by: Andreja <andreja.tonev@memgraph.io>
Co-authored-by: DavIvek <david.ivekovic@memgraph.io>
Fixes root cause of a cascade of failures in replication code:
- Replica handling of deleting an edge is now corrected. Now tolerant of multiple edges of the same relationship type.
- Improved robustness: correct exception handling around failed stream of current WAL file. This now means a REPLICA failure will no longer prevent transactions on MAIN from performing WAL writes.
- Slightly better diagnostic messages, not user friendly but helps get developer to correct root cause quicker.
- Proactively remove vertex+edges during Abort rather than defer to GC to do that work, this included fixing constraints and indexes to be safe.
Co-authored-by: Andreja Tonev <andreja.tonev@memgraph.io>
Single (instance level) connection to a replica (messages from all databases get multiplexed through it)
ReplicationClient split in two: ReplicationClient and ReplicationStorageClient
New ReplicationClient, moved under replication, handles the raw connection, owned by MainRoleData
ReplicationStorageClient handles the storage <-> replica state machine and holds to a stream
Removed epoch and storage from *Clients
rpc::Stream proactively aborts on error and sets itself to a defunct state
Removed HandleRpcFailure, instead we simply log the error and let the FrequentCheck handle re-connection
replica_state is now a synced variable
ReplicaStorageClient state machine bugfixes
Single FrequentCheck that goes through DBMS
Moved ReplicationState under DbmsHandler
Moved some replication startup logic under the DbmsHandler's constructor
Removed InMemoryReplicationClient
CreateReplicationClient has been removed from Storage
Simplified GetRecoverySteps and made safer
---------
Co-authored-by: Gareth Lloyd <gareth.lloyd@memgraph.io>
* Interpreter transaction ID decoupled from storage transaction ID
* Transactional scope for indices, statistics and constraints
* Storage::Accessor now has 2 modes (unique and shared)
* Introduced ResourceLock to fix pthread mutex problems
* Split InfoQuery in two: non-transactional SystemInfoQuery and transactional DatabaseInfoQuery
* Replicable and durable statistics
* Bumped WAL/Snapshot versions
* Initial implementation of the Lamport clock
---------
Co-authored-by: Andreja Tonev <andreja.tonev@memgraph.io>
Add a report for the case where a sync replica does not confirm within a timeout:
-Add a new exception: ReplicationException to be returned when one sync replica does not confirm the reception of messages (new data, new constraint/index, or for triggers)
-Update the logic to throw the ReplicationException when needed for insertion of new data, triggers, or creation of new constraint/index
-Add end-to-end tests to cover the loss of connection with sync/async replicas when adding new data, adding new constraint/indexes, and triggers
Add end-to-end tests to cover the creation and drop of indexes, existence constraints, and uniqueness constraints
Improved tooling function mg_sleep_and_assert to also show the last result when duration is exceeded
* Storage takes care of the saving of setting when a new replica is added
* Restore replicas at startup
* Modify interactive_mg_runner + memgraph to support that data-directory can be configured in CONTEXT
* Extend e2e test
* Correct typo
* Add flag to config to specify when replication should be stored (true by default when starting Memgraph)
* Remove un-necessary "--" in yaml file
* Make sure Memgraph stops if a replica can't be restored.
* Add UT covering the parsing of ReplicaStatus to/from json
* Add assert in e2e script to check that a port is free before using it
* Add test covering crash on Jepsen
* Make sure applciaiton crashes if it starts on corrupted replications' info
Starting with a non-reponsive replica is allowed.
* Add temporary startup flag: this is needed so jepsen do not automatically restore replica on startup of main. This will be removed in T0835
* Add test
* Add implementation and adapted test
* Update workloads.yaml to have a timeout > 0
* Update tests (failing due to merging of "add replica state")
* Disable sequential test
* Remove parent build and benchmark
* Save test data
* Save e2e logs in build folder
* Define different recovery time for each test