30 Commits

Author SHA1 Message Date
Antonio Filipovic
4f4a569c72
Revert replication tests (#1707) 2024-02-12 16:42:57 +01:00
Andi
38ade99652
HA: Add coordinator to replication cluster (#1608) 2024-01-24 13:07:51 +01:00
andrejtonev
071df2f439
Replication refactor part 7 (#1550)
* Split queries into system and data queries
* System queries are sequentially executed and generate separate transaction deltas
* System transaction try locks for 100ms
* last_commited_system_ts saved to DBMS durability
* Replicating CREATE/DROP DATABASE
* Sending a system snapshot if REPLICA behind
* Passing a copy of the gatekeeper::access as std::any to all functions that could call an async execution
* Removed delete_on_drop flag (we now always delete on drop)
* Using UUID as the directory name for databases
* DBMS durability update (added versioning and salient information)
* Automatic migration from previous version
* Interpreter can run some queries without a target database
* SHOW REPLICA returns the status of the currently active DB
* Returning UUID instead of db name in the RPC responses
* Using UUIDs for database specification in RPC (not name)
* FrequentCheck forces update on reconnect
* TimestampRpc will detect if a replica is behind, and will update client's state
* Safer SLK reads
* Split SHOW DATABASES in two: SHOW DATABASES (list of current databases) and SHOW DATABASE (a single string naming the current database)

---------

Co-authored-by: Gareth Lloyd <gareth.lloyd@memgraph.io>
2024-01-23 12:06:10 +01:00
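
The "system transaction try locks for 100ms" item in #1550 can be illustrated with a minimal Python sketch. This is not Memgraph's C++ implementation: `system_lock` and `run_system_query` are hypothetical names, and only the 100 ms bound comes from the commit message.

```python
import threading

# Hypothetical stand-in for the lock that serializes system queries.
system_lock = threading.Lock()

SYSTEM_LOCK_TIMEOUT_S = 0.1  # 100 ms, as stated in the commit message


def run_system_query(action):
    """Run `action` under the system lock, giving up after 100 ms."""
    if not system_lock.acquire(timeout=SYSTEM_LOCK_TIMEOUT_S):
        raise TimeoutError("system transaction could not take the lock within 100 ms")
    try:
        return action()
    finally:
        # Always release, even if the action raises.
        system_lock.release()
```

A bounded try-lock lets a system query fail fast instead of blocking indefinitely behind another system transaction.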
Gareth Andrew Lloyd
0fb8e4116f
Fix REPLICA timestamps (#1615)
* Fix up REPLICA GetInfo and CreateSnapshot

Subtle bug where these actions were using the incorrect transactional
access while in REPLICA role. This caused the timestamp to be incorrectly
bumped, breaking REPLICA from doing replication.

* Delay DNS resolution

Rather than resolve at endpoint creation, we will instead resolve only
on Socket connect. This allows k8s deployments to change their IP during
pod restarts.

* Minor sonarsource fixes

---------
Co-authored-by: Andreja <andreja.tonev@memgraph.io>
Co-authored-by: DavIvek <david.ivekovic@memgraph.io>
2024-01-05 16:42:54 +00:00
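
The "delay DNS resolution" change in #1615 can be pictured with a short Python sketch. It is an illustration under assumptions, not the C++ code from the commit: the `Endpoint` class and its `connect` method are hypothetical, and the sketch relies on the standard `socket.create_connection`, which resolves the host name at call time.

```python
import socket


class Endpoint:
    """Keeps the host name as given; no DNS lookup happens at creation."""

    def __init__(self, host, port):
        self.host = host
        self.port = port

    def connect(self, timeout=5.0):
        # The name is resolved here, on every connect, so a pod that
        # restarted with a new IP is reached through its current address.
        return socket.create_connection((self.host, self.port), timeout=timeout)
```

Because resolution is deferred, constructing an `Endpoint` for a host that is not yet resolvable succeeds; any lookup failure surfaces only when `connect` is called.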
Gareth Andrew Lloyd
14f92b4a0f
Bugfix: correct replication handler (#1540)
Fixes root cause of a cascade of failures in replication code:
- Replica handling of deleting an edge is now corrected. Now tolerant of multiple edges of the same relationship type.
- Improved robustness: correct exception handling around failed stream of current WAL file. This now means a REPLICA failure will no longer prevent transactions on MAIN from performing WAL writes.
- Slightly better diagnostic messages: not user friendly, but they help developers reach the root cause more quickly.
- Proactively remove vertices and edges during Abort rather than deferring that work to GC; this included fixing constraints and indexes to be safe.


Co-authored-by: Andreja Tonev <andreja.tonev@memgraph.io>
2023-12-01 12:38:48 +00:00
andrejtonev
8b9e1fa08b
Replication refactor part 6 (#1484)
Single (instance level) connection to a replica (messages from all databases get multiplexed through it)
ReplicationClient split in two: ReplicationClient and ReplicationStorageClient
New ReplicationClient, moved under replication, handles the raw connection, owned by MainRoleData
ReplicationStorageClient handles the storage <-> replica state machine and holds on to a stream
Removed epoch and storage from *Clients
rpc::Stream proactively aborts on error and sets itself to a defunct state
Removed HandleRpcFailure, instead we simply log the error and let the FrequentCheck handle re-connection
replica_state is now a synced variable
ReplicaStorageClient state machine bugfixes
Single FrequentCheck that goes through DBMS
Moved ReplicationState under DbmsHandler
Moved some replication startup logic under the DbmsHandler's constructor
Removed InMemoryReplicationClient
CreateReplicationClient has been removed from Storage
Simplified GetRecoverySteps and made safer

---------

Co-authored-by: Gareth Lloyd <gareth.lloyd@memgraph.io>
2023-11-23 11:02:35 +01:00
Gareth Andrew Lloyd
e4f94c15c6
Fixes for clang-tidy / sonar issues (#1536) 2023-11-22 13:05:02 +00:00
Andi
e5b2c19ea2
Empty Collect() returns nothing (#1482) 2023-11-13 11:45:09 +01:00
andrejtonev
dbc6054689
Replication refactor (part 5) (#1378) 2023-11-06 11:50:49 +00:00
gvolfing
c296dc67ce
Add index count to index info (#1229) 2023-10-27 18:13:05 +02:00
Gareth Andrew Lloyd
3cc2bc2791
Refactor interpreter to support multiple distributed clocks (Part 1) (#1281)
* Interpreter transaction ID decoupled from storage transaction ID
* Transactional scope for indices, statistics and constraints
* Storage::Accessor now has 2 modes (unique and shared)
* Introduced ResourceLock to fix pthread mutex problems
* Split InfoQuery in two: non-transactional SystemInfoQuery and transactional DatabaseInfoQuery
* Replicable and durable statistics
* Bumped WAL/Snapshot versions
* Initial implementation of the Lamport clock

---------

Co-authored-by: Andreja Tonev <andreja.tonev@memgraph.io>
2023-10-05 16:58:39 +02:00
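
#1281 lists an "initial implementation of the Lamport clock" without showing it. For orientation, the textbook algorithm can be sketched as follows. This is a generic Python illustration, not the code from the commit, and `LamportClock`, `tick`, and `receive` are hypothetical names.

```python
class LamportClock:
    """Textbook Lamport logical clock for a single process."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event or message send: advance the clock by one.
        self.time += 1
        return self.time

    def receive(self, remote_time):
        # Message receipt: jump past both the local and the remote time.
        self.time = max(self.time, remote_time) + 1
        return self.time
```

The rule in `receive` guarantees that the timestamp of a receive event is always larger than the timestamp of the matching send, which gives a consistent ordering across processes that do not share a physical clock.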
Marko Budiselić
3b9133fd5a
Improve e2e and replication testing setup (#1061)
* Add `--replication-restore-state-on-startup` with `false` as default

Co-authored-by: Aidar Samerkhanov <aidar.samerkhanov@memgraph.io>
Co-authored-by: Andi Skrgat <andi8647@gmail.com>
2023-07-19 21:18:43 +02:00
Josipmrden
b875649270
Add restoring of replication roles upon database startup (#791)
Fix replica node restoration on startup so it is restored as replica and not as main.
2023-06-21 19:08:58 +02:00
Jeremy B
d4f0bb0e38
Correct inconsistencies w.r.t. sync replication (#435)
Add a report for the case where a sync replica does not confirm within a timeout:
- Add a new exception, ReplicationException, thrown when a sync replica does not confirm the reception of messages (new data, new constraint/index, or triggers)
- Update the logic to throw the ReplicationException when needed for insertion of new data, triggers, or creation of new constraint/index
- Add end-to-end tests to cover the loss of connection with sync/async replicas when adding new data, adding new constraint/indexes, and triggers

Add end-to-end tests to cover the creation and drop of indexes, existence constraints, and uniqueness constraints

Improved tooling function mg_sleep_and_assert to also show the last result when duration is exceeded
2022-08-09 11:29:55 +02:00
Jeremy B
063e297e1e
Avoid usage of time.sleep (#434)
e2e Python: added a tooling function around `time.sleep()` that stops as soon as the condition is fulfilled and fails an assertion if the timeout is reached
2022-07-08 10:47:18 +02:00
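
The polling helper described in #434, together with the later improvement in #435 that reports the last result on timeout, could look roughly like the sketch below. The name `sleep_until_equal`, its parameters, and the default values are assumptions made for illustration; the real `mg_sleep_and_assert` may differ.

```python
import time


def sleep_until_equal(expected, fetch, max_duration=20.0, interval=0.2):
    """Poll `fetch()` until it returns `expected` or `max_duration` elapses."""
    deadline = time.monotonic() + max_duration
    result = fetch()
    while result != expected:
        if time.monotonic() >= deadline:
            # Include the last observed value to make failures easier to debug.
            raise AssertionError(
                f"timed out after {max_duration}s; last result: {result!r}"
            )
        time.sleep(interval)
        result = fetch()
    return result
```

Compared with a fixed `time.sleep()`, a test using this pattern finishes as soon as the expected state is visible and fails only when the full timeout has elapsed.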
Jeremy B
f629de7e60
Save replication settings (#415)
* Storage takes care of saving the settings when a new replica is added

* Restore replicas at startup

* Modify interactive_mg_runner + memgraph to support that data-directory can be configured in CONTEXT

* Extend e2e test

* Correct typo

* Add flag to config to specify when replication should be stored (true by default when starting Memgraph)

* Remove unnecessary "--" in yaml file

* Make sure Memgraph stops if a replica can't be restored.

* Add UT covering the parsing of ReplicaStatus to/from json

* Add assert in e2e script to check that a port is free before using it

* Add test covering crash on Jepsen

* Make sure the application crashes if it starts on corrupted replication info

Starting with a non-responsive replica is allowed.

* Add temporary startup flag: this is needed so Jepsen does not automatically restore replica on startup of main. This will be removed in T0835
2022-07-07 13:30:28 +02:00
Jeremy B
b737e53456
Remove sync with timeout (#423)
* Remove timeout when registering a sync replica

* Simplify jepsen configuration file

* Remove timeout from jepsen configuration

* Add unit test

* Remove TimeoutDispatcher
2022-07-05 09:40:50 +02:00
Jeremy B
1ae6b71c5f
Registering a replica with timeout 0 should not be allowed (#414) 2022-06-29 11:14:23 +03:00
Jeremy B
65a7ba01da
Add information on show replicas to express how up-to-date a replica is (#412)
* Add test

* Add implementation and adapted test

* Update workloads.yaml to have a timeout > 0

* Update tests (failing due to merging of "add replica state")
2022-06-23 10:22:57 +02:00
Marko Budiselić
599c0a641f
Add replica state to SHOW REPLICAS (#379) 2022-06-20 13:28:42 +03:00
Marko Budiselić
21ad5d4328
Fix SHOW REPLICATION ROLE and SHOW REPLICAS (#376) 2022-05-20 20:17:59 -07:00
jbajic
12b4ec1589
Add memgraph namespace 2022-03-14 15:47:41 +01:00
Antonio Andelic
bb1308acc7
Use libs from toolchain (#326) 2022-01-21 10:22:36 +01:00
Antonio Andelic
bd21bc82b7
Add license to cpp/hpp/py test files (#283) 2021-10-26 08:53:56 +02:00
antonio2368
e51954fc94
Update toolchain to v3 (#189)
* Make memgraph buildable with new toolchain

* Use toolchain v3 in workflows
2021-07-08 14:20:48 +02:00
antonio2368
5c93f81881
Disable failing tests and add logs for replication e2e (#132)
* Disable sequential test

* Remove parent build and benchmark

* Save test data

* Save e2e logs in build folder

* Define different recovery time for each test
2021-04-02 12:29:10 +02:00
János Benjamin Antal
06f761bdf9
Add logs for loading snapshot and WAL files (#121)
* Add logs for loading snapshot and WAL files
2021-03-26 15:02:35 +01:00
antonio2368
3f3c55a4aa
Format all the memgraph and test source files (#97) 2021-02-18 15:32:43 +01:00
antonio2368
28413fd626
Change log library to spdlog, expose log levels to user (#72)
* Change from glog to spdlog

* Remove HA tests

* Remove logrotate log configuration

* Define custom main for unit gtests
2021-01-21 16:30:55 +01:00
Marko Budiselić
afbf672915
Add end2end replication tests based on mgclient (#69)
* Remove old HA benchmark and integration tests
2021-01-21 15:56:21 +01:00