# Dynamic Graph Partitioning (abbr. DGP)
## Implementation
Take a look under `dev/memgraph/distributed/dynamic_graph_partitioning.md`.
### Implemented parameters
```
--dynamic-graph-partitioner-enabled (If the dynamic graph partitioner should
  be enabled.) type: bool default: false (start time)
--dgp-improvement-threshold (How much better a node's score has to be on
  another worker to consider migrating it there. This is the minimal
  difference between the score the vertex would have after migration and
  its current score required for the migration to happen.) type: int32
  default: 10 (start time)
--dgp-max-batch-size (Maximal number of vertices that may be migrated in
  one dynamic graph partitioner step.) type: int32 default: 2000
  (start time)
```
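For orientation, here is a minimal sketch of how these start-time flags could be declared with gflags, which Memgraph uses for configuration; the identifiers mirror the dashed names above, but the exact definitions in the codebase may differ.

```cpp
#include "gflags/gflags.h"

// Start-time flags: read once at startup. The dashed command-line spellings
// above correspond to these underscore identifiers.
DEFINE_bool(dynamic_graph_partitioner_enabled, false,
            "If the dynamic graph partitioner should be enabled.");
DEFINE_int32(dgp_improvement_threshold, 10,
             "Minimal score improvement on another worker required before a "
             "vertex is migrated there.");
DEFINE_int32(dgp_max_batch_size, 2000,
             "Maximal number of vertices migrated in one DGP step.");
```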
## Planning
### Design decisions
* Each partitioning session has to be a new transaction.
* When and how does an instance perform the moves?
  * Periodically.
  * Token sharing (round robin: exactly one instance at a time has the
    opportunity to perform the moves; see the sketch after this list).
* On a server-side serialization error (when DGP receives the error)
  -> quit partitioning and wait for the next turn.
* On a client-side serialization error (when the end client receives the error)
  -> The client should never receive an error because of any internal
     operation.
  -> For the first implementation, it's good enough to wait until the data
     becomes available again.
  -> It would be nice if DGP had lower priority than end-client operations.
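A minimal sketch of the token-sharing decision, assuming a fixed ring of worker IDs; the class and its names are illustrative, and the real coordination would run over the RPC layer, which is not shown here.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

class DgpToken {
 public:
  explicit DgpToken(std::vector<int> worker_ids)
      : worker_ids_(std::move(worker_ids)) {}

  // The worker currently holding the token; only it may migrate vertices.
  int Holder() const { return worker_ids_[position_]; }

  // Pass the token on (round robin). Called when a worker finishes its turn,
  // or right away when its turn hits a server-side serialization error.
  void Pass() { position_ = (position_ + 1) % worker_ids_.size(); }

 private:
  std::vector<int> worker_ids_;
  std::size_t position_ = 0;
};
```

Round robin keeps the scheme trivially fair and guarantees that at most one instance migrates vertices at any time, which bounds the write contention DGP can cause.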
### End-user parameters
* --dynamic-graph-partitioner-enabled (execution time)
* --dgp-improvement-threshold (execution time)
* --dgp-max-batch-size (execution time)
* --dgp-min-batch-size (execution time)
  -> Minimum number of nodes that will be moved in each step.
* --dgp-fitness-threshold (execution time)
  -> Do not perform moves if the partitioning is already good enough.
* --dgp-delta-turn-time (execution time)
  -> Time between turns.
* --dgp-delta-step-time (execution time)
  -> Time between steps.
* --dgp-step-time (execution time)
  -> Time limit per step (see the scheduling sketch after this list).
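Most of these parameters only make sense in combination, so here is a sketch of how a single turn could use them; the defaults and the hooks (CurrentFitness, MigrateBatch) are assumptions for illustration, not the planned implementation.

```cpp
#include <chrono>
#include <thread>

struct DgpConfig {
  int min_batch_size = 100;                        // --dgp-min-batch-size
  int max_batch_size = 2000;                       // --dgp-max-batch-size
  double fitness_threshold = 0.9;                  // --dgp-fitness-threshold
  std::chrono::milliseconds delta_step_time{100};  // --dgp-delta-step-time
  std::chrono::milliseconds step_time{500};        // --dgp-step-time
};  // all defaults above are illustrative

// Placeholder hooks into the partitioner; real signatures would differ.
double CurrentFitness() { return 0.5; }
int MigrateBatch(int /*min*/, int /*max*/,
                 std::chrono::milliseconds /*limit*/) { return 0; }

// One turn of the instance currently holding the token. The time between
// turns (--dgp-delta-turn-time) would be enforced by the token scheduler.
void RunTurn(const DgpConfig &config) {
  while (CurrentFitness() < config.fitness_threshold) {
    // Each step migrates between min and max batch size vertices and has
    // to finish within --dgp-step-time.
    int moved = MigrateBatch(config.min_batch_size, config.max_batch_size,
                             config.step_time);
    if (moved == 0) break;  // nothing worth moving left in this turn
    std::this_thread::sleep_for(config.delta_step_time);
  }
}
```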
### Testing
The implementation has to provide good enough results in terms of:
* How good the partitioning is (numeric value), aka goodness (see the sketch
  after this list for one way to express it).
* Workload execution time.
* Stress test correctness.
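One simple way to express goodness as a number, assuming each vertex is labelled with the worker it currently lives on, is the fraction of edges whose endpoints share a worker; the function below is illustrative, not necessarily the metric the test suite would use.

```cpp
#include <utility>
#include <vector>

// worker_of_vertex[v] is the worker that vertex v currently lives on.
double Goodness(const std::vector<int> &worker_of_vertex,
                const std::vector<std::pair<int, int>> &edges) {
  if (edges.empty()) return 1.0;  // no edges, nothing can be cut
  int local = 0;
  for (const auto &[from, to] : edges)
    if (worker_of_vertex[from] == worker_of_vertex[to]) ++local;
  return static_cast<double>(local) / edges.size();
}
```

A score of 1.0 means no edge crosses workers; lower values mean more cross-worker traffic for traversals.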
Test cases:
* N disconnected subgraphs (sketched after this list)
  -> shuffle nodes across N instances
  -> run partitioning
  -> expect a perfect partitioning.
* N connected subgraphs
  -> shuffle nodes across N instances
  -> run partitioning
  -> check the partitioning.
* A realistic workload (Long Running, LDBC1, LDBC2, Card Fraud, BFS, WSP)
  -> measure execution time
  -> run partitioning
  -> check the partitioning
  -> measure execution time (during and after partitioning).
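A sketch of the first test case under stated assumptions: N disconnected cliques whose vertices are scattered round-robin across N workers, a run_dgp callback standing in for the partitioner, and the Goodness metric sketched above as the success criterion. None of this is the actual test harness.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

struct TestGraph {
  std::vector<int> worker_of_vertex;       // current vertex placement
  std::vector<std::pair<int, int>> edges;  // undirected edges
};

// Goodness() is the metric sketched earlier in this section.
double Goodness(const std::vector<int> &worker_of_vertex,
                const std::vector<std::pair<int, int>> &edges);

// Build n disconnected cliques and scatter their vertices across n workers.
TestGraph BuildScatteredCliques(int n, int clique_size) {
  TestGraph g;
  for (int c = 0; c < n; ++c) {
    int base = c * clique_size;
    for (int i = 0; i < clique_size; ++i) {
      g.worker_of_vertex.push_back((base + i) % n);  // round-robin shuffle
      for (int j = i + 1; j < clique_size; ++j)
        g.edges.emplace_back(base + i, base + j);
    }
  }
  return g;
}

void TestDisconnectedSubgraphs(
    int n, const std::function<void(TestGraph &)> &run_dgp) {
  TestGraph g = BuildScatteredCliques(n, /*clique_size=*/8);
  run_dgp(g);  // the partitioner under test migrates vertices
  // With no edges between cliques a perfect result is achievable: no edge
  // should cross workers, so goodness must be exactly 1.0.
  assert(Goodness(g.worker_of_vertex, g.edges) == 1.0);
}
```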