# Memgraph distributed

This chapter describes some of the concepts used in distributed Memgraph. By "distributed" we mean sharding a single graph across multiple processing units (servers).

## Conceptual organization

There is a single master and multiple workers. The master contains all the global sources of truth (the transaction engine, the [label|edge-type|property] to name mappings). In the current organization it is also the only one that contains a Bolt server (for communication with the end client) and an interpretation engine. Workers contain the data, the means of interpreting subqueries (query plans received from the master), and the means of communicating with the master and the other workers.

In many query plans the load on the master is much larger than the load on the workers. For that reason it might be beneficial to make the master contain less data (or none at all), and/or to have multiple interpretation masters.

## Logic organization

Both distributed and single-node Memgraph use the same codebase. In cases where single-node behavior differs from distributed behavior, some kind of dynamic behavior change is implemented (either through inheritance or conditional logic).
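A minimal sketch of the two styles, with names invented purely for illustration (they are not taken from the Memgraph codebase):

```cpp
// Illustrative only; these names are not from the Memgraph codebase.
#include <iostream>

// Style 1: inheritance -- the distributed variant overrides the behavior.
class Storage {
 public:
  virtual ~Storage() = default;
  virtual void Insert(int key) {
    std::cout << "insert " << key << " locally\n";
  }
};

class DistributedStorage : public Storage {
 public:
  void Insert(int key) override {
    std::cout << "route " << key << " to the worker that owns it\n";
  }
};

// Style 2: conditional logic -- one implementation branches on the deployment.
void Insert(bool distributed, int key) {
  if (distributed)
    std::cout << "route " << key << " to the worker that owns it\n";
  else
    std::cout << "insert " << key << " locally\n";
}
```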

### GraphDb

The database::GraphDb is an "umbrella" object for parts of the database such as storage, garbage collection, the transaction engine etc. There is a class hierarchy of GraphDb implementations, as well as a base interface object. There are subclasses for single-node, master and worker deployments. Which implementation is used depends on the configuration processed in the main entry point of Memgraph.
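As a hedged illustration of this organization, the following sketch shows how such a hierarchy and configuration-driven selection could look. The class names, the factory function and the configuration values are assumptions for the example, not the actual Memgraph code.

```cpp
// Illustrative sketch only -- names and structure are hypothetical.
#include <memory>
#include <stdexcept>
#include <string>

namespace database {

// Base interface shared by all deployment types.
class GraphDb {
 public:
  virtual ~GraphDb() = default;
  virtual std::string Type() const = 0;
};

// One subclass per deployment type.
class SingleNodeDb : public GraphDb {
 public:
  std::string Type() const override { return "single-node"; }
};

class MasterDb : public GraphDb {
 public:
  std::string Type() const override { return "distributed-master"; }
};

class WorkerDb : public GraphDb {
 public:
  std::string Type() const override { return "distributed-worker"; }
};

// The main entry point picks an implementation based on configuration.
std::unique_ptr<GraphDb> MakeGraphDb(const std::string &deployment) {
  if (deployment == "single-node") return std::make_unique<SingleNodeDb>();
  if (deployment == "master") return std::make_unique<MasterDb>();
  if (deployment == "worker") return std::make_unique<WorkerDb>();
  throw std::runtime_error("Unknown deployment type: " + deployment);
}

}  // namespace database
```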

The GraphDb interface exposes getters to the base classes of other, similar hierarchies (for example to tx::Engine). In that way much of the code that uses those objects (for example query plan interpretation) is agnostic to the type of deployment.
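A standalone sketch of that idea follows; the getter name and the tx::Engine method are assumed for illustration. Code written against the base interfaces, like the hypothetical StartQueryTransaction below, behaves the same regardless of which concrete deployment is behind the references.

```cpp
// Sketch only: the getter, class and method names here are assumed.
#include <cstdint>

namespace tx {
// Base class of the transaction engine hierarchy; single-node, master and
// worker deployments would each provide a derived implementation.
class Engine {
 public:
  virtual ~Engine() = default;
  virtual std::uint64_t BeginTransaction() = 0;
};
}  // namespace tx

namespace database {
class GraphDb {
 public:
  virtual ~GraphDb() = default;
  // The getter returns a reference to the *base* class, so callers never
  // need to know which concrete engine (and thus which deployment) they use.
  virtual tx::Engine &tx_engine() = 0;
};
}  // namespace database

// Query plan interpretation can be written once, against the interfaces only.
std::uint64_t StartQueryTransaction(database::GraphDb &db) {
  return db.tx_engine().BeginTransaction();
}
```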

### RecordAccessors

The functionality of RecordAccessors and its subclasses is already documented. It is important to note that the same implementation of accessors is used in all deployments, with internal changes of behavior depending on the locality of the graph element (vertex or edge) the accessor represents. For example, if the graph element is local, an update operation on an accessor will perform the necessary MVCC operations and update the local data, the indexes, the write-ahead log etc. However, if the accessor represents a remote graph element, an update will trigger an RPC message to the owner about the update and a change in the local cache.
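As a hedged sketch of that locality-dependent behavior (the class, method and member names below are assumptions for the example, not the real accessor API):

```cpp
// Illustrative sketch only: these names are invented for the example.
#include <string>

class RecordAccessor {
 public:
  explicit RecordAccessor(bool is_local) : is_local_(is_local) {}

  void UpdateProperty(const std::string &name, int value) {
    if (is_local_) {
      // Local graph element: do the MVCC bookkeeping, mutate the local
      // record, update indexes and append the change to the write-ahead log.
      MakeMvccVersion();
      WriteLocal(name, value);
      UpdateIndexes(name, value);
      AppendToWal(name, value);
    } else {
      // Remote graph element: notify the owning worker via RPC and apply the
      // same change to the locally cached copy of the element.
      SendUpdateRpcToOwner(name, value);
      UpdateLocalCache(name, value);
    }
  }

 private:
  // No-op stubs standing in for the real storage, index, WAL and RPC logic.
  void MakeMvccVersion() {}
  void WriteLocal(const std::string &, int) {}
  void UpdateIndexes(const std::string &, int) {}
  void AppendToWal(const std::string &, int) {}
  void SendUpdateRpcToOwner(const std::string &, int) {}
  void UpdateLocalCache(const std::string &, int) {}

  bool is_local_;  // True when the vertex/edge lives on this worker.
};
```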