# DatabaseAccessor

A `DatabaseAccessor` (`GraphDbAccessor` in code) actually wraps
transactional access to database data for a single transaction; in that
sense, the name is misleading. It encapsulates references to the
database and to the transaction object.

It contains logic for working with database content (graph element
data) in the context of a single transaction. All CRUD operations are
performed within a single transaction (as Memgraph is a transactional
database), and therefore iterating over data, finding a specific graph
element, etc. are all functionalities of a `GraphDbAccessor`.

In single-node Memgraph, the database accessor also defined the lifetime
of a transaction. Even though the `Transaction` object was owned by the
transactional engine, its lifetime was bound to that of the
`GraphDbAccessor`: the transaction was implicitly aborted in the
`GraphDbAccessor` destructor if it had not been explicitly ended before
that.

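The lifetime rule above is an instance of RAII. As an illustration only,
here is a minimal C++ sketch of an accessor whose destructor aborts a
transaction that was never explicitly ended. `TxState`, `Transaction` and
`DbAccessor` are hypothetical stand-ins, not Memgraph's classes; in
particular, the transaction here is a plain struct owned by the caller
rather than an object owned by a transactional engine.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical transaction states; not Memgraph's actual types.
enum class TxState { kActive, kCommitted, kAborted };

// Stand-in for a transaction. In this sketch it is a plain struct owned by
// the caller, not an object owned by a transactional engine.
struct Transaction {
  TxState state = TxState::kActive;
};

// Sketch of an accessor that binds the transaction's fate to its own lifetime.
class DbAccessor {
 public:
  explicit DbAccessor(Transaction &tx) : tx_(tx) {}
  DbAccessor(const DbAccessor &) = delete;
  DbAccessor &operator=(const DbAccessor &) = delete;

  // Implicit abort if the transaction was not explicitly ended.
  ~DbAccessor() {
    if (tx_.state == TxState::kActive) tx_.state = TxState::kAborted;
  }

  void Commit() { End(TxState::kCommitted); }
  void Abort() { End(TxState::kAborted); }

 private:
  // A transaction can be ended only once.
  void End(TxState final_state) {
    if (tx_.state != TxState::kActive)
      throw std::logic_error("transaction already ended");
    tx_.state = final_state;
  }

  Transaction &tx_;
};
```

With this design, an early return or an exception that unwinds past the
accessor cannot leave a transaction dangling: every exit path that does
not commit first ends in an abort.
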
# RecordAccessor

It is important to understand how data is organized and accessed in the
storage layer. This discussion pertains to vertices and edges, the graph
elements that the end client works with.

Memgraph uses MVCC (documented on its own page). This means that, for
each graph element, different versions may be visible to different
concurrently executing transactions. When we talk about a `Vertex` or an
`Edge` as a data structure, we typically mean one of those versions. In
code, both classes inherit from `mvcc::Record`, which in turn inherits
from `mvcc::Version`.

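To make per-transaction visibility concrete, the sketch below walks a
newest-to-oldest chain of versions and returns the first one a given
transaction may see. It is a simplification under loudly stated
assumptions: `VertexVersion`, `Sees` and `Visible` are hypothetical, and
a transaction is assumed to see exactly the writes of transactions with a
smaller or equal id (commit status, aborts and concurrently running
transactions are ignored). Memgraph's actual visibility rules are covered
on the MVCC page.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

using TxId = std::uint64_t;
constexpr TxId kNoTx = 0;  // marks "not expired" in this sketch

// One version of a vertex. Versions form a newest-to-oldest chain.
struct VertexVersion {
  std::string name;                      // stand-in for the vertex's data
  TxId created_by = kNoTx;               // transaction that created it
  TxId expired_by = kNoTx;               // transaction that replaced it
  std::unique_ptr<VertexVersion> older;  // next older version, if any
};

// ASSUMPTION: transaction `tx` sees the writes of every transaction whose
// id is <= tx (including its own), as if all of them had committed.
bool Sees(TxId tx, TxId writer) { return writer != kNoTx && writer <= tx; }

// Returns the version visible to `tx`, or nullptr if there is none.
const VertexVersion *Visible(const VertexVersion *newest, TxId tx) {
  for (const VertexVersion *v = newest; v != nullptr; v = v->older.get()) {
    if (Sees(tx, v->created_by) && !Sees(tx, v->expired_by)) return v;
  }
  return nullptr;
}
```

For a chain in which transaction 5 replaced a version created by
transaction 1, transaction 3 still sees the old version, while
transactions 5 and 7 see the new one.
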
Handling MVCC and visibility is not trivial in itself. Besides that,
there is other bookkeeping to be performed when working with data. For
that reason, Memgraph uses "accessors" to define an API for working with
data safely. Most of the code in Memgraph (for example, the
interpretation code) should work with accessors. `RecordAccessor` is the
base class for `VertexAccessor` and `EdgeAccessor`. Their
responsibilities are enumerated below.

### Data access

The client interacts with Memgraph using the Cypher query language.
Cypher's semantics imply that multiple versions of the data need to be
visible during the execution of a single query. For example, expansion
over the graph is always done over the graph state as it was at the
beginning of the transaction.

`RecordAccessor` exposes functions to switch between the old and the new
versions of the same graph element (aptly named `SwitchOld` and
`SwitchNew`) within a single transaction. In that way, client code
(mostly the interpreter) can avoid dealing with the underlying MVCC
version concepts.

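A rough sketch of how an accessor can hide the old/new distinction behind
`SwitchOld` and `SwitchNew`. Only those two method names come from the
text above; `Vertex`, `VertexAccessorSketch`, `name` and `set_name` are
hypothetical, and the new version is stored inside the accessor purely to
keep the example self-contained (in Memgraph, versions are created in the
MVCC layer).

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for one MVCC version of a vertex.
struct Vertex {
  std::string name;
};

// `old_` is the version the transaction started with; `new_` is the one it
// has written (nullptr until the first update); reads go to `current_`.
class VertexAccessorSketch {
 public:
  explicit VertexAccessorSketch(Vertex *old_version)
      : old_(old_version), current_(old_version) {}
  VertexAccessorSketch(const VertexAccessorSketch &) = delete;
  VertexAccessorSketch &operator=(const VertexAccessorSketch &) = delete;

  // Read the state as of the start of the transaction (falls back to the
  // new version if there is no old one).
  VertexAccessorSketch &SwitchOld() {
    current_ = old_ ? old_ : new_;
    return *this;
  }

  // Read this transaction's own updates (falls back to the old version if
  // nothing has been written yet).
  VertexAccessorSketch &SwitchNew() {
    current_ = new_ ? new_ : old_;
    return *this;
  }

  const std::string &name() const { return current_->name; }

  // The first write copies the old version into a new one; later writes
  // reuse it. This sketch switches reads to the new version after a write.
  void set_name(const std::string &value) {
    if (new_ == nullptr) {
      new_storage_ = *old_;
      new_ = &new_storage_;
    }
    new_->name = value;
    current_ = new_;
  }

 private:
  Vertex *old_;
  Vertex *new_ = nullptr;
  Vertex *current_;
  Vertex new_storage_;  // owns the new version in this toy example only
};
```

After `set_name("Employee")`, `SwitchOld().name()` still returns the value
the transaction started with, while `SwitchNew().name()` returns the
updated one; the caller never touches a version pointer directly.
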
### Updates

Data updates are also done through accessors: the accessors provide
methods that modify data, and client code should almost never interact
directly with `Vertex` or `Edge` objects.

The accessor layer takes care of creating versions in the MVCC layer and
performing updates on the appropriate versions.

Next, many kinds of updates require the relevant indexes to be updated.
There are implicit indexes for vertex labels, as well as user-created
indexes for (label, property) pairs. The accessor layer takes care of
updating the indexes when these values change.

Each update also results in an entry being written to the write-ahead
log. This is also handled by the accessor layer.

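The three kinds of bookkeeping listed above (a new MVCC version, index
maintenance and write-ahead logging) can be pictured as steps of a single
accessor method. In this hypothetical sketch, `VertexData`, `LabelIndex`,
`WalEntry`, `AddLabel` and the `"ADD_LABEL"` log format are illustrative
assumptions, not Memgraph's API; only a label index is modeled, and
commit and rollback are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

using Gid = std::uint64_t;  // hypothetical vertex identifier

// Hypothetical, heavily simplified stand-ins for storage structures.
struct VertexData {
  std::set<std::string> labels;
};
using LabelIndex = std::map<std::string, std::set<Gid>>;  // label -> vertices
struct WalEntry {
  Gid vertex;
  std::string op;
  std::string arg;
};

class VertexAccessorSketch {
 public:
  VertexAccessorSketch(Gid gid, const VertexData &old_version,
                       LabelIndex &index, std::vector<WalEntry> &wal)
      : gid_(gid), old_(&old_version), index_(index), wal_(wal) {}

  // Data as this transaction currently sees it.
  const VertexData &current() const { return new_ ? *new_ : *old_; }

  void AddLabel(const std::string &label) {
    if (current().labels.count(label)) return;  // no change, no bookkeeping
    if (!new_) new_ = *old_;                     // 1. create a new version
    new_->labels.insert(label);
    index_[label].insert(gid_);                  // 2. keep the index in sync
    wal_.push_back({gid_, "ADD_LABEL", label});  // 3. log the change
  }

 private:
  Gid gid_;
  const VertexData *old_;
  std::optional<VertexData> new_;  // written version, created on first update
  LabelIndex &index_;
  std::vector<WalEntry> &wal_;
};
```

Because the index and the log are only touched inside `AddLabel`, client
code calling it cannot forget either step; adding a label that is already
present is a no-op and produces no log entry.
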
### Distributed

In distributed Memgraph, accessors also contain much of the logic for
handling remote graph elements. More information is available in the
documentation for distributed Memgraph.

### Deferred MVCC data lookup for edges

Vertices and edges are versioned using MVCC. This means that for each
transaction an MVCC lookup needs to be done to determine which version
is visible to that transaction. This tends to slow things down due to
cache misses (version lists and versions are stored at arbitrary
locations on the heap).

However, for edges, only the properties are mutable. The edge endpoints
and type are fixed once the edge is created. For that reason, both the
edge endpoints and the type are also available in the vertex data, so
that expansion does not require MVCC lookups of versioned, mutable data.
This logic is implemented in `RecordAccessor` and `EdgeAccessor`.

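The sketch below illustrates the idea: for every outgoing edge, a vertex
keeps the other endpoint and the edge type next to an opaque handle to
the versioned edge data, so filtering by type during expansion reads
only the vertex. `EdgeRef`, `VertexSketch`, `EdgeVersionList` and
`Expand` are hypothetical names, and endpoints are plain indices rather
than real storage references.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Opaque handle to an edge's versioned, mutable data (its properties).
// Deliberately left incomplete: expansion never needs to look inside.
struct EdgeVersionList;

// Immutable facts about an edge, stored directly in the source vertex.
struct EdgeRef {
  std::size_t to;                 // other endpoint (index into a vertex table)
  std::string type;               // edge type, fixed at creation
  EdgeVersionList *version_list;  // needed only if properties are read
};

struct VertexSketch {
  std::vector<EdgeRef> out;  // outgoing edges
};

// Expands over outgoing edges of one type using only vertex data, i.e.
// without any MVCC lookup on the edges themselves.
std::vector<std::size_t> Expand(const VertexSketch &from,
                                const std::string &type) {
  std::vector<std::size_t> targets;
  for (const EdgeRef &edge : from.out)
    if (edge.type == type) targets.push_back(edge.to);
  return targets;
}
```

The `version_list` handle is never dereferenced during expansion; an MVCC
lookup would only be needed once a query reads the edge's properties.
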
### Exposure

The original idea and implementation of graph element accessors was that
they would prevent client code from ever interacting with raw `Vertex`
or `Edge` data. However, this turned out to be impractical when
implementing distributed Memgraph, and the raw data members have since
been exposed (through getters for the old and new version pointers).
Nevertheless, refrain from working with that data directly whenever
possible! Always consider the accessors the first choice for interacting
with data, especially in the context of a transaction.

# Skiplist accessor

The term "accessor" is also used in the context of a skiplist. Every
operation on a skiplist must be performed through an accessor. The
skiplist ensures that no object is physically deleted during the
lifetime of an accessor. This mechanism is used to ensure correct
deletion in a highly concurrent container. We mention it here only to
avoid confusion regarding terminology.

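To illustrate the guarantee described above, here is a simplified C++
sketch of accessor-based deferred deletion. It is not Memgraph's
skiplist: to stay short, a singly linked list guarded by one `std::mutex`
stands in for the concurrent skiplist, and a plain accessor counter
decides when removed nodes may be freed. `DeferredList`, `Accessor`,
`Find` and `garbage_size` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

class DeferredList {
  struct Node {
    int value;
    Node *next;
  };

 public:
  // All operations go through an accessor. Removed nodes are not freed
  // while any accessor is alive, so pointers obtained through an accessor
  // stay valid for that accessor's lifetime.
  class Accessor {
   public:
    explicit Accessor(DeferredList &list) : list_(list) { list_.Acquire(); }
    ~Accessor() { list_.Release(); }
    Accessor(const Accessor &) = delete;
    Accessor &operator=(const Accessor &) = delete;

    void Insert(int value) { list_.Insert(value); }
    bool Remove(int value) { return list_.Remove(value); }
    const int *Find(int value) { return list_.Find(value); }

   private:
    DeferredList &list_;
  };

  DeferredList() = default;
  DeferredList(const DeferredList &) = delete;
  DeferredList &operator=(const DeferredList &) = delete;
  ~DeferredList() {  // assumes no accessor outlives the container
    FreeGarbage();
    while (head_) {
      Node *next = head_->next;
      delete head_;
      head_ = next;
    }
  }

  std::size_t garbage_size() {
    std::lock_guard<std::mutex> guard(mutex_);
    return garbage_.size();
  }

 private:
  void Acquire() {
    std::lock_guard<std::mutex> guard(mutex_);
    ++accessors_;
  }
  void Release() {
    std::lock_guard<std::mutex> guard(mutex_);
    // Last accessor gone: nobody can still reference an unlinked node.
    if (--accessors_ == 0) FreeGarbage();
  }
  void FreeGarbage() {  // caller holds mutex_ (or is the destructor)
    for (Node *node : garbage_) delete node;
    garbage_.clear();
  }
  void Insert(int value) {
    std::lock_guard<std::mutex> guard(mutex_);
    head_ = new Node{value, head_};
  }
  bool Remove(int value) {
    std::lock_guard<std::mutex> guard(mutex_);
    for (Node **link = &head_; *link; link = &(*link)->next) {
      if ((*link)->value == value) {
        Node *node = *link;
        *link = node->next;        // logical removal: unlink only
        garbage_.push_back(node);  // physical deletion is deferred
        return true;
      }
    }
    return false;
  }
  const int *Find(int value) {
    std::lock_guard<std::mutex> guard(mutex_);
    for (Node *node = head_; node; node = node->next)
      if (node->value == value) return &node->value;
    return nullptr;
  }

  std::mutex mutex_;
  Node *head_ = nullptr;
  std::vector<Node *> garbage_;
  std::size_t accessors_ = 0;
};
```

A pointer returned by `Find` therefore remains valid for as long as the
accessor that obtained it is alive, even if another accessor removes the
element in the meantime. A weakness of this simple counter is that
garbage is never freed while accessors keep overlapping, so a more
refined scheme would be needed to bound memory use.
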