memgraph/docs/dev/storage/accessors.md

111 lines
5.1 KiB
Markdown
Raw Normal View History

# DatabaseAccessor
A `DatabaseAccessor` actually wraps a transactional access to database
data, for a single transaction. In that sense the naming is bad. It
encapsulates references to the database and the transaction object.
It contains logic for working with database content (graph element
data) in the context of a single transaction. All CRUD operations are
performed within a single transaction (as Memgraph is a transactional
database), and therefore iteration over data, finding a specific graph
element etc are all functionalities of a `GraphDbAccessor`.
In single-node Memgraph the database accessor also defined the lifetime
of a transaction. Even though a `Transaction` object was owned by the
transactional engine, it was `GraphDbAccessor`'s lifetime that object
was bound to (the transaction was implicitly aborted in
`GraphDbAccessor`'s destructor, if it was not explicitly ended before
that).
# RecordAccessor
It is important to understand data organization and access in the
storage layer. This discussion pertains to vertices and edges as graph
elements that the end client works with.
Memgraph uses MVCC (documented on it's own page). This means that for
each graph element there could be different versions visible to
different currently executing transactions. When we talk about a
`Vertex` or `Edge` as a data structure we typically mean one of those
versions. In code this semantic is implemented so that both those classes
inherit `mvcc::Record`, which in turn inherits `mvcc::Version`.
Handling MVCC and visibility is not in itself trivial. Next to that,
there is other book-keeping to be performed when working with data. For
that reason, Memgraph uses "accessors" to define an API of working with
data in a safe way. Most of the code in Memgraph (for example the
interpretation code) should work with accessors. There is a
`RecordAccessor` as a base class for `VertexAccessor` and
`EdgeAccessor`. Following is an enumeration of their purpose.
### Data access
The client interacts with Memgraph using the Cypher query language. That
language has certain semantics which imply that multiple versions of the
data need to be visible during the execution of a single query. For
example: expansion over the graph is always done over the graph state as
it was at the beginning of the transaction.
The `RecordAccessor` exposes functions to switch between the old and the new
versions of the same graph element (intelligently named `SwitchOld` and
`SwitchNew`) within a single transaction. In that way the client code
(mostly the interpreter) can avoid dealing with the underlying MVCC
version concepts.
### Updates
Data updates are also done through accessors. Meaning: there are methods
on the accessors that modify data, the client code should almost never
interact directly with `Vertex` or `Edge` objects.
The accessor layer takes care of creating version in the MVCC layer and
performing updates on appropriate versions.
Next, for many kinds of updates it is necessary to update the relevant
indexes. There are implicit indexes for vertex labels, as
well as user-created indexes for (label, property) pairs. The accessor
layer takes care of updating the indexes when these values are changed.
Each update also triggers a log statement in the write-ahead log. This
is also handled by the accessor layer.
### Distributed
In distributed Memgraph accessors also contain a lot of the remote graph
element handling logic. More info on that is available in the
documentation for distributed.
### Deferred MVCC data lookup for Edges
Vertices and edges are versioned using MVCC. This means that for each
transaction an MVCC lookup needs to be done to determine which version
is visible to that transaction. This tends to slow things down due to
cache invalidations (version lists and versions are stored in arbitrary
locations on the heap).
However, for edges, only the properties are mutable. The edge endpoints
and type are fixed once the edge is created. For that reason both edge
endpoints and type are available in vertex data, so that when expanding
it is not mandatory to do MVCC lookups of versioned, mutable data. This
logic is implemented in `RecordAccessor` and `EdgeAccessor`.
### Exposure
The original idea and implementation of graph element accessors was that
they'd prevent client code from ever interacting with raw `Vertex` or
`Edge` data. This however turned out to be impractical when implementing
distributed Memgraph and the raw data members have since been exposed
(through getters to old and new version pointers). However, refrain from
working with that data directly whenever possible! Always consider the
accessors to be the first go-to for interacting with data, especially
when in the context of a transaction.
# Skiplist accessor
The term "accessor" is also used in the context of a skiplist. Every
operation on a skiplist must be performed within on an
accessor. The skiplist ensures that there will be no physical deletions
of an object during the lifetime of an accessor. This mechanism is used
to ensure deletion correctness in a highly concurrent container.
We only mention that here to avoid confusion regarding terminology.