memgraph/docs/dev/storage/accessors.md

# DatabaseAccessor

A `DatabaseAccessor` actually wraps a transactional access to database
data, for a single transaction. In that sense the naming is bad. It
encapsulates references to the database and the transaction object.

It contains logic for working with database content (graph element
data) in the context of a single transaction. All CRUD operations are
performed within a single transaction (as Memgraph is a transactional
database), and therefore iteration over data, finding a specific graph
element etc are all functionalities of a `GraphDbAccessor`.

In single-node Memgraph the database accessor also defined the lifetime
of a transaction. Even though a `Transaction` object was owned by the
transactional engine, it was `GraphDbAccessor`'s lifetime that object
was bound to (the transaction was implicitly aborted in
`GraphDbAccessor`'s destructor, if it was not explicitly ended before
that).

# RecordAccessor

It is important to understand data organization and access in the
storage layer. This discussion pertains to vertices and edges as graph
elements that the end client works with.

Memgraph uses MVCC (documented on it's own page). This means that for
each graph element there could be different versions visible to
different currently executing transactions. When we talk about a
`Vertex` or `Edge` as a data structure we typically mean one of those
versions. In code this semantic is implemented so that both those classes
inherit `mvcc::Record`, which in turn inherits `mvcc::Version`.

Handling MVCC and visibility is not in itself trivial. Next to that,
there is other book-keeping to be performed when working with data. For
that reason, Memgraph uses "accessors" to define an API of working with
data in a safe way. Most of the code in Memgraph (for example the
interpretation code) should work with accessors. There is a
`RecordAccessor` as a base class for `VertexAccessor` and
`EdgeAccessor`. Following is an enumeration of their purpose.

### Data access

The client interacts with Memgraph using the Cypher query language. That
language has certain semantics which imply that multiple versions of the
data need to be visible during the execution of a single query. For
example: expansion over the graph is always done over the graph state as
it was at the beginning of the transaction.

The `RecordAccessor` exposes functions to switch between the old and the new
versions of the same graph element (intelligently named `SwitchOld` and
`SwitchNew`) within a single transaction. In that way the client code
(mostly the interpreter) can avoid dealing with the underlying MVCC
version concepts.

### Updates

Data updates are also done through accessors. Meaning: there are methods
on the accessors that modify data, the client code should almost never
interact directly with `Vertex` or `Edge` objects.

The accessor layer takes care of creating version in the MVCC layer and
performing updates on appropriate versions.

Next, for many kinds of updates it is necessary to update the relevant
indexes. There are implicit indexes for vertex labels, as
well as user-created indexes for (label, property) pairs. The accessor
layer takes care of updating the indexes when these values are changed.

Each update also triggers a log statement in the write-ahead log. This
is also handled by the accessor layer.

### Distributed

In distributed Memgraph accessors also contain a lot of the remote graph
element handling logic. More info on that is available in the
documentation for distributed.

### Deferred MVCC data lookup for Edges

Vertices and edges are versioned using MVCC. This means that for each
transaction an MVCC lookup needs to be done to determine which version
is visible to that transaction. This tends to slow things down due to
cache invalidations (version lists and versions are stored in arbitrary
locations on the heap).

However, for edges, only the properties are mutable. The edge endpoints
and type are fixed once the edge is created. For that reason both edge
endpoints and type are available in vertex data, so that when expanding
it is not mandatory to do MVCC lookups of versioned, mutable data. This
logic is implemented in `RecordAccessor` and `EdgeAccessor`.

### Exposure

The original idea and implementation of graph element accessors was that
they'd prevent client code from ever interacting with raw `Vertex` or
`Edge` data. This however turned out to be impractical when implementing
distributed Memgraph and the raw data members have since been exposed
(through getters to old and new version pointers). However, refrain from
working with that data directly whenever possible! Always consider the
accessors to be the first go-to for interacting with data, especially
when in the context of a transaction.

# Skiplist accessor

The term "accessor" is also used in the context of a skiplist. Every
operation on a skiplist must be performed within on an
accessor. The skiplist ensures that there will be no physical deletions
of an object during the lifetime of an accessor. This mechanism is used
to ensure deletion correctness in a highly concurrent container.
We only mention that here to avoid confusion regarding terminology.
Document distributed, durability and storage concepts Summary: Adding all you guys as reviewers. Feel free to take a peek and comment on whatever is not clear. Where needed the documentation can be extended or concepts explained in-person. Reviewers: buda, teon.banek, dgleich, ipaljak, msantl, mculinovic, mtomic Reviewed By: buda Differential Revision: https://phabricator.memgraph.io/D1322 2018-03-27 17:29:37 +08:00			`# DatabaseAccessor`

			A `DatabaseAccessor` actually wraps a transactional access to database
			`data, for a single transaction. In that sense the naming is bad. It`
			`encapsulates references to the database and the transaction object.`

			`It contains logic for working with database content (graph element`
			`data) in the context of a single transaction. All CRUD operations are`
			`performed within a single transaction (as Memgraph is a transactional`
			`database), and therefore iteration over data, finding a specific graph`
			element etc are all functionalities of a `GraphDbAccessor`.

			`In single-node Memgraph the database accessor also defined the lifetime`
			of a transaction. Even though a `Transaction` object was owned by the
			transactional engine, it was `GraphDbAccessor`'s lifetime that object
			`was bound to (the transaction was implicitly aborted in`
			`GraphDbAccessor`'s destructor, if it was not explicitly ended before
			`that).`

			`# RecordAccessor`

			`It is important to understand data organization and access in the`
			`storage layer. This discussion pertains to vertices and edges as graph`
			`elements that the end client works with.`

			`Memgraph uses MVCC (documented on it's own page). This means that for`
			`each graph element there could be different versions visible to`
			`different currently executing transactions. When we talk about a`
			`Vertex` or `Edge` as a data structure we typically mean one of those
			`versions. In code this semantic is implemented so that both those classes`
			inherit `mvcc::Record`, which in turn inherits `mvcc::Version`.

			`Handling MVCC and visibility is not in itself trivial. Next to that,`
			`there is other book-keeping to be performed when working with data. For`
			`that reason, Memgraph uses "accessors" to define an API of working with`
			`data in a safe way. Most of the code in Memgraph (for example the`
			`interpretation code) should work with accessors. There is a`
			`RecordAccessor` as a base class for `VertexAccessor` and
			`EdgeAccessor`. Following is an enumeration of their purpose.

			`### Data access`

			`The client interacts with Memgraph using the Cypher query language. That`
			`language has certain semantics which imply that multiple versions of the`
			`data need to be visible during the execution of a single query. For`
			`example: expansion over the graph is always done over the graph state as`
			`it was at the beginning of the transaction.`

			The `RecordAccessor` exposes functions to switch between the old and the new
			versions of the same graph element (intelligently named `SwitchOld` and
			`SwitchNew`) within a single transaction. In that way the client code
			`(mostly the interpreter) can avoid dealing with the underlying MVCC`
			`version concepts.`

			`### Updates`

			`Data updates are also done through accessors. Meaning: there are methods`
			`on the accessors that modify data, the client code should almost never`
			interact directly with `Vertex` or `Edge` objects.

			`The accessor layer takes care of creating version in the MVCC layer and`
			`performing updates on appropriate versions.`

			`Next, for many kinds of updates it is necessary to update the relevant`
			`indexes. There are implicit indexes for vertex labels, as`
			`well as user-created indexes for (label, property) pairs. The accessor`
			`layer takes care of updating the indexes when these values are changed.`

			`Each update also triggers a log statement in the write-ahead log. This`
			`is also handled by the accessor layer.`

			`### Distributed`

			`In distributed Memgraph accessors also contain a lot of the remote graph`
			`element handling logic. More info on that is available in the`
			`documentation for distributed.`

			`### Deferred MVCC data lookup for Edges`

			`Vertices and edges are versioned using MVCC. This means that for each`
			`transaction an MVCC lookup needs to be done to determine which version`
			`is visible to that transaction. This tends to slow things down due to`
			`cache invalidations (version lists and versions are stored in arbitrary`
			`locations on the heap).`

			`However, for edges, only the properties are mutable. The edge endpoints`
			`and type are fixed once the edge is created. For that reason both edge`
			`endpoints and type are available in vertex data, so that when expanding`
			`it is not mandatory to do MVCC lookups of versioned, mutable data. This`
			logic is implemented in `RecordAccessor` and `EdgeAccessor`.

			`### Exposure`

			`The original idea and implementation of graph element accessors was that`
			they'd prevent client code from ever interacting with raw `Vertex` or
			`Edge` data. This however turned out to be impractical when implementing
			`distributed Memgraph and the raw data members have since been exposed`
			`(through getters to old and new version pointers). However, refrain from`
			`working with that data directly whenever possible! Always consider the`
			`accessors to be the first go-to for interacting with data, especially`
			`when in the context of a transaction.`

			`# Skiplist accessor`

			`The term "accessor" is also used in the context of a skiplist. Every`
			`operation on a skiplist must be performed within on an`
			`accessor. The skiplist ensures that there will be no physical deletions`
			`of an object during the lifetime of an accessor. This mechanism is used`
			`to ensure deletion correctness in a highly concurrent container.`
			`We only mention that here to avoid confusion regarding terminology.`