Reviewers: buda, msantl, ipaljak Reviewed By: ipaljak Subscribers: teon.banek, pullbot Differential Revision: https://phabricator.memgraph.io/D1476
5.6 KiB
Property storage
Although the reader is probably familiar with properties in Memgraph, let's briefly recap.
Both vertices and edges can store an arbitrary number of properties. Properties are, in essence, ordered pairs of property names and property values. Each property name within a single graph element (edge/node) can store a single property value. Property names are represented as strings, while property values must be one of the following types:
Type | Description |
---|---|
Null |
Denotes that the property has no value. This is the same as if the property does not exist. |
String |
A character string, i.e. text. |
Boolean |
A boolean value, either true or false . |
Integer |
An integer number. |
Float |
A floating-point number, i.e. a real number. |
List |
A list containing any number of property values of any supported type. It can be used to store multiple values under a single property name. |
Map |
A mapping of string keys to values of any supported type. |
Property values are modeled in a class conveniently called PropertyValue
.
Mapping between property names and property keys.
Although users think of property names in terms of descriptive strings (e.g. "location" or "department"), Memgraph internally converts those names into property keys which are, essentially, unsigned 16-bit integers.
Property keys are modelled by a not-so-conveniently named class called
Property
which can be found in storage/types.hpp
. The actual conversion
between property names and property keys is done within the ConcurrentIdMapper
but the internals of that implementation are out of scope for understanding
property storage.
PropertyValueStore
Both Edge
and Vertex
objects contain an instance of PropertyValueStore
object which is responsible for storing properties of a corresponding graph
element.
An interface of PropertyValueStore
is as follows:
Method | Description |
---|---|
at |
Returns the PropertyValue for a given Property (key). |
set |
Stores a given PropertyValue under a given Property (key). |
erase |
Deletes a given Property (key) alongside its corresponding PropertyValue . |
clear |
Clears the storage. |
iterator |
Provides an extension of std::input_iterator that iterates over storage. |
Storage location
By default, Memgraph is an in-memory database and all properties are therefore stored in working memory unless specified otherwise by the user. User has an option to specify via the command line which properties they wish to be stored on disk.
Storage location of each property is encapsulated within a Property
object
which is ensured by the ConcurrentIdMapper
. More precisely, the unsigned 16-bit
property key has the following format:
|---location--|------id------|
|-Memory|Disk-|-----2^15-----|
In other words, the most significant bit determines the location where the property will be stored.
In-memory storage
The underlying implementation of in-memory storage for the time being is
std::vector<std::pair<Property, PropertyValue>>
. Implementations ofat
, set
and erase
are linear in time. This implementation is arguably more efficient
than std::map
or std::unordered_map
when the average number of properties of
a record is relatively small (up to 10) which seems to be the case.
On-disk storage
KVStore
Disk storage is modeled by an abstraction of key-value storage as implemented in `storage/kvstore.hpp'. An interface of this abstraction is as follows:
Method | Description |
---|---|
Put |
Stores the given value under the given key. |
Get |
Obtains the given value stored under the given key. |
Delete |
Deletes a given (key, value) pair from storage.. |
DeletePrefix |
Deletes all (key, value) pairs where key begins with a given prefix. |
Size |
Returns the size of the storage or, optionally, the number of stored pairs that begin with a given prefix. |
iterator |
Provides an extension of std::input_iterator that iterates over storage. |
Keys and values in this context are of type std::string
.
The actual underlying implementation of this abstraction uses [RocksDB]{https://rocksdb.org} — a persistent key-value store for fast storage.
It is worthy to note that the custom iterator implementation allows the user to iterate over a given prefix. Otherwise, the implementation follows familiar c++ constructs and can be used as follows:
KVStore storage = ...;
for (auto it = storage.begin(); it != storage.end(); ++it) {}
for (auto kv : storage) {}
for (auto it = storage.begin("prefix"); it != storage.end("prefix"); ++it) {}
Note that it is not possible to scan over multiple prefixes. For instance, one might assume that you can scan over all keys that fall in a certain lexicographical range. Unfortunately, that is not the case and running the following code will result in an infinite loop with a touch of undefined behavior.
KVStore storage = ...;
for (auto it = storage.begin("alpha"); it != storage.end("omega"); ++it) {}
Data organization on disk
Each PropertyValueStore
instance can access a static KVStore
object that can
store (key, value)
pairs on disk. The key of each property on disk consists of
two parts — a unique identifier (unsigned 64-bit integer) of the current
record version (see mvcc docummentation for further clarification) and a
property key as described above. The actual value of the property is serialized
into a bytestring using bolt BaseEncoder
. Similarly, deserialization is
performed by bolt Decoder
.