Lisp C++ Preprocessor (LCP)
In our development process we are using Common Lisp to generate some parts of
the C++ codebase. The idea behind this is supplementing C++ with better
meta-programming capabilities to automate tasks and prevent bugs due to code
duplication. The primary candidate for more powerful meta-programming is
generating serialization code. Such code is almost always the same: go through
all struct or class members and invoke the serialization function on them.
Writing such code manually is error-prone when adding members, because you may
easily forget to correctly update the serialization code. Thus, the Lisp C++
Preprocessor was born. It is hooked in our build process as a step before
compilation. The remainder of the document describes how to use LCP and its
features.
Running LCP
You can generate C++ from an LCP file by running the following command.
./tools/lcp <path-to-file.lcp>
The LCP will produce a path-to-file.hpp file and potentially a
path-to-file.lcp.cpp file. The .cpp file is generated if some parts of the
code need to be in the implementation file. This is usually the case when
generating serialization code. Note that the .cpp file has the extension
appended to .lcp, so that you are free to define your own path-to-file.cpp
which includes the generated path-to-file.hpp.
One serialization format uses the Cap'n Proto library, but to use it, you need
to provide an ID. The ID is generated by invoking capnp id. When you want to
generate Cap'n Proto serialization, you need to pass the generated ID to LCP.
./tools/lcp <path-to-file.lcp> $(capnp id)
Generating Cap'n Proto serialization will produce an additional file,
path-to-file.capnp, which contains the serialization schema.
You may wonder why the LCP doesn't invoke capnp id itself. Unfortunately,
such behaviour would be wrong when running LCP on the same file multiple
times. Each run would produce a different ID and the serialization code would
be incompatible between versions.
CMake
The LCP is run in CMake using the add_lcp function defined in
CMakeLists.txt. You can take a look at the function documentation there for
information on how to add your new LCP files to the build system.
Writing LCP
An LCP file should have the .lcp extension, but the code written is
completely valid Common Lisp code. This means that you have a complete
language at your disposal before the C++ is even compiled. You can view this
as similar to C++ templates and macros, except that those do not give you
access to a complete, general-purpose language.
Besides Common Lisp, you are allowed to write C++ code verbatim. This means that C++ and Lisp code coexist in the file. How to do that, as well as other features, is described below.
Inlining C++ in Common Lisp
To insert C++ code, you need to use a #>cpp ... cpp<# block. This is most
often used at the top of the file to write Doxygen documentation and put some
includes. For example:
#>cpp
/// @file My Doxygen style documentation about this file
#pragma once
#include <vector>
cpp<#
The above code will be pasted as is into the generated header file. If you
wish to have a C++ block in the .cpp implementation file instead, you should
use the lcp:in-impl function. For example:
(lcp:in-impl
  #>cpp
  void MyClass::Method(int awesome_number) {
    // Do something with awesome_number
  }
  cpp<#)
The C++ block also supports string interpolation with a syntax akin to shell
variable access, ${lisp-variable}. At the moment, only variables are
supported and they have to be pretty printable in Common Lisp (i.e. support
the ~A format directive). For example, we can precompute a table of sine
values for the integers from 0 up to (but not including) 5:
(let ((sin-from-0-to-5
        (format nil "~{~A~^, ~}" (loop for i from 0 below 5 collect (sin i)))))
  #>cpp
  static const double kSinFrom0To5[] = {${sin-from-0-to-5}};
  cpp<#)
The following will be generated.
static const double kSinFrom0To5[] = {0.0, 0.84147096, 0.9092974, 0.14112, -0.7568025};
Since you have a complete language at your disposal, this is a powerful tool for precomputing tables whose values would otherwise take a long time to compute while the C++ program runs.
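The following small sketch shows how C++ code might use such a table. The array is copied from the generated output above. SinOfSmallInt is a hypothetical helper, not something LCP generates. Lisp printed the values as single-floats, so they agree with std::sin only to about seven significant digits.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Table copied from the LCP output above: sin(0) .. sin(4). The values were
// printed by Lisp as single-floats, so they are accurate to ~7 digits only.
static const double kSinFrom0To5[] = {0.0, 0.84147096, 0.9092974, 0.14112, -0.7568025};

// Hypothetical helper (not generated by LCP): look up sin(i) for 0 <= i < 5
// instead of calling std::sin at run time.
double SinOfSmallInt(std::size_t i) {
  assert(i < sizeof(kSinFrom0To5) / sizeof(kSinFrom0To5[0]));
  return kSinFrom0To5[i];
}
```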
C++ Namespaces
Although you can use inline C++ to open and close namespaces, it is
recommended to use the lcp:namespace and lcp:pop-namespace functions. LCP will
report an error if you have an unclosed namespace, unlike Clang and GCC, which
most of the time give strange errors due to C++ grammar ambiguity. An
additional benefit is that LCP will track the namespace stack and correctly
wrap any C++ code which should be put in the .cpp file.
For example:
;; example.lcp
(lcp:namespace utils)
;; Function declaration in header
#>cpp
bool StartsWith(const std::string &string, const std::string &prefix);
cpp<#
;; Function implementation in implementation file
(lcp:in-impl
  #>cpp
  bool StartsWith(const std::string &string, const std::string &prefix) {
    // Implementation code
    return false;
  }
  cpp<#)
(lcp:pop-namespace) ;; utils
The above will produce 2 files, header and implementation:
// example.hpp
namespace utils {
bool StartsWith(const std::string &string, const std::string &prefix);
}
// example.lcp.cpp
namespace utils {
bool StartsWith(const std::string &string, const std::string &prefix) {
  // Implementation code
  return false;
}
}
C++ Enumerations
LCP provides an lcp:define-enum macro to define a C++ enum class type. This
will make LCP aware of the type and all its possible values, which makes it
possible to generate the serialization code. In the future, LCP may generate
"string to enum" and "enum to string" functions.
Example:
(lcp:define-enum days-in-week
    (monday tuesday wednesday thursday friday saturday sunday)
  ;; Optional documentation
  (:documentation "Enumerates days of the week")
  ;; Optional directive to generate serialization code
  (:serialize))
Produces:
/// Enumerates days of the week
enum class DaysInWeek {
  MONDAY,
  TUESDAY,
  WEDNESDAY,
  THURSDAY,
  FRIDAY,
  SATURDAY,
  SUNDAY
};
// serialization code ...
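LCP does not currently generate the "enum to string" and "string to enum" functions, so they have to be written by hand for now. The following sketch shows one way to do this for the DaysInWeek enum from the example. The function names ToString and DaysInWeekFromString are hypothetical, and the code assumes C++17 (std::optional, std::string_view).

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// The enum as generated by the lcp:define-enum example above.
enum class DaysInWeek { MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY };

// All values in declaration order, used for the reverse lookup.
constexpr DaysInWeek kAllDaysInWeek[] = {
    DaysInWeek::MONDAY, DaysInWeek::TUESDAY,  DaysInWeek::WEDNESDAY, DaysInWeek::THURSDAY,
    DaysInWeek::FRIDAY, DaysInWeek::SATURDAY, DaysInWeek::SUNDAY};

// Hand-written "enum to string" (hypothetical; not generated by LCP).
std::string_view ToString(DaysInWeek day) {
  switch (day) {
    case DaysInWeek::MONDAY: return "MONDAY";
    case DaysInWeek::TUESDAY: return "TUESDAY";
    case DaysInWeek::WEDNESDAY: return "WEDNESDAY";
    case DaysInWeek::THURSDAY: return "THURSDAY";
    case DaysInWeek::FRIDAY: return "FRIDAY";
    case DaysInWeek::SATURDAY: return "SATURDAY";
    case DaysInWeek::SUNDAY: return "SUNDAY";
  }
  return "";  // Not reached for valid enum values.
}

// Hand-written "string to enum"; returns std::nullopt for unknown names.
std::optional<DaysInWeek> DaysInWeekFromString(std::string_view name) {
  for (DaysInWeek day : kAllDaysInWeek) {
    if (ToString(day) == name) return day;
  }
  return std::nullopt;
}
```

The reverse lookup reuses ToString, so adding a value to the enum requires updating only the switch and the kAllDaysInWeek array.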
C++ Classes & Structs
For defining C++ classes, there is an lcp:define-class macro. Its counterpart
for structures is lcp:define-struct. They are exactly the same, but
lcp:define-struct will put members in public scope by default, just like in
C++.
Defining classes is a bit more involved, because they have many customization
options. The syntax follows the syntax of class definition in Common Lisp
(see defclass).
Basic example:
(lcp:define-class my-class ()
  ((primitive-value :int64_t)
   (stl-vector "std::vector<int>"))
  ;; Optional documentation
  (:documentation "My class documentation")
  ;; Define explicitly public, protected or private code. All are optional.
  (:public #>cpp // some public code, e.g. methods cpp<#)
  (:protected #>cpp // protected cpp<#)
  (:private #>cpp //private cpp<#))
The above will generate:
/// My class documentation
class MyClass {
 public:
  // some public code, e.g. methods
 protected:
  // protected
 private:
  // private
  int64_t primitive_value_;
  std::vector<int> stl_vector_;
};
As you can see, members in LCP are followed by a type. For primitive types, a
Lisp keyword is used, e.g. :int64_t, :bool, etc. Other types, like STL
containers, use a valid C++ string to specify the type.
C++ supports nesting types inside a class. You can do the same in LCP inside any of the :public, :protected or :private scope options.
For example:
(lcp:define-class my-class ()
  ((member "NestedType")
   (value "NestedEnum"))
  (:private
   (lcp:define-enum nested-enum (first-value second-value))
   (lcp:define-class nested-type ()
     ((member :int64_t)))
   #>cpp
   // Some other C++ code
   cpp<#))
The above will generate NestedEnum, NestedType and the verbatim C++ code inside the private section of MyClass, as expected.
You can add base classes after the class name. The name should be a Lisp
symbol for base classes defined through lcp:define-class, so that LCP tracks
the inheritance. Otherwise, it should be a string.
For example:
(lcp:define-class derived (my-class "UnknownInterface")
  ())
Will generate:
class Derived : public MyClass, public UnknownInterface {
};
Similarly, you can specify template parameters. Instead of giving just a name
to define-class, you give a list where the first element is the name of the
class, while the others name the template parameters.
(lcp:define-class (my-map t-key t-value) ()
  ((underlying-map "std::unordered_map<TKey, TValue>")))
The above will generate:
template <class TKey, class TValue>
class MyMap {
 private:
  std::unordered_map<TKey, TValue> underlying_map_;
};
Other than tweaking the class definition, you can also do additional configuration of members. The following options are supported.
- :initval -- sets the initial value of a member
- :reader -- generates a public getter
- :scope -- sets the scope of a member, one of :public, :private or :protected
- :documentation -- Doxygen documentation of a member
- various serialization options which are explained later
For example:
(lcp:define-class my-class ()
  ((member "std::vector<int>" :scope :protected :initval "1, 2, 3" :reader t
    :documentation "Member documentation")))
Will generate:
class MyClass {
 public:
  const auto &member() { return member_; }
 protected:
  /// Member documentation
  std::vector<int> member_{1, 2, 3};
};
TypeInfo
Defining a class or struct will also generate an appropriate utils::TypeInfo
instance. This instance will be stored as a static const member named kType,
and the class will also have a public GetTypeInfo member function for getting
the TypeInfo of an instance. If a class is part of an inheritance hierarchy,
then GetTypeInfo will be virtual.
This mechanism is a portable solution for Run Time Type Information in C++.
Take a look at the documentation of TypeInfo in the codebase for additional
information on what is provided with it.
For example, defining a derived class:
(lcp:define-class derived-class (base-class)
  ((member :int64_t)))
Will generate:
class DerivedClass : public BaseClass {
 public:
  static const utils::TypeInfo kType;
  const utils::TypeInfo &GetTypeInfo() const override { return kType; }
 private:
  int64_t member_;
};
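The following self-contained sketch makes the pattern concrete: each class has a static kType descriptor and a virtual GetTypeInfo, and a subtype check walks the superclass chain. This is not the real utils::TypeInfo. The descriptor layout (id, name, superclass pointer) and the IsSubtype helper are assumptions made for illustration, and everything lives in a separate sketch namespace.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Assumed, simplified type descriptor: a unique id, a name and a pointer to
// the descriptor of the (single) superclass, or nullptr for a root class.
struct TypeInfo {
  uint64_t id;
  const char *name;
  const TypeInfo *superclass;
};

// Follow the superclass chain to check whether `type` is `parent` or derives
// from it.
inline bool IsSubtype(const TypeInfo &type, const TypeInfo &parent) {
  for (const TypeInfo *current = &type; current != nullptr; current = current->superclass) {
    if (current->id == parent.id) return true;
  }
  return false;
}

class BaseClass {
 public:
  static const TypeInfo kType;
  virtual ~BaseClass() = default;
  // Virtual, because BaseClass is part of an inheritance hierarchy.
  virtual const TypeInfo &GetTypeInfo() const { return kType; }
};

class DerivedClass : public BaseClass {
 public:
  static const TypeInfo kType;
  const TypeInfo &GetTypeInfo() const override { return kType; }
};

// Definitions that would live in the implementation file.
const TypeInfo BaseClass::kType{1, "BaseClass", nullptr};
const TypeInfo DerivedClass::kType{2, "DerivedClass", &BaseClass::kType};

}  // namespace sketch
```

The descriptors are plain static objects and the sketch uses neither typeid nor dynamic_cast, so this kind of check also works when compiler RTTI is disabled.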
The generated TypeInfo object does not support multiple inheritance, and LCP
will report an error if it tries to generate such information. Additionally,
generating type information for template classes is not supported. In such a
case you will need to write your own description by hand. There are also cases
where you may want to tune the generation of type information. These cases
are described below.
Sometimes, you may want to avoid having a virtual or overridden call by
treating a derived type as if it is a base type for the purposes of
TypeInfo. This can be done by passing the :type-info :base t class option.
NOTE: This will potentially break some TypeInfo related functions, like
IsSubtype. Therefore, this should only be used if the base type does not
have a correctly set TypeInfo interface.
The same example, but with the option enabled:
(lcp:define-class derived-class (base-class)
  ((member :int64_t))
  (:type-info :base t))
Will generate:
class DerivedClass : public BaseClass {
 public:
  static const utils::TypeInfo kType;
  // Has no override because we pretend this is "base".
  // Is not virtual because LCP didn't detect any class inheriting this one.
  const utils::TypeInfo &GetTypeInfo() const { return kType; }
 private:
  int64_t member_;
};
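The following sketch shows why the NOTE above matters. It uses the same assumed descriptor layout as a stand-in for utils::TypeInfo. It also assumes that, with :base t, the derived descriptor simply records no superclass. The base class deliberately has a non-virtual GetTypeInfo, i.e. a TypeInfo interface that is not set up correctly. That is the situation the option is meant for.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch_base {

// Assumed, simplified stand-in for utils::TypeInfo.
struct TypeInfo {
  uint64_t id;
  const char *name;
  const TypeInfo *superclass;
};

inline bool IsSubtype(const TypeInfo &type, const TypeInfo &parent) {
  for (const TypeInfo *current = &type; current != nullptr; current = current->superclass) {
    if (current->id == parent.id) return true;
  }
  return false;
}

// A base whose TypeInfo interface is not set up for polymorphism:
// GetTypeInfo is not virtual.
class BaseClass {
 public:
  static const TypeInfo kType;
  const TypeInfo &GetTypeInfo() const { return kType; }
};

// Roughly what `(:type-info :base t)` amounts to (assumption): DerivedClass
// records no superclass, and its GetTypeInfo only hides the base version.
class DerivedClass : public BaseClass {
 public:
  static const TypeInfo kType;
  const TypeInfo &GetTypeInfo() const { return kType; }
};

const TypeInfo BaseClass::kType{1, "BaseClass", nullptr};
// No superclass recorded, because DerivedClass is treated as a root.
const TypeInfo DerivedClass::kType{2, "DerivedClass", nullptr};

}  // namespace sketch_base
```

The subtype check against BaseClass fails even though DerivedClass still inherits from it in C++. Through a BaseClass reference, the non-virtual call also returns the base descriptor. Both effects are worth keeping in mind before enabling the option.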
Similarly to :base t, when you have multiple inheritance but only want to
track the primary class as the super class, you can use the
:ignore-other-base-classes t option. NOTE: As with the :base t option, this
should only be used if you know it won't break the use cases of TypeInfo for
this class hierarchy.
Again, the same example, but with :ignore-other-base-classes t:
(lcp:define-class derived-class (base-class other-base-class)
  ((member :int64_t))
  (:type-info :ignore-other-base-classes t))
Will generate:
class DerivedClass : public BaseClass, public OtherBaseClass {
 public:
  static const utils::TypeInfo kType;
  // Overrides, because it's expected BaseClass has one. TypeInfo is set to
  // track only BaseClass as super class.
  const utils::TypeInfo &GetTypeInfo() const override { return kType; }
 private:
  int64_t member_;
};
Defining an RPC
In our codebase, we have implemented remote procedure calls. These are used
for communication between Memgraph instances in a distributed system. Each RPC
is registered by its type and requires serializable data structures. Writing
an RPC-compliant structure requires a lot of boilerplate. To ease the pain of
defining a new RPC, we have a macro, lcp:define-rpc.
A definition consists of 2 parts: request and response. You can specify
members of each part. Member definition is the same as in lcp:define-class.
For example:
(lcp:define-rpc query-result
  (:request
   ((tx-id "tx::TransactionId")
    (query-id :int64_t)))
  (:response
   ((values "std::vector<int>"))))
The above will generate a relatively large amount of C++ code, which is omitted here as the details aren't important for understanding its use. Examining the generated code is left as an exercise for the reader.
The important detail is that in C++ you will have a QueryResultRpc
structure, which is used to register the behaviour of an RPC server. You need
to perform the registration manually. For example:
// somewhere in code you have a server instance
rpc_server.Register<QueryResultRpc>(
    [](const auto &req_reader, auto *res_builder) {
      QueryResultReq request;
      Load(&request, req_reader);
      // process the request and send the response
      QueryResultRes response(values_for_response);
      Save(response, res_builder);
    });
// somewhere else you have a client which sends the RPC
tx::TransactionId tx_id = ...
int64_t query_id = ...
auto response = rpc_client.template Call<QueryResultRpc>(tx_id, query_id);
if (response) {
  const auto &values = response->getValues();
  // do something with values
}
RPC structures use Cap'n Proto for serialization. The above variables
req_reader and res_builder are used to access Cap'n Proto structures.
LCP generates the Cap'n Proto schema alongside the C++ serialization code.
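The core idea of registering an RPC "by its type" can be sketched without any networking or Cap'n Proto. Everything below is hypothetical and heavily simplified:

- ToyRpcServer stands in for the real RPC server.
- The kTypeId constant and the plain request/response structs stand in for the generated QueryResultRpc and the Cap'n Proto readers and builders.
- Call skips the client and transport entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <unordered_map>
#include <vector>

// Plain structs standing in for the generated request and response types.
struct QueryResultReq {
  uint64_t tx_id = 0;
  int64_t query_id = 0;
};
struct QueryResultRes {
  std::vector<int> values;
};

// Hypothetical RPC descriptor: request/response types plus an id for dispatch.
struct QueryResultRpc {
  using Request = QueryResultReq;
  using Response = QueryResultRes;
  static constexpr uint64_t kTypeId = 1;
};

// Hypothetical in-process server: handlers are stored type-erased, keyed by id.
class ToyRpcServer {
 public:
  template <class TRpc>
  void Register(std::function<void(const typename TRpc::Request &, typename TRpc::Response *)> handler) {
    const uint64_t id = TRpc::kTypeId;
    handlers_[id] = [handler](const void *req, void *res) {
      handler(*static_cast<const typename TRpc::Request *>(req),
              static_cast<typename TRpc::Response *>(res));
    };
  }

  // Stands in for client call plus transport: find the handler and run it.
  template <class TRpc>
  typename TRpc::Response Call(const typename TRpc::Request &request) {
    const uint64_t id = TRpc::kTypeId;
    auto it = handlers_.find(id);
    if (it == handlers_.end()) throw std::runtime_error("RPC not registered");
    typename TRpc::Response response;
    it->second(&request, &response);
    return response;
  }

 private:
  std::unordered_map<uint64_t, std::function<void(const void *, void *)>> handlers_;
};
```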
Cap'n Proto Serialization
The primary purpose of LCP was to make serialization of types easier. Our serialization library of choice for C++ is Cap'n Proto. LCP provides generation and tuning of its serialization code. Previously, LCP supported Boost.Serialization, but that support was removed.
To specify a class or structure for serialization, you may pass a
:serialize (:capnp) option when defining such a type. (Note that
lcp:define-enum takes :serialize without any arguments.)
For example:
(lcp:define-struct my-struct ()
  ((member :int64_t))
  (:serialize (:capnp)))
The :serialize option will generate a Cap'n Proto schema of the class and
store it in the .capnp file. C++ code will be generated for saving and loading
members:
// Top level functions
void Save(const MyStruct &self, capnp::MyStruct::Builder *builder);
void Load(MyStruct *self, const capnp::MyStruct::Reader &reader);
Since we use top level functions, the class needs to have some sort of public access to its members.
The schema file will be namespaced in capnp. To add a prefix namespace, use
the lcp:capnp-namespace function. For example, if we use
(lcp:capnp-namespace "my_namespace"), then the reader and builder would be in
my_namespace::capnp.
Serializing a class hierarchy is also supported. The most basic case with single inheritance works out of the box. Handling other cases is explained in later sections.
For example:
(lcp:define-class base ()
  ((base-member "std::vector<int64_t>" :scope :public))
  (:serialize (:capnp)))
(lcp:define-class derived (base)
  ((derived-member :bool :scope :public))
  (:serialize (:capnp)))
Note that all classes need to have the :serialize option set. The signatures
of the Save and Load functions are changed to accept the reader and builder of
the base class. The Load function now takes a std::unique_ptr<T> *, which is
used to take ownership of a concrete type. This approach transfers the
responsibility of type allocation and construction from the user of Load to
Load itself.
void Save(const Derived &self, capnp::Base::Builder *builder);
void Load(std::unique_ptr<Base> *self, const capnp::Base::Reader &reader);
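The base type's schema enumerates all derived types in a union. Conceptually, Load can therefore inspect which union member is set and construct the matching concrete type itself. The sketch below imitates this with plain C++ structs. ToyBaseReader and its Which tag are hypothetical stand-ins for a Cap'n Proto reader, and the code is not what LCP generates.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

struct Base {
  virtual ~Base() = default;
  std::vector<int64_t> base_member;
};

struct Derived : public Base {
  bool derived_member = false;
};

// Hypothetical stand-in for capnp::Base::Reader: the common fields plus a
// union tag recording which concrete type was saved.
struct ToyBaseReader {
  enum class Which { BASE, DERIVED };
  Which which = Which::BASE;
  std::vector<int64_t> base_member;
  bool derived_member = false;  // Only meaningful when which == Which::DERIVED.
};

// Load, not the caller, decides which type to allocate and construct, then
// transfers ownership through `self`.
void Load(std::unique_ptr<Base> *self, const ToyBaseReader &reader) {
  switch (reader.which) {
    case ToyBaseReader::Which::BASE:
      *self = std::make_unique<Base>();
      break;
    case ToyBaseReader::Which::DERIVED: {
      auto derived = std::make_unique<Derived>();
      derived->derived_member = reader.derived_member;
      *self = std::move(derived);
      break;
    }
    default:
      throw std::runtime_error("unknown type in union");
  }
  // Fields common to every type in the hierarchy.
  (*self)->base_member = reader.base_member;
}
```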
Multiple Inheritance
Cap'n Proto does not support any form of inheritance, so we handle it manually. Single inheritance was relatively easy to add to Cap'n Proto: we simply enumerate all derived types inside the union of a base type.
Multiple inheritance is a different beast and as such is not directly supported.
One way to use multiple inheritance is only to implement the interface of pure
virtual classes without any members (i.e. interface classes). In such a case,
you do not want to serialize any base class except the primary one. To let LCP
know that this is the case, use :ignore-other-base-classes t. LCP will then
only try to serialize the base class that is first (leftmost) in the list of
super classes.
(lcp:define-class derived (primary-base some-interface other-interface)
  ...
  (:serialize (:capnp :ignore-other-base-classes t)))
Another form of multiple inheritance is reusing some common code. In
practice, this is bad code practice and should be replaced with composition.
If it would take too long to fix such code to use composition proper, we can
tell LCP to treat such inheritance as if the classes were indeed composed.
This is done via the :inherit-compose option.
For example:
(lcp:define-class derived (first-base second-base)
  ...
  (:serialize (:capnp :inherit-compose '(second-base))))
With :inherit-compose you can pass a list of parent classes which should be
encoded as composition inside the Cap'n Proto schema. LCP will complain if
there is multiple inheritance but you didn't specify :inherit-compose.
The downside of this approach is that Save and Load will work only on
FirstBase. Serializing a pointer to SecondBase would be incorrect.
Inheriting C++ Class Outside of LCP
Classes defined outside of lcp:define-class are not visible to LCP, so LCP
will not be able to generate correct serialization code for them.
So far, the only such cases have been classes that are pure interfaces and
need no serialization code. This is signaled to LCP by passing the :base t
option to :capnp. LCP will then treat the class with this option as the base
class of the hierarchy.
For example:
(lcp:define-class my-class ("utils::TotalOrdering")
  (...)
  (:serialize (:capnp :base t)))
(lcp:define-class derived (my-class)
  (...)
  (:serialize (:capnp)))
Only the base class for serialization has the :base t option set. Derived
classes are defined as usual. This relies on the fact that we do not expect
anyone to have a pointer to utils::TotalOrdering and use it for
serialization and deserialization.
Template Classes
Currently, LCP supports the most primitive form of serializing templated classes. The template arguments must be provided to specify an explicit instantiation. Cap'n Proto does support generics, so we may want to upgrade LCP to use them in the future.
To specify template arguments, pass a :type-args option. For example:
(lcp:define-class (my-container t-value) ()
  (...)
  (:serialize (:capnp :type-args '(my-class))))
The above will support serialization of the MyContainer<MyClass> type.
The syntax will work even if our templated class inherits from non-templated classes. All other cases of inheritance with templates are forbidden in LCP serialization.
Cap'n Proto Schemas and Type Conversions
You can import other serialization schemas by using the lcp:capnp-import
function. It expects a name for the import and the path to the schema file.
For example, to import everything from utils/serialization.capnp under the
name Utils, you can do the following:
(lcp:capnp-import 'utils "/utils/serialization.capnp")
To use those types, you need to register a conversion from the C++ type to
the schema type. There are two options: registering a whole-file conversion
with lcp:capnp-type-conversion, or converting a specific class member.
For example, suppose you have a class with a member of type Bound, and there
is a schema for it, also named Bound, inside the imported schema. You can use
lcp:capnp-type-conversion like so:
(lcp:capnp-type-conversion "Bound" "Utils.Bound")
(lcp:define-class my-class ()
  ((my-bound "Bound")))
Specifying only a member conversion can be done with the :capnp-type member
option:
(lcp:define-class my-class ()
  ((my-bound "Bound" :capnp-type "Utils.Bound")))
Custom Save and Load Hooks
Sometimes the default serialization is not adequate and you may wish to
provide your own serialization code. For those cases, LCP provides the
:capnp-save, :capnp-load and :capnp-init options on each class member.
The simplest is :capnp-init, which, when set to nil, will not generate an
init<member> call on a builder. Cap'n Proto requires that compound types are
initialized before beginning to serialize their members. :capnp-init allows
you to delay the initialization to your custom save code. You rarely want to
set :capnp-init nil.
Custom save code is added as the value of :capnp-save. It should be a
function which takes 3 arguments:
- Name of builder variable.
- Name of the class (or struct) member.
- Name of the member in Cap'n Proto schema.
The result of the function needs to be a C++ code block.
You will rarely need to use the 3rd argument, so it should be ignored in most
cases. It is usually needed when you set :capnp-init nil, so that you can
correctly initialize the builder.
Similarly, :capnp-load expects a function that takes a reader, the C++ member
and the Cap'n Proto member, and returns a C++ block.
Example:
(lcp:define-class my-class ()
  ((my-member "ComplexType"
    :capnp-init nil
    :capnp-save (lambda (builder member capnp-name)
                  #>cpp
                  auto data = ${member}.GetSaveData();
                  auto my_builder = ${builder}.init${capnp-name}();
                  my_builder.setData(data);
                  cpp<#)
    :capnp-load (lambda (reader member capnp-name)
                  (declare (ignore capnp-name))
                  #>cpp
                  auto data = ${reader}.getData();
                  ${member}.LoadFromData(data);
                  cpp<#)))
  (:serialize (:capnp)))
With custom serialization code, you may want to get additional details through
extra arguments to the Save and Load functions. This is described in the next
section.
There are also cases where you always need custom serialization code. LCP provides helper functions for abstracting some common details. These functions are listed further down in this document.
Arguments for Save and Load
The default arguments for the Save and Load functions are the Cap'n Proto
builder and reader, respectively. In some cases you may wish to send
additional arguments. This is most commonly needed when tracking shared_ptr
serialization, to avoid serializing the same pointer multiple times.
Additional arguments are specified by passing :save-args and :load-args.
You can specify either of them, but in most cases you want both.
For example:
;; Class for tracking details during save
(lcp:define-class save-helper ()
  (...))
;; Class for tracking details during load
(lcp:define-class load-helper ()
  (...))
(lcp:define-class my-class ()
  ((member "std::shared_ptr<int>"
    :capnp-save ;; custom save
    :capnp-load ;; custom load
    ))
  (:serialize (:capnp
               :save-args '((save-helper "SaveHelper *"))
               :load-args '((load-helper "LoadHelper *")))))
The custom serialization code will now have access to the save_helper and
load_helper variables in C++. You can add more arguments by expanding the
list of pairs, e.g.
:save-args '((first-helper "SomeType *") (second-helper "OtherType *") ...)
Custom Serialization Helper Functions
Helper for std::optional
When using std::optional with primitive C++ types or custom types known to
LCP, you do not need to use any helper. In the example below, things should be
serialized as expected:
(lcp:define-class my-class-with-primitive-optional ()
  ((primitive-optional "std::experimental::optional<int64_t>"))
  (:serialize (:capnp)))
(lcp:define-class my-class-with-known-type-optional ()
  ((known-type-optional "std::experimental::optional<MyClassWithPrimitiveOptional>"))
  (:serialize (:capnp)))
In cases when the value contained in std::optional needs custom
serialization code, you may use lcp:capnp-save-optional and
lcp:capnp-load-optional.
Both functions expect 3 arguments:
- Cap'n Proto type in C++.
- C++ type of the value inside std::optional.
- Optional C++ lambda code.
The lambda code is optional, because LCP will generate the default
serialization code which invokes the Save and Load functions on the value
stored inside the optional. Since most of the serialized classes follow this
convention, you will rarely need to provide the 3rd argument.
For example:
(lcp:define-class my-class ()
  ((member "std::experimental::optional<SomeType>"
    :capnp-save (lcp:capnp-save-optional
                 "capnp::SomeType" "SomeType"
                 "[](auto *builder, const auto &val) { ... }")
    :capnp-load (lcp:capnp-load-optional
                 "capnp::SomeType" "SomeType"
                 "[](const auto &reader) { ... return loaded_val; }"))))
Helper for std::vector
For custom serialization of vector elements, you may use
lcp:capnp-save-vector and lcp:capnp-load-vector. They function exactly the
same as the helpers for std::optional.
Helper for enumerations
If the enumeration is defined via lcp:define-enum, the default LCP
serialization should generate the correct code.
However, if LCP cannot infer the serialization code, you can use the helper
functions lcp:capnp-save-enum and lcp:capnp-load-enum. Both functions
require 3 arguments:
- C++ type of equivalent Cap'n Proto enum.
- Original C++ enum type.
- List of enumeration values.
Example:
(lcp:define-class my-class ()
  ((enum-value "SomeEnum"
    :capnp-init nil ;; must be set to nil
    :capnp-save (lcp:capnp-save-enum
                 "capnp::SomeEnum" "SomeEnum"
                 '(first-value second-value))
    :capnp-load (lcp:capnp-load-enum
                 "capnp::SomeEnum" "SomeEnum"
                 '(first-value second-value)))))
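Conceptually, such save and load code translates each value of the C++ enum into the corresponding value of the enum that Cap'n Proto generates from the schema, and back. The hand-written sketch below illustrates that idea. schema_sketch::SomeEnum is a hypothetical stand-in for the generated Cap'n Proto enum, and ToSchemaEnum and FromSchemaEnum are illustrative names. This is not the code the helpers literally emit. The list of enumeration values passed to the helpers plays roughly the role of the case labels here, but that is an assumption about how they use it.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// The C++ enum used by the class member.
enum class SomeEnum { FIRST_VALUE, SECOND_VALUE };

namespace schema_sketch {
// Hypothetical stand-in for the enum generated from the Cap'n Proto schema.
enum class SomeEnum : uint16_t { FIRST_VALUE, SECOND_VALUE };
}  // namespace schema_sketch

// Map value by value instead of casting, so the two enums may differ in
// numbering or underlying type without silently corrupting data.
schema_sketch::SomeEnum ToSchemaEnum(SomeEnum value) {
  switch (value) {
    case SomeEnum::FIRST_VALUE: return schema_sketch::SomeEnum::FIRST_VALUE;
    case SomeEnum::SECOND_VALUE: return schema_sketch::SomeEnum::SECOND_VALUE;
  }
  throw std::invalid_argument("unknown SomeEnum value");
}

SomeEnum FromSchemaEnum(schema_sketch::SomeEnum value) {
  switch (value) {
    case schema_sketch::SomeEnum::FIRST_VALUE: return SomeEnum::FIRST_VALUE;
    case schema_sketch::SomeEnum::SECOND_VALUE: return SomeEnum::SECOND_VALUE;
  }
  throw std::invalid_argument("unknown schema SomeEnum value");
}
```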
SaveLoadKit Serialization
LCP supports generating serialization code for use with our own simple serialization framework, SaveLoadKit (SLK).
To specify a class or structure for serialization, pass a :serialize (:slk)
class option. For example:
(lcp:define-struct my-struct ()
  ((member :int64_t))
  (:serialize (:slk)))
The above will generate C++ functions for saving and loading all members of
the defined type. The generated code is put inside the slk namespace. For
the above example, we would get the following declarations:
namespace slk {
void Save(const MyStruct &self, slk::Builder *builder);
void Load(MyStruct *self, slk::Reader *reader);
}
Since we use top level (i.e. non-member) functions, the class members need to have public access. The primary reason for using non-member functions is the ability to have them decoupled from types. This in turn allows us to easily compile the code with and without serialization. The obvious downside is the requirement of public access, which could potentially allow for erroneous use of classes and their members. Therefore, the recommended way to use serialization is with plain old data (POD) types. The programmer needs to be aware of this and treat POD types as immutable as much as possible. Using POD types will also help minimize the complexity of the serialization code, as well as the features required in LCP.
Another requirement on serialized types is that they need to be default
constructible. This keeps the serialization implementation simple and uniform.
Each type is first default constructed, potentially on stack memory. Then the
slk::Load function is invoked with the pointer to that instance. We could
add support for passing a pointer to uninitialized memory and performing the
construction in slk::Load, to allow types which aren't default constructible.
At the moment, implementing this support would needlessly complicate our code,
where most of the types can be and are default constructible.
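The default-construct-then-Load flow can be sketched with a toy byte buffer. The toy_slk::Builder and toy_slk::Reader types below are hypothetical and far simpler than slk::Builder and slk::Reader: no segmenting, host byte order, and only a bounds check as error handling. Only the shape of the free Save and Load functions mirrors the generated declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

namespace toy_slk {

// Hypothetical append-only byte buffer standing in for slk::Builder.
struct Builder {
  std::vector<uint8_t> data;
  void Write(const void *src, std::size_t size) {
    const auto *bytes = static_cast<const uint8_t *>(src);
    data.insert(data.end(), bytes, bytes + size);
  }
};

// Hypothetical cursor over a byte buffer standing in for slk::Reader.
class Reader {
 public:
  explicit Reader(const std::vector<uint8_t> &data) : data_(data) {}
  void Read(void *dst, std::size_t size) {
    if (pos_ + size > data_.size()) throw std::runtime_error("not enough data");
    std::memcpy(dst, data_.data() + pos_, size);
    pos_ += size;
  }

 private:
  const std::vector<uint8_t> &data_;
  std::size_t pos_ = 0;
};

}  // namespace toy_slk

// POD type with a public member, as in the my-struct example.
struct MyStruct {
  int64_t member = 0;
};

// Non-member functions, decoupled from MyStruct itself.
void Save(const MyStruct &self, toy_slk::Builder *builder) {
  builder->Write(&self.member, sizeof(self.member));
}

// Expects `self` to point to an already constructed instance.
void Load(MyStruct *self, toy_slk::Reader *reader) {
  reader->Read(&self->member, sizeof(self->member));
}
```

Load receives a pointer to an already constructed MyStruct, so the caller has to default construct the instance first, which is exactly why serialized types need a default constructor.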
Single Inheritance
The first and most common step out of the POD zone is having classes with inheritance. LCP supports serializing classes with single inheritance.
A minor complication appears when loading a pointer to a base class. When we
have a pointer to a base class, serializing it may save the data of some
concrete, derived type. Loading the pointer back will need to determine which
type was actually serialized. Once we know the concrete type, we need to
construct it and load it. Finally, we can return a base pointer to that. For
this reason, we generate 2 loading functions: regular Load and
ConstructAndLoad. The latter function is used to do the whole process of
determining the type, constructing it and invoking regular Load. Since we
cannot know the type of the serialized pointer upfront, we cannot allocate the
exact required memory on the stack. For that reason, ConstructAndLoad will
perform a heap allocation for you. Obviously, this could be a performance
issue. In cases when we know the exact concrete type, we can use the regular
Load, which expects the pointer to that type. If you are using Load instead
of ConstructAndLoad, read the next paragraph carefully!
Determining which type was serialized works by storing the id of
utils::TypeInfo when saving a class which is anywhere in the inheritance
hierarchy. This is the first thing the invocation to Save does. Later,
when we call ConstructAndLoad, it will read that type id and dispatch on it
to construct the instance of that type and call the appropriate Load
function. Beware when invoking Load of polymorphic types yourself! You
need to read the type id yourself first and then invoke the Load
function. Things will not work correctly if you forget to do that, because
Load expects to read the serialized data members and not the type
information.
For example:
(lcp:define-class base ()
  ...
  (:serialize (:slk)))
(lcp:define-class derived (base)
  ...
  (:serialize (:slk)))
We get the following declarations generated:
namespace slk {
// Save will correctly forward to derived class using `dynamic_cast`!
void Save(const Base &self, slk::Builder *builder);
// Load only the Base instance, does *not* forward!
void Load(Base *self, slk::Reader *reader);
// Construct the concrete type (could be Base or any derived) and call the
// correct Load. Raises `slk::SlkDecodeException` if an unknown type is
// serialized.
void ConstructAndLoad(std::unique_ptr<Base> *self, slk::Reader *reader);
void Save(const Derived &self, slk::Builder *builder);
void Load(Derived *self, slk::Reader *reader);
// This will raise slk::SlkDecodeException, if something other than `Derived`
// was serialized. `Derived` does not have any subclasses.
void ConstructAndLoad(std::unique_ptr<Derived> *self, slk::Reader *reader);
}
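The following self-contained sketch shows how the type-id dispatch in ConstructAndLoad can work, and why a direct Load call must consume the id first. It makes these simplifying assumptions:

- A hypothetical in-memory stream of int64_t values replaces slk::Builder and slk::Reader.
- Plain numeric ids replace utils::TypeInfo.
- std::runtime_error is thrown where the real code raises slk::SlkDecodeException.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stream of integers standing in for slk::Builder / slk::Reader.
struct Stream {
  std::vector<int64_t> values;
  std::size_t pos = 0;
  void Write(int64_t v) { values.push_back(v); }
  int64_t Read() {
    if (pos >= values.size()) throw std::runtime_error("end of stream");
    return values[pos++];
  }
};

constexpr int64_t kBaseId = 1;     // Stand-in for Base::kType.id.
constexpr int64_t kDerivedId = 2;  // Stand-in for Derived::kType.id.

struct Base {
  virtual ~Base() = default;
  int64_t base_member = 0;
};
struct Derived : Base {
  int64_t derived_member = 0;
};

// Load reads only data members; the type id must already be consumed.
void Load(Base *self, Stream *in) { self->base_member = in->Read(); }
void Load(Derived *self, Stream *in) {
  Load(static_cast<Base *>(self), in);
  self->derived_member = in->Read();
}

// Save writes the type id first, then the data of the concrete type.
void Save(const Base &self, Stream *out) {
  if (const auto *derived = dynamic_cast<const Derived *>(&self)) {
    out->Write(kDerivedId);
    out->Write(derived->base_member);
    out->Write(derived->derived_member);
    return;
  }
  out->Write(kBaseId);
  out->Write(self.base_member);
}

// Read the id, construct the matching type on the heap, then call its Load.
void ConstructAndLoad(std::unique_ptr<Base> *self, Stream *in) {
  switch (in->Read()) {
    case kBaseId: {
      auto base = std::make_unique<Base>();
      Load(base.get(), in);
      *self = std::move(base);
      return;
    }
    case kDerivedId: {
      auto derived = std::make_unique<Derived>();
      Load(derived.get(), in);
      *self = std::move(derived);
      return;
    }
    default:
      throw std::runtime_error("unknown type id");
  }
}
```

To load a Derived directly, first read (and check) the type id from the stream, then call Load. If that step is skipped, Load would read the id as base_member and every following field would be shifted by one.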
Multiple Inheritance
Serializing classes with multiple inheritance is not supported!
Usually, multiple inheritance is used to satisfy some interface which doesn't
carry data for serialization. In such cases, you can ignore the multiple
inheritance by specifying the :ignore-other-base-classes option. For example:
(lcp:define-class derived (primary-base some-interface ...)
  ...
  (:serialize (:slk :ignore-other-base-classes t)))
The above will produce serialization code as if derived were inheriting only
from primary-base.
Templated Types
Serializing templated types is not supported!
You may still write your own serialization code in C++, but LCP will not generate it for you.
Custom Save and Load Hooks
In cases when the default serialization is not adequate, you may wish to
provide your own serialization code. LCP provides the :slk-save and :slk-load
options for each member.
These hooks for custom serialization expect a function with a single argument,
member, representing the member currently being serialized. This allows you to
have a more generic function which works with any member of some type. The
return value of the function needs to be C++ code. The generated code may
expect to have the self and builder variables (or self and reader, when
loading) in scope, just like they are found in the generated Save and Load
declarations.
For example, one of the most common use cases is saving and loading a
std::shared_ptr. You need to provide an argument which is used to track
which pointers were already (de)serialized. Let's take a look at how this
could be done in LCP.
(lcp:define-struct my-struct ()
((some-ptr "std::shared_ptr<SomeType>"
:slk-save (lambda (member)
#>cpp
std::vector<SomeType *> already_saved;
slk::Save(self.${member}, builder, &already_saved);
cpp<#)
:slk-load (lambda (member)
#>cpp
std::vector<std::shared_ptr<SomeType>> already_loaded;
slk::Load(&self->${member}, reader, &already_loaded);
cpp<#)))
(:serialize (:slk)))
The above example is rather artificial, because we usually have multiple shared pointers across different members. In such cases we would like to share the tracking data. One way to do that is explained in the next section.
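To see the limitation concretely, the sketch below expands the hooks by hand for a hypothetical struct. It has two `std::shared_ptr<SomeType>` members that point to the same object. Each member gets its own local tracking vector, as in the hooks above. Here each hook is wrapped in braces so the two declarations don't clash. As a result, the loader cannot tell that the two members were aliased, and it creates two separate objects.

`Builder`, `Reader` and the `Save`/`Load` overloads are toy stand-ins with an encoding made up for illustration. The real `slk::Save`/`slk::Load` for `std::shared_ptr` may encode pointers differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Toy stand-ins for slk::Builder / slk::Reader (hypothetical).
struct Builder { std::vector<uint64_t> data; };
struct Reader { const std::vector<uint64_t> *data; std::size_t pos = 0; };

void Save(uint64_t v, Builder *b) { b->data.push_back(v); }
void Load(uint64_t *v, Reader *r) { *v = r->data->at(r->pos++); }

struct SomeType { uint64_t value = 0; };

// Made-up encoding: 0 = null, 1 + index = pointer saved earlier,
// 2 + data = first occurrence of this pointer.
void Save(const std::shared_ptr<SomeType> &p, Builder *b, std::vector<SomeType *> *saved) {
  if (!p) { Save(0, b); return; }
  for (std::size_t i = 0; i < saved->size(); ++i) {
    if ((*saved)[i] == p.get()) { Save(1, b); Save(i, b); return; }
  }
  saved->push_back(p.get());
  Save(2, b);
  Save(p->value, b);
}

void Load(std::shared_ptr<SomeType> *p, Reader *r, std::vector<std::shared_ptr<SomeType>> *loaded) {
  uint64_t tag = 0;
  Load(&tag, r);
  if (tag == 0) { p->reset(); return; }
  if (tag == 1) { uint64_t i = 0; Load(&i, r); *p = loaded->at(i); return; }
  auto object = std::make_shared<SomeType>();
  Load(&object->value, r);
  loaded->push_back(object);
  *p = object;
}

// Hypothetical struct whose two members may share one object.
struct TwoPtrs {
  std::shared_ptr<SomeType> first;
  std::shared_ptr<SomeType> second;
};

// Roughly what per-member hooks expand to: one tracking vector per member.
void Save(const TwoPtrs &self, Builder *builder) {
  {
    std::vector<SomeType *> already_saved;
    Save(self.first, builder, &already_saved);
  }
  {
    std::vector<SomeType *> already_saved;
    Save(self.second, builder, &already_saved);
  }
}

void Load(TwoPtrs *self, Reader *reader) {
  {
    std::vector<std::shared_ptr<SomeType>> already_loaded;
    Load(&self->first, reader, &already_loaded);
  }
  {
    std::vector<std::shared_ptr<SomeType>> already_loaded;
    Load(&self->second, reader, &already_loaded);
  }
}
```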
Additional Arguments to Generated Save and Load
As you may have noticed, the primary arguments for `Save` and `Load` are the
type instance and a `slk::Builder` or a `slk::Reader`. In some cases we would
like to accept additional arguments to help us with the serialization process.
Let's see how this is done in LCP using the `:save-args` and `:load-args`
options for `:slk` serialization.
Both the `:save-args` and `:load-args` options expect a list of pairs. Each
pair designates one argument. The first element of the pair is the argument
name and the second is the C++ type of that argument.
As mentioned in the previous section, one of the most common cases where
default serialization doesn't cut it is when we have a `std::shared_ptr`.
Here, we would like to track already serialized pointers. Instead of having
some kind of a global variable, we could pass the tracking data as an
additional argument. Let's take the example from the previous section and
have it take the tracking data as an argument to `Save` and `Load` of the
`my-struct` type.
(lcp:define-struct my-struct ()
((some-ptr "std::shared_ptr<SomeType>"
:slk-save (lambda (member)
#>cpp
slk::Save(self.${member}, builder, already_saved);
cpp<#)
:slk-load (lambda (member)
#>cpp
slk::Load(&self->${member}, reader, already_loaded);
cpp<#)))
(:serialize (:slk :save-args '((already-saved "std::vector<SomeType *> *"))
:load-args '((already-loaded "std::vector<std::shared_ptr<SomeType>> *")))))
The generated declarations now look like the following:
void Save(const MyStruct &self, slk::Builder *builder,
std::vector<SomeType *> *already_saved);
void Load(MyStruct *self, slk::Reader *reader,
std::vector<std::shared_ptr<SomeType>> *already_loaded);
This comes in handy when serializing multiple instances of `my-struct`,
because they can all share the same tracking data. For example:
(lcp:define-struct my-array-of-struct ()
((structs "std::vector<MyStruct>"
:slk-save (lambda (member)
#>cpp
slk::Save(self.${member}.size(), builder);
std::vector<SomeType *> already_saved;
for (const auto &my_struct : self.${member})
slk::Save(my_struct, builder, &already_saved);
cpp<#)
:slk-load (lambda (member)
#>cpp
size_t size = 0;
slk::Load(&size, reader);
self->${member}.resize(size);
std::vector<std::shared_ptr<SomeType>> already_loaded;
for (size_t i = 0; i < size; ++i)
slk::Load(&self->${member}[i], reader, &already_loaded);
cpp<#)))
(:serialize (:slk)))
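The payoff of passing the tracking vectors in is that sharing survives across instances. The sketch below mirrors the generated declarations and the loops in `my-array-of-struct` by hand. `Builder`, `Reader` and the `std::shared_ptr` encoding are toy stand-ins made up for illustration, not the real `slk` API.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Toy stand-ins for slk::Builder / slk::Reader (hypothetical).
struct Builder { std::vector<uint64_t> data; };
struct Reader { const std::vector<uint64_t> *data; std::size_t pos = 0; };

void Save(uint64_t v, Builder *b) { b->data.push_back(v); }
void Load(uint64_t *v, Reader *r) { *v = r->data->at(r->pos++); }

struct SomeType { uint64_t value = 0; };

// Made-up encoding: 0 = null, 1 + index = already saved, 2 + data = first occurrence.
void Save(const std::shared_ptr<SomeType> &p, Builder *b, std::vector<SomeType *> *saved) {
  if (!p) { Save(0, b); return; }
  for (std::size_t i = 0; i < saved->size(); ++i) {
    if ((*saved)[i] == p.get()) { Save(1, b); Save(i, b); return; }
  }
  saved->push_back(p.get());
  Save(2, b);
  Save(p->value, b);
}

void Load(std::shared_ptr<SomeType> *p, Reader *r, std::vector<std::shared_ptr<SomeType>> *loaded) {
  uint64_t tag = 0;
  Load(&tag, r);
  if (tag == 0) { p->reset(); return; }
  if (tag == 1) { uint64_t i = 0; Load(&i, r); *p = loaded->at(i); return; }
  auto object = std::make_shared<SomeType>();
  Load(&object->value, r);
  loaded->push_back(object);
  *p = object;
}

struct MyStruct { std::shared_ptr<SomeType> some_ptr; };

// Shape of the generated functions once :save-args / :load-args are given.
void Save(const MyStruct &self, Builder *builder, std::vector<SomeType *> *already_saved) {
  Save(self.some_ptr, builder, already_saved);
}
void Load(MyStruct *self, Reader *reader, std::vector<std::shared_ptr<SomeType>> *already_loaded) {
  Load(&self->some_ptr, reader, already_loaded);
}

struct MyArrayOfStruct { std::vector<MyStruct> structs; };

// One tracking vector shared by all elements, as in the my-array-of-struct hooks.
void Save(const MyArrayOfStruct &self, Builder *builder) {
  Save(self.structs.size(), builder);
  std::vector<SomeType *> already_saved;
  for (const auto &my_struct : self.structs) Save(my_struct, builder, &already_saved);
}
void Load(MyArrayOfStruct *self, Reader *reader) {
  uint64_t size = 0;
  Load(&size, reader);
  self->structs.resize(size);
  std::vector<std::shared_ptr<SomeType>> already_loaded;
  for (uint64_t i = 0; i < size; ++i) Load(&self->structs[i], reader, &already_loaded);
}
```

The index-based lookup works only because `Load` visits the elements in the same order `Save` wrote them.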
Object Cloning
LCP supports automatic generation of cloning (deep copy) code for user-defined classes.
A textbook example of an object that requires deep copy functionality is a tree structure. The following class represents a node in a binary tree, carrying an integer value and having pointers to its two children:
(lcp:define-class node ()
((value :int32_t)
(left "std::unique_ptr<Node>")
(right "std::unique_ptr<Node>"))
(:clone :return-type (lambda (typename)
#>cpp
std::unique_ptr<${typename}>
cpp<#)
:init-object (lambda (var typename)
#>cpp
auto ${var} = std::make_unique<${typename}>();
cpp<#)))
The above will generate the following C++ class with a `Clone` function that
can be used for making a deep copy of the binary tree structure:
class Node {
public:
std::unique_ptr<Node> Clone() const {
auto object = std::make_unique<Node>();
object->value_ = value_;
object->left_ = left_ ? left_->Clone() : nullptr;
object->right_ = right_ ? right_->Clone() : nullptr;
return object;
}
private:
int32_t value_;
std::unique_ptr<Node> left_;
std::unique_ptr<Node> right_;
};
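A quick way to check that this is a deep copy is to clone a small tree and confirm that the copy has equal values but distinct nodes. The sketch below repeats the generated `Clone` body. For illustration only, it makes the members public and drops the trailing underscores, so the example can build a tree directly.

```cpp
#include <cstdint>
#include <memory>

// Same Clone logic as the generated Node, but with public members for the example.
struct Node {
  int32_t value = 0;
  std::unique_ptr<Node> left;
  std::unique_ptr<Node> right;

  std::unique_ptr<Node> Clone() const {
    auto object = std::make_unique<Node>();
    object->value = value;
    // Recursively clone the children; a null child stays null.
    object->left = left ? left->Clone() : nullptr;
    object->right = right ? right->Clone() : nullptr;
    return object;
  }
};
```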
To specify that a class is deep copyable, the `:clone` class option must be
passed. We have already seen two options that `:clone` accepts: `:return-type`
and `:init-object`.
`:return-type` expects a function that takes a single argument, the C++ type
name of the class, and produces C++ code which is a valid C++ type. Here we
used it to specify that the `Clone` function should return a `std::unique_ptr`
to the newly created `Node`, overriding the default behavior.
When the `:return-type` option is not provided and class `T` is a member of an
inheritance hierarchy, `Clone` will return `std::unique_ptr<Base>`, where
`Base` is the root of that hierarchy. If `T` is not a member of an inheritance
hierarchy, `Clone` will return `T` by default.
`:init-object` expects a function that takes two arguments: the first is a
variable name, and the second is the C++ type name of the class. It must
produce C++ code that initializes an object with the given name, of the same
type that the `Clone` function returns. Here we had to use it since we are
overriding the default return type of `Clone`. Unless the `:init-object`
argument is provided, an object of type `T` will be instantiated with
`auto object = std::make_unique<T>();` if `T` is a member of an inheritance
hierarchy, and with `T object;` if it is not. As you can see, deep copyable
objects must be default constructible.
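Putting those defaults together, a class outside any inheritance hierarchy that passes neither `:return-type` nor `:init-object` would get a `Clone` that returns by value. The sketch below writes that shape out by hand for a hypothetical `Point` class. It follows the rules described above and is not copied from actual LCP output.

```cpp
#include <cstdint>

// Hypothetical class with no base classes and no derived classes.
class Point {
 public:
  Point() = default;  // required: the default Clone starts with `Point object;`
  Point(int32_t x, int32_t y) : x_(x), y_(y) {}

  // Default shape: `T object;`, member-wise copies, return T by value.
  Point Clone() const {
    Point object;
    object.x_ = x_;
    object.y_ = y_;
    return object;
  }

  int32_t x() const { return x_; }
  int32_t y() const { return y_; }

 private:
  int32_t x_{0};
  int32_t y_{0};
};
```

Without the defaulted `Point()` constructor, the `Point object;` line would not compile. This is the concrete reason behind the default-constructibility requirement.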
Single Inheritance
LCP supports deep copying of classes with single inheritance. The root class
will have a virtual `Clone` function that child classes will override. For
example:
(lcp:define-class base ()
((member :int32_t))
(:clone))
(lcp:define-class derived (base)
((another-member :int32_t))
(:clone))
We get the following code:
class Base {
public:
virtual std::unique_ptr<Base> Clone() const {
auto object = std::make_unique<Base>();
object->member_ = member_;
return object;
}
private:
int32_t member_;
};
class Derived : public Base {
public:
std::unique_ptr<Base> Clone() const override {
auto object = std::make_unique<Derived>();
object->member_ = member_;
object->another_member_ = another_member_;
return object;
}
private:
int32_t another_member_;
};
Notice that the `Clone` function of the derived class also returns
`std::unique_ptr<Base>`, because C++ doesn't support return type covariance
with smart pointers.
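Covariant return types in C++ are limited to raw pointers and references, such as `Base *` to `Derived *`. `std::unique_ptr<Derived>` is not covariant with `std::unique_ptr<Base>`. In practice, callers still get the right dynamic type back, but must cast to reach derived members. In the minimal sketch below, `member_` is made `protected` (an assumption for this example) so that `Derived::Clone` can copy the inherited field.

```cpp
#include <cstdint>
#include <memory>

class Base {
 public:
  virtual ~Base() = default;
  virtual std::unique_ptr<Base> Clone() const {
    auto object = std::make_unique<Base>();
    object->member_ = member_;
    return object;
  }
  int32_t member() const { return member_; }
  void set_member(int32_t v) { member_ = v; }

 protected:
  int32_t member_ = 0;
};

class Derived : public Base {
 public:
  // Cannot return std::unique_ptr<Derived>: smart pointers are not covariant.
  std::unique_ptr<Base> Clone() const override {
    auto object = std::make_unique<Derived>();
    object->member_ = member_;
    object->another_member_ = another_member_;
    return object;  // converts unique_ptr<Derived> to unique_ptr<Base>
  }
  int32_t another_member() const { return another_member_; }
  void set_another_member(int32_t v) { another_member_ = v; }

 private:
  int32_t another_member_ = 0;
};
```

With raw pointers the covariant form is legal. That is why an override can return `Where *` where the base declares `Tree *`.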
Multiple Inheritance
Deep copying of classes with multiple inheritance is not supported!
Usually, multiple inheritance is used to satisfy some interface which doesn't
carry data. In such cases, you can ignore the multiple inheritance by
specifying the `:ignore-other-base-classes` option. For example:
(lcp:define-class derived (primary-base some-interface ...)
...
(:clone :ignore-other-base-classes t))
The above will produce deep copying code as if `derived` is inheriting only
from `primary-base`.
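In effect, only the primary base's members and the class's own members are copied. A data-less interface has nothing to copy anyway. The sketch below shows that intent by hand with a hypothetical `Printable` interface. It is not LCP's exact output.

```cpp
#include <memory>
#include <string>

// Data-less interface: no state to clone.
struct Printable {
  virtual ~Printable() = default;
  virtual std::string Print() const = 0;
};

struct PrimaryBase {
  virtual ~PrimaryBase() = default;
  virtual std::unique_ptr<PrimaryBase> Clone() const {
    auto object = std::make_unique<PrimaryBase>();
    object->id = id;
    return object;
  }
  int id = 0;
};

// Cloned as if it inherited only from PrimaryBase.
struct Derived : public PrimaryBase, public Printable {
  std::unique_ptr<PrimaryBase> Clone() const override {
    auto object = std::make_unique<Derived>();
    object->id = id;
    object->name = name;
    return object;
  }
  std::string Print() const override { return name + "#" + std::to_string(id); }
  std::string name;
};
```

The clone is still a complete `Derived`, so it can be used through the interface it was never asked to copy.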
Templated Types
Deep copying of templated types is not supported!
Custom Clone Hooks
In cases when default deep copying code is not adequate, you may wish to
provide your own. LCP provides the `:clone` option that can be specified for
each member.
These hooks for custom copying expect a function with two arguments, `source`
and `dest`, representing the member location in the object being cloned and
the member location in the new object. This allows you to write a more generic
function which works with any member of some type. The return value of the
function needs to be C++ code.
It is also possible to specify that a member is cloned by plain copying, by
passing `:copy` instead of a function as the argument to `:clone`.
(lcp:define-class my-class ()
((callback "std::function<void(int, int)>"
:clone :copy)
(widget "Widget"
:clone (lambda (source dest)
#>cpp
${dest} = WidgetFactory::Create(${source}.type());
cpp<#)))
(:clone))
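For `my-class`, the two hooks become two member assignments in the generated `Clone`. `callback` is copied with plain assignment. For `widget`, the lambda's code is pasted in with `${source}` and `${dest}` replaced by member expressions.

The sketch below spells that out by hand, with a few assumptions:
- `Widget`, `WidgetFactory::Create` and `type()` are hypothetical, as in the LCP snippet.
- The exact expressions LCP substitutes for `${source}` and `${dest}` are a guess.
- Members are public only so the example can set them.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical Widget whose copy should go through a factory.
class Widget {
 public:
  Widget() = default;
  explicit Widget(std::string type) : type_(std::move(type)) {}
  const std::string &type() const { return type_; }

 private:
  std::string type_;
};

struct WidgetFactory {
  static Widget Create(const std::string &type) { return Widget(type); }
};

class MyClass {
 public:
  MyClass Clone() const {
    MyClass object;
    // :clone :copy -> plain copy assignment.
    object.callback_ = callback_;
    // Custom hook, assuming ${dest} = object.widget_ and ${source} = widget_.
    object.widget_ = WidgetFactory::Create(widget_.type());
    return object;
  }

  std::function<void(int, int)> callback_;
  Widget widget_;
};
```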
Additional Arguments to Generated Clone Function
By default, the `Clone` function takes no arguments. In some cases we would
like to accept additional arguments necessary to create a deep copy. Let's see
how this is done in LCP using the `:args` option.
`:args` expects a list of pairs. Each pair designates one argument. The first
element of the pair is the argument name and the second is the C++ type of that
argument.
One case where we want to pass additional arguments to `Clone` is when there
is another object that owns all objects being cloned. For example, `AstStorage`
owns all Memgraph AST nodes. For that reason, the `Clone` function of all AST
node types takes an `AstStorage *` argument. Here's a snippet from the actual
AST code:
(lcp:define-class tree ()
((uid :int32_t))
(:abstractp t)
...
(:clone :return-type (lambda (typename)
(format nil "~A*" typename))
:args '((storage "AstStorage *"))
:init-object (lambda (var typename)
(format nil "~A* ~A = storage->Create<~A>();"
typename var typename))))
(lcp:define-class expression (tree)
()
(:abstractp t)
...
(:clone))
(lcp:define-class where (tree)
((expression "Expression *" :initval "nullptr" :scope :public))
(:clone))
The `:args` option is only passed to the root class in the inheritance
hierarchy. By default, the same extra arguments will be passed to all class
members that are cloned using the `Clone` method. The generated code is:
class Tree {
public:
virtual Tree *Clone(AstStorage *storage) const = 0;
private:
int32_t uid_;
};
class Expression : public Tree {
public:
Expression *Clone(AstStorage *storage) const override = 0;
};
class Where : public Tree {
public:
Expression *expression_{nullptr};
Where *Clone(AstStorage *storage) const override {
Where *object = storage->Create<Where>();
object->uid_ = uid_;
object->expression_ = expression_ ? expression_->Clone(storage) : nullptr;
return object;
}
};
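Since `Clone` returns raw pointers here, the objects must be owned elsewhere, and the extra `storage` argument provides that owner. The sketch below uses a toy owner, not Memgraph's actual `AstStorage`. Its hypothetical `Create<T>()` default-constructs a node, keeps it alive in a vector of `std::unique_ptr`, and hands out a raw pointer.

The AST classes are trimmed copies of the generated ones above, with two changes for this example:
- A concrete `Identifier` expression is added.
- `uid_` is made `protected` so derived classes can copy it.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class AstStorage;

class Tree {
 public:
  virtual ~Tree() = default;
  virtual Tree *Clone(AstStorage *storage) const = 0;

 protected:
  int32_t uid_ = 0;
};

// Toy owner of all nodes (hypothetical; not Memgraph's AstStorage).
class AstStorage {
 public:
  template <class T>
  T *Create() {
    auto node = std::make_unique<T>();
    T *raw = node.get();
    nodes_.push_back(std::move(node));  // the storage owns every node
    return raw;
  }
  std::size_t size() const { return nodes_.size(); }

 private:
  std::vector<std::unique_ptr<Tree>> nodes_;
};

class Expression : public Tree {
 public:
  Expression *Clone(AstStorage *storage) const override = 0;
};

// Concrete expression added for the example.
class Identifier : public Expression {
 public:
  std::string name_;
  Identifier *Clone(AstStorage *storage) const override {
    Identifier *object = storage->Create<Identifier>();
    object->uid_ = uid_;
    object->name_ = name_;
    return object;
  }
};

class Where : public Tree {
 public:
  Expression *expression_{nullptr};
  Where *Clone(AstStorage *storage) const override {
    Where *object = storage->Create<Where>();
    object->uid_ = uid_;
    // The extra argument is forwarded to the member's Clone.
    object->expression_ = expression_ ? expression_->Clone(storage) : nullptr;
    return object;
  }
};
```

Every clone lands in the same storage as the originals. Their lifetime is therefore tied to that storage, not to the caller.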