# Lisp C++ Preprocessor (LCP)
In our development process we are using Common Lisp to generate some parts of
the C++ codebase. The idea behind this is supplementing C++ with better
meta-programming capabilities to automate tasks and prevent bugs due to code
duplication. The primary candidate for more powerful meta-programming is
generating serialization code. Such code is almost always the same: go through
all `struct` or `class` members and invoke the serialization function on them.
Writing such code manually is error prone when adding members, because you may
easily forget to correctly update the serialization code. Thus, the Lisp C++
Preprocessor was born. It is hooked in our build process as a step before
compilation. The remainder of the document describes how to use LCP and its
features.

Contents

* [Running LCP](#running-lcp)
* [Writing LCP](#writing-lcp)
  - [Inlining C++ in Common Lisp](#inlining-cpp)
  - [C++ Namespaces](#cpp-namespaces)
  - [C++ Enumerations](#cpp-enums)
  - [C++ Classes & Structs](#cpp-classes)
  - [Defining an RPC](#defining-an-rpc)
  - [Cap'n Proto Serialization](#capnp-serial)

## Running LCP
You can generate C++ from an LCP file by running the following command.
`./tools/lcp <path-to-file.lcp>`
The LCP will produce a `path-to-file.hpp` file and potentially a
`path-to-file.lcp.cpp` file. The `.cpp` file is generated if some parts of the
code need to be in the implementation file. This is usually the case when
generating serialization code. Note that the `.cpp` file has the extension
appended to `.lcp`, so that you are free to define your own `path-to-file.cpp`
which includes the generated `path-to-file.hpp`.
One serialization format uses the Cap'n Proto library, which requires every
schema to have an ID. The ID is generated by invoking `capnp id`. When you want
to generate Cap'n Proto serialization, you need to pass the generated ID to LCP.
`./tools/lcp <path-to-file.lcp> $(capnp id)`
Generating Cap'n Proto serialization will produce an additional file,
`path-to-file.capnp`, which contains the serialization schema.
You may wonder why the LCP doesn't invoke `capnp id` itself. Unfortunately,
such behaviour would be wrong when running LCP on the same file multiple
times. Each run would produce a different ID and the serialization code would
be incompatible between versions.
### CMake
The LCP is run in CMake using the `add_lcp` function defined in
`CMakeLists.txt`. You can take a look at the function documentation there for
information on how to add your new LCP files to the build system.
## Writing LCP
An LCP file should have the `.lcp` extension, but the code written in it is
completely valid Common Lisp code. This means that you have a complete
language at your disposal before the C++ is even compiled. This is similar in
spirit to C++ templates and macros, except that those do not give you access
to a complete language.
Besides Common Lisp, you are allowed to write C++ code verbatim. This means
that C++ and Lisp code coexist in the file. How to do that, as well as other
features are described below.
### Inlining C++ in Common Lisp {#inlining-cpp}
To insert C++ code, you need to use a `#>cpp ... cpp<#` block. This is most
often used at the top of the file to write Doxygen documentation and put some
includes. For example:
```cpp
#>cpp
/// @file My Doxygen style documentation about this file
#pragma once
#include <vector>
cpp<#
```
The above code will be pasted as is into the generated header file. If you
wish to have a C++ block in the `.cpp` implementation file instead, you should
use the `lcp:in-impl` function. For example:
```cpp
(lcp:in-impl
  #>cpp
  void MyClass::Method(int awesome_number) {
    // Do something with awesome_number
  }
  cpp<#)
```
The C++ block also supports string interpolation with a syntax akin to shell
variable access, `${lisp-variable}`. At the moment, only variables are
supported and they have to be pretty printable in Common Lisp (i.e. support
the `~A` format directive). For example, we can make a precomputed sine
table for the integers from 0 up to, but not including, 5:
```lisp
(let ((sin-from-0-to-5
        (format nil "~{~A~^, ~}" (loop for i from 0 below 5 collect (sin i)))))
  #>cpp
  static const double kSinFrom0To5[] = {${sin-from-0-to-5}};
  cpp<#)
```
The following will be generated.
```cpp
static const double kSinFrom0To5[] = {0.0, 0.84147096, 0.9092974, 0.14112, -0.7568025};
```
Since you have a complete language at your disposal, this is a powerful tool
to generate tables for computations which would take a very long time during
the execution of the C++ program.
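
As a sketch of how such a generated table might be consumed on the C++ side,
the block below pastes the array from the output above and adds a lookup
function. `kSinTableSize` and `LookupSin` are hypothetical names introduced
only for this illustration; LCP does not generate them.

```cpp
#include <cstddef>
#include <stdexcept>

// Pasted from the generated header shown above.
static const double kSinFrom0To5[] = {0.0, 0.84147096, 0.9092974, 0.14112, -0.7568025};

// Number of precomputed entries, derived from the array itself.
// (Hypothetical name, not generated by LCP.)
constexpr std::size_t kSinTableSize =
    sizeof(kSinFrom0To5) / sizeof(kSinFrom0To5[0]);

// Hypothetical helper: returns the precomputed sine of an integer argument
// (in radians). Throws if the argument is outside of the generated range.
inline double LookupSin(int x) {
  if (x < 0 || static_cast<std::size_t>(x) >= kSinTableSize) {
    throw std::out_of_range("no precomputed sine for this argument");
  }
  return kSinFrom0To5[x];
}
```

Deriving the size from the array keeps the bounds check correct if the Lisp
loop is later changed to generate more entries.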
### C++ Namespaces {#cpp-namespaces}
Although you can use inline C++ to open and close namespaces, it is
recommended to use the `lcp:namespace` and `lcp:pop-namespace` functions. LCP
will report an error if you have an unclosed namespace, unlike Clang and GCC
which most of the time give strange errors due to C++ grammar ambiguity. An
additional benefit is that LCP will track the namespace stack and correctly
wrap any C++ code which should be put in the `.cpp` file.
For example:
```lisp
;; example.lcp
(lcp:namespace utils)
;; Function declaration in header
#>cpp
bool StartsWith(const std::string &string, const std::string &prefix);
cpp<#
;; Function implementation in implementation file
(lcp:in-impl
  #>cpp
  bool StartsWith(const std::string &string, const std::string &prefix) {
    // Implementation code
    return false;
  }
  cpp<#)
(lcp:pop-namespace) ;; utils
```
The above will produce 2 files, header and implementation:
```cpp
// example.hpp
namespace utils {
bool StartsWith(const std::string &string, const std::string &prefix);
}
```
```cpp
// example.lcp.cpp
namespace utils {
bool StartsWith(const std::string &string, const std::string &prefix) {
  // Implementation code
  return false;
}
}
```
### C++ Enumerations {#cpp-enums}
LCP provides the `lcp:define-enum` macro to define a C++ `enum class` type.
This makes LCP aware of the type and all its possible values, which in turn
makes it possible to generate the serialization code. In the future, LCP may
generate "string to enum" and "enum to string" functions.
Example:
```lisp
(lcp:define-enum days-in-week
  (monday tuesday wednesday thursday friday saturday sunday)
  ;; Optional documentation
  (:documentation "Enumerates days of the week")
  ;; Optional directive to generate serialization code
  (:serialize))
```
Produces:
```cpp
/// Enumerates days of the week
enum class DaysInWeek {
  MONDAY,
  TUESDAY,
  WEDNESDAY,
  THURSDAY,
  FRIDAY,
  SATURDAY,
  SUNDAY
};
// serialization code ...
```
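
LCP does not generate "enum to string" or "string to enum" functions today.
Purely as an illustration of what such functions could look like if written by
hand, here is a minimal sketch for the `DaysInWeek` type above. The function
names `DaysInWeekToString` and `StringToDaysInWeek` are assumptions made for
this example.

```cpp
#include <string>

// Same enum as the generated one above.
enum class DaysInWeek {
  MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY
};

// Hypothetical "enum to string" function.
inline std::string DaysInWeekToString(DaysInWeek day) {
  switch (day) {
    case DaysInWeek::MONDAY: return "MONDAY";
    case DaysInWeek::TUESDAY: return "TUESDAY";
    case DaysInWeek::WEDNESDAY: return "WEDNESDAY";
    case DaysInWeek::THURSDAY: return "THURSDAY";
    case DaysInWeek::FRIDAY: return "FRIDAY";
    case DaysInWeek::SATURDAY: return "SATURDAY";
    case DaysInWeek::SUNDAY: return "SUNDAY";
  }
  return "UNKNOWN";  // only reached for values outside of the enumeration
}

// Hypothetical "string to enum" function. Returns false and leaves `*out`
// untouched if `name` does not match any enumerator.
inline bool StringToDaysInWeek(const std::string &name, DaysInWeek *out) {
  static const DaysInWeek kAll[] = {
      DaysInWeek::MONDAY,   DaysInWeek::TUESDAY, DaysInWeek::WEDNESDAY,
      DaysInWeek::THURSDAY, DaysInWeek::FRIDAY,  DaysInWeek::SATURDAY,
      DaysInWeek::SUNDAY};
  for (DaysInWeek day : kAll) {
    if (DaysInWeekToString(day) == name) {
      *out = day;
      return true;
    }
  }
  return false;
}
```

Because LCP knows every enumerator, code like this is exactly the kind of
repetitive mapping that could be generated instead of maintained by hand.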
### C++ Classes & Structs {#cpp-classes}
For defining C++ classes, there is the `lcp:define-class` macro. Its
counterpart for structures is `lcp:define-struct`. They are exactly the same,
except that `lcp:define-struct` puts members in public scope by default, just
like in C++.
Defining classes is a bit more involved, because they have many customization
options. The syntax follows the syntax of class definition in Common Lisp
(see `defclass`).
Basic example:
```lisp
(lcp:define-class my-class ()
  ((primitive-value :int64_t)
   (stl-vector "std::vector<int>"))
  ;; Optional documentation
  (:documentation "My class documentation")
  ;; Define explicitly public, protected or private code. All are optional.
  (:public #>cpp // some public code, e.g. methods cpp<#)
  (:protected #>cpp // protected cpp<#)
  (:private #>cpp //private cpp<#))
```
The above will generate:
```cpp
/// My class documentation
class MyClass {
 public:
  // some public code, e.g. methods
 protected:
  // protected
 private:
  // private
  int64_t primitive_value_;
  std::vector<int> stl_vector_;
};
```
As you can see, members in LCP are followed by a type. For primitive types, a
Lisp keyword is used, e.g. `:int64_t`, `:bool`, etc. Other types, like STL
containers, are specified with a string containing a valid C++ type.
C++ supports nesting types inside a class. You can do the same in LCP inside
any of the scoped additions.
For example:
```lisp
(lcp:define-class my-class ()
  ((member "NestedType")
   (value "NestedEnum"))
  (:private
    (lcp:define-enum nested-enum (first-value second-value))
    (lcp:define-class nested-type ()
      ((member :int64_t)))
    #>cpp
    // Some other C++ code
    cpp<#))
```
The above should produce the expected results: `NestedEnum` and `NestedType`
are defined in the private section of `MyClass`.
You can add base classes after the class name. The name should be a Lisp
symbol for base classes defined through `lcp:define-class`, so that LCP
tracks the inheritance. Otherwise, it should be a string.
For example:
```lisp
(lcp:define-class derived (my-class "UnknownInterface")
())
```
Will generate:
```cpp
class Derived : public MyClass, public UnknownInterface {
};
```
Similarly, you can specify template parameters. Instead of giving just a name
to `define-class`, you give a list where the first element is the name of the
class, while others name the template parameters.
```lisp
(lcp:define-class (my-map t-key t-value) ()
((underlying-map "std::unordered_map<TKey, TValue>")))
```
The above will generate:
```cpp
template <class TKey, class TValue>
class MyMap {
private:
std::unordered_map<TKey, TValue> underlying_map_;
};
```
Other than tweaking the class definition, you can also do additional
configuration of members. The following options are supported.
* `:initval` -- sets the initial value of a member
* `:reader` -- generates a public getter
* `:scope` -- set the scope of a member, one of `:public`, `:private` or
`:protected`
* `:documentation` -- Doxygen documentation of a member
* various serialization options which are explained later
For example:
```lisp
(lcp:define-class my-class ()
((member "std::vector<int>" :scope :protected :initval "1, 2, 3" :reader t
:documentation "Member documentation")))
```
Will generate:
```cpp
class MyClass {
 public:
  const auto &member() { return member_; }
 protected:
  /// Member documentation
  std::vector<int> member_{1, 2, 3};
};
```
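
To illustrate how the generated reader might be used, the sketch below copies
the class from the output above and reads the member through the public
getter. `SumOfMember` is a hypothetical caller written for this example. Note
that the getter, as shown in the output above, is not `const` qualified, so the
caller takes a non-const reference.

```cpp
#include <numeric>
#include <vector>

// Copied from the generated output above.
class MyClass {
 public:
  const auto &member() { return member_; }
 protected:
  /// Member documentation
  std::vector<int> member_{1, 2, 3};
};

// Hypothetical caller: it cannot touch the protected `member_` directly, but
// it can read it through the generated `member()` getter.
inline int SumOfMember(MyClass &instance) {
  const auto &values = instance.member();
  return std::accumulate(values.begin(), values.end(), 0);
}
```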
### Defining an RPC
In our codebase, we have implemented remote procedure calls. These are used
for communication between Memgraph instances in a distributed system. Each RPC
is registered by its type and requires serializable data structures. Writing
an RPC-compliant structure requires a lot of boilerplate. To ease the pain of
defining a new RPC we have a macro, `lcp:define-rpc`.
The definition consists of two parts: request and response. You can specify
members of each part. Member definition is the same as in `lcp:define-class`.
For example:
```lisp
(lcp:define-rpc query-result
    (:request
      ((tx-id "tx::TransactionId")
       (query-id :int64_t)))
  (:response
    ((values "std::vector<int>"))))
```
The above will generate a relatively large amount of C++ code, which is omitted
here as the details aren't important for understanding the use. Examining the
generated code is left as an exercise for the reader.
The important detail is that in C++ you will have a `QueryResultRpc`
structure, which is used to register the behaviour of an RPC server. You need
to perform the registration manually. For example:
```cpp
// somewhere in code you have a server instance
rpc_server.Register<QueryResultRpc>(
    [](const auto &req_reader, auto *res_builder) {
      QueryResultReq request;
      request.Load(req_reader);
      // process the request and send the response
      QueryResultRes response(values_for_response);
      Save(response, res_builder);
    });
// somewhere else you have a client which sends the RPC
tx::TransactionId tx_id = ...
int64_t query_id = ...
auto response = rpc_client.template Call<QueryResultRpc>(tx_id, query_id);
if (response) {
  const auto &values = response->getValues();
  // do something with values
}
```
RPC structures use Cap'n Proto for serialization. The variables `req_reader`
and `res_builder` above are used to access Cap'n Proto structures. LCP
generates the Cap'n Proto schema alongside the C++ code for serialization.
### Cap'n Proto Serialization {#capnp-serial}
The primary purpose of LCP was to make serialization of types easier. Our
serialization library of choice for C++ is Cap'n Proto. LCP provides
generation and tuning of its serialization code. Previously, LCP supported
Boost serialization, but that support was removed.
To mark a class or structure for serialization, pass the
`:serialize :capnp` option when defining the type. (Note that
`lcp:define-enum` takes `:serialize` without any arguments.)
For example:
```lisp
(lcp:define-class my-class ()
  ((member :int64_t))
  (:serialize :capnp))
```
The `:serialize` option will generate a Cap'n Proto schema of the class and
store it in the `.capnp` file. C++ code will be generated for saving and
loading members:
```cpp
// Top level function
void Save(const MyClass &instance, capnp::MyClass::Builder *builder);
// Member function
void MyClass::Load(const capnp::MyClass::Reader &reader);
```
Since we use top level functions, the class needs to have some sort of public
access to its members.
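
The following sketch shows why that is the case. It is not the code LCP
generates: the `capnp::MyClass::Builder` and `capnp::MyClass::Reader` types
below are simplified stand-ins for the real Cap'n Proto generated types, and
the `member()` accessor is an assumption about how the class exposes its
value. Only the split between a top level `Save` and a member `Load` follows
the signatures above.

```cpp
#include <cstdint>

// Simplified stand-ins for the types Cap'n Proto would generate from the
// schema. The real builder and reader have a different implementation.
namespace capnp {
struct MyClass {
  struct Builder {
    int64_t member = 0;
    void setMember(int64_t value) { member = value; }
  };
  struct Reader {
    int64_t member = 0;
    int64_t getMember() const { return member; }
  };
};
}  // namespace capnp

class MyClass {
 public:
  // Public accessor, needed because `Save` is not a member of this class.
  int64_t member() const { return member_; }

  // Member function: it may write the private member directly.
  void Load(const capnp::MyClass::Reader &reader) {
    member_ = reader.getMember();
  }

 private:
  int64_t member_ = 0;
};

// Top level function: it can only use the public interface of `MyClass`.
inline void Save(const MyClass &instance, capnp::MyClass::Builder *builder) {
  builder->setMember(instance.member());
}
```

If `member()` were removed, `Save` would no longer compile, while `Load` would
be unaffected.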
The schema file will be namespaced in `capnp`. To add a prefix namespace, use
the `lcp:capnp-namespace` function. For example, if we use
`(lcp:capnp-namespace "my_namespace")` then the reader and builder would be in
`my_namespace::capnp`.
Serializing a class hierarchy is also supported. The most basic case with
single inheritance works out of the box. Handling other cases is explained in
later sections.
For example:
```lisp
(lcp:define-class base ()
  ((base-member "std::vector<int64_t>"))
  (:serialize :capnp))
(lcp:define-class derived (base)
  ((derived-member :bool))
  (:serialize :capnp))
```
Note that all classes need to have the `:serialize` option set. The signatures
of the `Save` and `Load` functions are changed to accept the builder and the
reader of the base class. In addition, a `Construct` function is added which
instantiates a concrete type from a base reader.
```cpp
void Save(const Derived &derived, capnp::Base::Builder *builder);
class Derived {
  ...
  static std::unique_ptr<Base> Construct(const capnp::Base::Reader &reader);
  virtual void Load(const capnp::Base::Reader &reader);
};
With polymorphic types, you need to call `Base::Construct` followed by `Load`.
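
The two-step pattern can be sketched as follows. This is a hand-written
illustration, not generated code: `capnp::Base::Reader` is a simplified
stand-in in which a plain `Which` enum plays the role of the Cap'n Proto union
naming the concrete type, members are public for brevity, and
`LoadPolymorphic` is a hypothetical helper.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Simplified stand-in for the Cap'n Proto reader of the base schema.
namespace capnp {
struct Base {
  enum class Which { BASE, DERIVED };
  struct Reader {
    Which which = Which::BASE;
    std::vector<int64_t> base_member;
    bool derived_member = false;
  };
};
}  // namespace capnp

class Base {
 public:
  virtual ~Base() = default;
  // Instantiates the concrete type named by the reader; loads nothing.
  static std::unique_ptr<Base> Construct(const capnp::Base::Reader &reader);
  virtual void Load(const capnp::Base::Reader &reader) {
    base_member = reader.base_member;
  }
  std::vector<int64_t> base_member;
};

class Derived : public Base {
 public:
  void Load(const capnp::Base::Reader &reader) override {
    Base::Load(reader);  // load the inherited members first
    derived_member = reader.derived_member;
  }
  bool derived_member = false;
};

inline std::unique_ptr<Base> Base::Construct(
    const capnp::Base::Reader &reader) {
  switch (reader.which) {
    case capnp::Base::Which::DERIVED:
      return std::make_unique<Derived>();
    case capnp::Base::Which::BASE:
      break;
  }
  return std::make_unique<Base>();
}

// Hypothetical helper: `Construct` picks the type, the virtual `Load` then
// fills in the members of that type.
inline std::unique_ptr<Base> LoadPolymorphic(
    const capnp::Base::Reader &reader) {
  auto instance = Base::Construct(reader);
  instance->Load(reader);
  return instance;
}
```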
#### Multiple Inheritance
Cap'n Proto does not support any form of inheritance, so we handle it
manually. Single inheritance was relatively easy to map onto Cap'n Proto: we
simply enumerate all derived types inside the union of a base type.
Multiple inheritance is a different beast and as such is not directly
supported.
Most forms of inheritance should actually be simple composition, and we can
treat parent classes as being composed inside our derived type.
For example:
```lisp
(lcp:define-class derived (first-base second-base)
  ...
  (:serialize :capnp :inherit-compose '(second-base)))
```
With `:inherit-compose` you can pass a list of parent classes which should be
encoded as composition inside the Cap'n Proto schema. LCP will complain if
there is multiple inheritance but you didn't specify `:inherit-compose`.
The downside of this approach is that `Save` and `Load` will work only on
`FirstBase`. Serializing a pointer to `SecondBase` would be incorrect.
#### Inheriting C++ Class Outside of LCP
Classes defined outside of `lcp:define-class` are not visible to LCP, so LCP
cannot generate correct serialization code for them.
So far, the only such base classes we have encountered are pure interfaces
which need no serialization code. This is signaled to LCP by passing the option
`:base t` to `:serialize :capnp`. LCP will then treat the class with that
option as actually being the base class of a hierarchy.
For example:
```lisp
(lcp:define-class my-class ("utils::TotalOrdering")
  (...)
  (:serialize :capnp :base t))
(lcp:define-class derived (my-class)
  (...)
  (:serialize :capnp))
```
Only the base class for serialization has the `:base t` option set. Derived
classes are defined as usual. This relies on the fact that we do not expect
anyone to have a pointer to `utils::TotalOrdering` and use it for
serialization and deserialization.
#### Template Classes
Currently, LCP supports the most primitive form of serializing templated
classes. The template arguments must be provided to specify an explicit
instantiation. Cap'n Proto does support generics, so we may want to upgrade
LCP to use them in the future.
To specify template arguments, pass a `:type-args` option. For example:
```lisp
(lcp:define-class (my-container t-value) ()
(...)
(:serialize :capnp :type-args '(my-class)))
```
The above will support serialization of `MyContainer<MyClass>` type.
The syntax will work even if our templated class inherits from non-templated
classes. All other cases of inheritance with templates are forbidden in LCP
serialization.
#### Cap'n Proto Schemas and Type Conversions
You can import other serialization schemas by using the `lcp:capnp-import`
function. It expects a name for the import and the path to the schema file.
For example, to import everything from `utils/serialization.capnp` under the
name `Utils`, you can do the following:
```lisp
(lcp:capnp-import 'utils "/utils/serialization.capnp")
```
To use those types, you need to register a conversion from C++ type to schema
type. There are two options, registering a whole file conversion with
`lcp:capnp-type-conversion` or converting a specific class member.
For example, you have a class with member of type `Bound` and there is a
schema for it also named `Bound` inside the imported schema.
You can use `lcp:capnp-type-conversion` like so:
```lisp
(lcp:capnp-type-conversion "Bound" "Utils.Bound")
(lcp:define-class my-class ()
((my-bound "Bound")))
```
Specifying only a member conversion can be done with `:capnp-type` member
option:
```lisp
(lcp:define-class my-class ()
((my-bound "Bound" :capnp-type "Utils.Bound")))
```
#### Custom Save and Load Hooks
Sometimes the default serialization is not adequate and you may wish to
provide your own serialization code. For those reasons, LCP provides
`:capnp-save`, `:capnp-load` and `:capnp-init` options on each class member.
The simplest is `:capnp-init` which, when set to `nil`, will not generate an
`initMember` call on a builder. Cap'n Proto requires that a compound type is
initialized before its members are serialized. `:capnp-init` allows
you to delay the initialization to your custom save code. You rarely want to
set `:capnp-init nil`.
Custom save code is added as a value of `:capnp-save`. It should be a function
which takes 3 arguments.
1. Name of builder variable.
2. Name of the class (or struct) member.
3. Name of the member in Cap'n Proto schema.
The result of the function needs to be a C++ code block.
You will rarely need to use the 3rd argument, so it should be ignored in most
cases. It is usually needed when you set `:capnp-init nil`, so that you can
correctly initialize the builder.
Similarly, `:capnp-load` expects a function taking a reader and a member, then
returns a C++ block.
Example:
```lisp
(lcp:define-class my-class ()
  ((my-member "ComplexType"
              :capnp-init nil
              :capnp-save (lambda (builder member capnp-name)
                            #>cpp
                            auto data = ${member}.GetSaveData();
                            auto my_builder = ${builder}.init${capnp-name}();
                            my_builder.setData(data);
                            cpp<#)
              :capnp-load (lambda (reader member)
                            #>cpp
                            auto data = ${reader}.getData();
                            ${member}.LoadFromData(data);
                            cpp<#)))
  (:serialize :capnp))
```
With custom serialization code, you may want to get additional details through
extra arguments to `Save` and `Load` functions. This is described in the next
section.
There are also cases where you always need custom serialization code. LCP
provides helper functions for abstracting some common details. These functions
are listed further down in this document.
#### Arguments for Save and Load
The default arguments for the `Save` and `Load` functions are the Cap'n Proto
builder and reader, respectively. In some cases you may wish to send additional
arguments.
This is most commonly needed when tracking `shared_ptr` serialization, to
avoid serializing the same pointer multiple times.
Additional arguments are specified by passing `:save-args` and `:load-args`.
You can specify either of them, but in most cases you want both.
For example:
```lisp
;; Class for tracking details during save
(lcp:define-class save-helper ()
  (...))
;; Class for tracking details during load
(lcp:define-class load-helper ()
  (...))
(lcp:define-class my-class ()
  ((member "std::shared_ptr<int>"
           :capnp-save ;; custom save
           :capnp-load ;; custom load
           ))
  (:serialize :capnp
              :save-args '((save-helper "SaveHelper *"))
              :load-args '((load-helper "LoadHelper *"))))
```
The custom serialization code will now have access to `save_helper` and
`load_helper` variables in C++. You can add more arguments by expanding the
list of pairs, e.g.
```lisp
:save-args '((first-helper "SomeType *") (second-helper "OtherType *") ...)
```
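
To make the `shared_ptr` use case more concrete, here is a minimal C++ sketch
of what a save helper could do with the extra argument. Everything in it is an
assumption made for illustration: the `Register` method, the `PointerBuilder`
stand-in for a Cap'n Proto builder and the `SaveSharedPtr` function are not
part of LCP or of the Memgraph codebase.

```cpp
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical helper passed through `:save-args`. It remembers which
// pointers have already been serialized.
class SaveHelper {
 public:
  // Returns the id assigned to `ptr` and whether it is seen for the first
  // time.
  std::pair<uint64_t, bool> Register(const void *ptr) {
    auto found = ids_.find(ptr);
    if (found != ids_.end()) return {found->second, false};
    uint64_t id = ids_.size();
    ids_.emplace(ptr, id);
    return {id, true};
  }

 private:
  std::unordered_map<const void *, uint64_t> ids_;
};

// Simplified stand-in for a Cap'n Proto builder of a shared pointer.
struct PointerBuilder {
  uint64_t id = 0;
  bool has_value = false;
  int value = 0;
};

// Sketch of custom save code. The pointed-to value is written only the first
// time a pointer is encountered; later occurrences store just the id.
// Assumes `ptr` is not null.
inline void SaveSharedPtr(const std::shared_ptr<int> &ptr,
                          PointerBuilder *builder, SaveHelper *save_helper) {
  auto registered = save_helper->Register(ptr.get());
  builder->id = registered.first;
  builder->has_value = registered.second;
  if (registered.second) builder->value = *ptr;
}
```

A matching load helper would keep the reverse mapping, from id to the already
constructed `shared_ptr`, so that repeated ids resolve to the same object.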
#### Custom Serialization Helper Functions
##### Helper for `std::optional`
When using `std::optional` with primitive C++ types or custom types known to
LCP, you do not need to use any helper. In the example below, things should be
serialized as expected:
```lisp
(lcp:define-class my-class-with-primitive-optional ()
((primitive-optional "std::experimental::optional<int64_t>"))
(:serialize :capnp))
(lcp:define-class my-class-with-known-type-optional ()
((known-type-optional "std::experimental::optional<MyClassWithPrimitiveOptional>"))
(:serialize :capnp))
```
In cases when the value contained in `std::optional` needs custom
serialization code you may use `lcp:capnp-save-optional` and
`lcp:capnp-load-optional`.
Both functions expect 3 arguments.
1. Cap'n Proto type in C++.
2. C++ type of the value inside `std::optional`.
3. Optional C++ lambda code.
The lambda code is optional, because LCP will generate the default
serialization code which invokes `Save` and `Load` function on the value
stored inside the optional. Since most of the serialized classes follow the
convention, you will rarely need to provide this 3rd argument.
For example:
```lisp
(lcp:define-class my-class ()
  ((member "std::experimental::optional<SomeType>"
           :capnp-save (lcp:capnp-save-optional
                         "capnp::SomeType" "SomeType"
                         "[](auto *builder, const auto &val) { ... }")
           :capnp-load (lcp:capnp-load-optional
                         "capnp::SomeType" "SomeType"
                         "[](const auto &reader) { ... return loaded_val; }"))))
```
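
The role of the two lambdas can be sketched in plain C++. The block below is
not what LCP generates: `OptionalSlot`, `SaveOptional` and `LoadOptional` are
hypothetical names, and the slot is a simplified stand-in for a Cap'n Proto
optional structure. It uses C++17 `std::optional` and only mirrors the shape
of the lambdas from the example above: the save lambda receives a builder and
the contained value, the load lambda receives a reader and returns the loaded
value.

```cpp
#include <optional>
#include <string>

// Simplified stand-in for the serialized form of an optional value.
struct OptionalSlot {
  bool has_value = false;
  std::string payload;
};

// Hypothetical save helper: records whether a value is present and delegates
// saving of the contained value to `save_value`.
template <class TValue, class TSaveFn>
void SaveOptional(const std::optional<TValue> &opt, OptionalSlot *builder,
                  TSaveFn save_value) {
  builder->has_value = opt.has_value();
  if (opt) save_value(builder, *opt);
}

// Hypothetical load helper: calls `load_value` only if a value was saved.
template <class TValue, class TLoadFn>
std::optional<TValue> LoadOptional(const OptionalSlot &reader,
                                   TLoadFn load_value) {
  if (!reader.has_value) return std::nullopt;
  return load_value(reader);
}
```

A call such as `SaveOptional<int>(value, &slot, [](auto *builder, const auto
&val) { builder->payload = std::to_string(val); })` shows where the custom
lambda plugs in.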
##### Helper for `std::vector`
For custom serialization of vector elements, you may use
`lcp:capnp-save-vector` and `lcp:capnp-load-vector`. They function exactly the
same as helpers for `std::optional`.
##### Helper for enumerations
If the enumeration is defined via `lcp:define-enum`, the default LCP
serialization should generate the correct code.
However, if LCP cannot infer the serialization code, you can use helper
functions `lcp:capnp-save-enum` and `lcp:capnp-load-enum`. Both functions
require 3 arguments.
1. C++ type of equivalent Cap'n Proto enum.
2. Original C++ enum type.
3. List of enumeration values.
Example:
```lisp
(lcp:define-class my-class ()
  ((enum-value "SomeEnum"
               :capnp-init nil ;; must be set to nil
               :capnp-save (lcp:capnp-save-enum
                             "capnp::SomeEnum" "SomeEnum"
                             '(first-value second-value))
               :capnp-load (lcp:capnp-load-enum
                             "capnp::SomeEnum" "SomeEnum"
                             '(first-value second-value)))))
```
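
One plausible shape for the conversion these helpers have to perform is a
value-by-value mapping between the two enum types, which is why the list of
enumeration values must be passed in. The sketch below is written by hand
under that assumption; `SaveSomeEnum` and `LoadSomeEnum` are hypothetical
names, and `capnp::SomeEnum` is a stand-in for the enum Cap'n Proto would
generate.

```cpp
#include <stdexcept>

// Original C++ enum, defined outside of LCP.
enum class SomeEnum { FIRST_VALUE, SECOND_VALUE };

// Stand-in for the equivalent enum generated by Cap'n Proto.
namespace capnp {
enum class SomeEnum { FIRST_VALUE, SECOND_VALUE };
}  // namespace capnp

// Hypothetical save direction: C++ enum to Cap'n Proto enum.
inline capnp::SomeEnum SaveSomeEnum(SomeEnum value) {
  switch (value) {
    case SomeEnum::FIRST_VALUE: return capnp::SomeEnum::FIRST_VALUE;
    case SomeEnum::SECOND_VALUE: return capnp::SomeEnum::SECOND_VALUE;
  }
  throw std::logic_error("unknown SomeEnum value");
}

// Hypothetical load direction: Cap'n Proto enum to C++ enum.
inline SomeEnum LoadSomeEnum(capnp::SomeEnum value) {
  switch (value) {
    case capnp::SomeEnum::FIRST_VALUE: return SomeEnum::FIRST_VALUE;
    case capnp::SomeEnum::SECOND_VALUE: return SomeEnum::SECOND_VALUE;
  }
  throw std::logic_error("unknown capnp::SomeEnum value");
}
```

Mapping by name rather than casting the underlying integer keeps the
conversion correct even if the two enums list their values in a different
order.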