Add Tensorflow op feature spec
Reviewers: buda, teon.banek, mferencevic, msantl
Reviewed By: buda, teon.banek
Differential Revision: https://phabricator.memgraph.io/D1691
Parent: 4ad9cfe1f4
Commit: 20dcb05c50

docs/feature_spec/tensorflow_op/technicalities.md (new file, 61 lines)

# Tensorflow Op - Technicalities

The final result should be a shared object (".so") file that can be
dynamically loaded by the Tensorflow runtime in order to directly
access the Bolt client.

## About Tensorflow

Tensorflow is usually used from Python: the Python code defines a
directed acyclic computation graph, and essentially no computation is
done in Python itself. Instead, values from Python are copied into the
graph structure as constants to be used by other Ops. The directed
acyclic graph naturally ends up with two sets of border nodes, one for
inputs and one for outputs. These are sometimes called "feeds".
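
A minimal sketch of this graph-then-run style, using the TF1 graph API
(illustrative only, not Memgraph-specific):

```python3
import tensorflow as tf

# Define the graph; nothing is computed yet.
x = tf.placeholder(tf.float32, name="x")  # an input border node, a "feed"
y = x * 2.0                               # an output border node

# Computation happens only now, inside the Tensorflow runtime.
with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: 21.0}))  # prints 42.0
```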

Following the Python definition of the graph, during training the entire
data processing graph/pipeline is invoked from Python as a single
expression. This amounts to lazy evaluation: by the time the expression
is invoked, its result has long since been defined.

Tensorflow internally works with tensors, i.e. n-dimensional arrays, so
all of its inputs and outputs need to be tensors as well. While it is
possible to feed data directly from Python's numpy matrices straight
into Tensorflow, this is less desirable than using the Tensorflow data
API (which defines data input and processing as a Tensorflow graph
itself) because:

1. The data API is written in C++, entirely avoids Python, and is
   therefore faster.
2. The data API, unlike Python, is available in "Tensorflow serving",
   the default way to serve Tensorflow models in production.
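
For contrast, a minimal tf.data pipeline in the TF1 API (illustrative
only, with no Memgraph involvement):

```python3
import tensorflow as tf

# The input pipeline is itself part of the Tensorflow graph.
dataset = tf.data.Dataset.from_tensor_slices([1.0, 2.0, 3.0, 4.0])
dataset = dataset.map(lambda x: x * 2.0).batch(2)

iterator = dataset.make_one_shot_iterator()
next_batch = iterator.get_next()

with tf.Session() as sess:
    print(sess.run(next_batch))  # prints [2. 4.]
```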

Once the entire input pipeline is defined via the tf.data API, its input
is basically a list of node IDs the model is supposed to work with. The
model, through the data API, knows how to connect to Memgraph and execute
openCypher queries in order to get the remaining data it needs (for
example, the features of neighbouring nodes).

## The Interface

It's best to read the official guide on adding an Op:
<https://www.tensorflow.org/extend/adding_an_op>
and especially the addition that specifies how data Ops are special:
<https://www.tensorflow.org/extend/new_data_formats>

## Compiling the TF Op

There are two options for compiling a custom Op. One of them involves
pulling the TF source, adding our code to it and compiling via Bazel.
This would probably be awkward for us and would significantly slow down
compilation.

The other method involves installing Tensorflow as a Python package and
pulling the required headers from, for example:
`/usr/local/lib/python3.6/site-packages/tensorflow/include`
We can then compile our Op with our regular build system.
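
The header location and the compile/link flags can be queried from the
installed package itself (a sketch; `tf.sysconfig` ships with the
Tensorflow Python package):

```python3
import tensorflow as tf

# Directory containing the bundled headers; this is what our build
# system would add to its include path.
print(tf.sysconfig.get_include())
# e.g. /usr/local/lib/python3.6/site-packages/tensorflow/include

# Compiler and linker flags for building a custom Op .so out of tree.
print(tf.sysconfig.get_compile_flags())
print(tf.sysconfig.get_link_flags())
```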

This is practical since we can copy the required headers to our repo.
If necessary, we can keep several versions of the headers to build a
version of our Op for every TF version we want to support (but this is
unlikely to be required, as the API should be stable).

docs/feature_spec/tensorflow_op/usage_example.md (new file, 142 lines)

# Example for Using the Bolt Client Tensorflow Op

## Dynamic Loading

``` python3
import tensorflow as tf

mg_ops = tf.load_op_library('/usr/bin/memgraph/tensorflow_ops.so')
```

## Basic Usage

``` python3
dataset = mg_ops.OpenCypherDataset(
    # This is probably unfortunate as the username and password
    # get hardcoded into the graph, but for the simple case it's fine.
    "hostname:7687", auth=("user", "pass"),

    # Our query.
    '''
    MATCH (n:Train) RETURN n.id, n.features
    ''',

    # Cast return values to these types.
    (tf.string, tf.float32))

# Some Tensorflow data API boilerplate.
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

# Up to now we have only defined our computation graph, which basically
# just connects to Memgraph. `next_element` is not really data but a
# handle to a node in the Tensorflow graph, which we can (and do)
# evaluate. It is a Tensorflow tensor with shape=(None, 2) and
# dtype=(tf.string, tf.float32). Shape `None` means the shape of the
# tensor is unknown at definition time; it is dynamic and will only be
# known once the tensor has been evaluated.

with tf.Session() as sess:
    node_ids = sess.run(next_element)
    # `node_ids` contains the IDs and features of all the nodes in the
    # graph with the label "Train". It is a numpy.ndarray with shape
    # ($n_matching_nodes, 2).
```

## Memgraph Client as a Generic Tensorflow Op

Other than the Tensorflow Data Op, we'll want to support a generic
Tensorflow Op which can be put anywhere in the Tensorflow computation
graph. It takes in an arbitrary tensor and produces a tensor. This would
be used in the GraphSage algorithm to fetch the lowest level features
into Tensorflow.

```python3
import numpy as np

requested_ids = np.array([1, 2, 3])
ids_placeholder = tf.placeholder(tf.int32)

model = mg_ops.OpenCypher(
    "hostname:7687", auth=("user", "pass"),
    """
    UNWIND $node_ids AS nid
    MATCH (n:Train {id: nid})
    RETURN n.features
    """,

    # What to call the input tensor as an openCypher parameter.
    parameter_name="node_ids",

    # Type of our resulting tensor.
    dtype=tf.float32)

features = model(ids_placeholder)

with tf.Session() as sess:
    result = sess.run(features,
                      feed_dict={ids_placeholder: requested_ids})
```

This is probably easier to implement than the Data Op, so it might be a
good idea to start with.

## Production Usage

During training, in the GraphSage algorithm at least, Memgraph sits at
both the beginning and the end of the Tensorflow computation graph. At
the beginning, the Data Op provides the node IDs, which are fed into the
generic Tensorflow Op to find their neighbours, those neighbours'
neighbours, and their features.

Production usage differs in that we don't use the Data Op: it is
effectively cut off, and the initial input is fed by Tensorflow serving
with the data found in the request.

For example, a JSON request to classify a node might look like:

`POST http://host:port/v1/models/GraphSage/versions/v1:classify`

With the contents:

```json
{
  "examples": [
    {"node_id": 1},
    {"node_id": 2}
  ]
}
```
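
Such a request could be issued from Python roughly as follows (a sketch;
the endpoint host/port and the `requests` dependency are assumptions):

```python3
import json
import requests

response = requests.post(
    "http://host:port/v1/models/GraphSage/versions/v1:classify",
    data=json.dumps({"examples": [{"node_id": 1}, {"node_id": 2}]}))
print(response.json())  # the classification results, also as JSON
```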

Every element of the "examples" list is an example to be computed. Each
is represented by a dict whose keys match the names of feeds in the
Tensorflow graph, and whose values are the values we want fed in for
that example. The REST API then replies in kind, with the classification
result in JSON.

A note about adding our custom Op to Tensorflow serving: our Op's .so
can be added into the Bazel build to link with Tensorflow serving, or it
can be dynamically loaded by starting Tensorflow serving with the flag
`--custom_op_paths`.

## Considerations

There might be an issue in that the URL used to connect to Memgraph is
hardcoded into the Op and would thus be wrong when moved to production,
requiring some kind of hack to make it work. We probably want to solve
this by having the client Op take in another tf.Variable as an input,
which would contain the connection URL and username/password. We have to
research whether this makes it easy enough to move to production: the
connection string variable is still a part of the graph, but it should
at least be easier to replace.
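
A sketch of that variant, reusing the hypothetical `mg_ops.OpenCypher`
from above and assuming it accepts a string tensor as its address
argument:

```python3
# Connection info as graph state rather than a hardcoded attribute.
connection = tf.Variable("hostname:7687", dtype=tf.string,
                         trainable=False, name="mg_connection")

model = mg_ops.OpenCypher(
    connection, auth=("user", "pass"),
    """
    UNWIND $node_ids AS nid
    MATCH (n:Train {id: nid})
    RETURN n.features
    """,
    parameter_name="node_ids",
    dtype=tf.float32)

# Before exporting for production, the connection can be re-assigned
# instead of rebuilding the whole graph.
update_connection = tf.assign(connection, "production-host:7687")
```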

It is probably best to utilize openCypher parameters to make our queries
flexible. The exact API for declaring the parameters in Python is open
to discussion.

The Data Op might not even be necessary to implement, as it is not key
for production use. It can be replaced in training mode with feed dicts
and either:

1. Getting the initial list of nodes via a Python Bolt client (a sketch
   of this option follows the list), or
2. Creating a separate Tensorflow computation graph that gets all the
   relevant node IDs into Python.
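
A sketch of option 1, using a Python Bolt driver (the `neo4j` package
here is just an illustration; any Bolt client works) together with the
`ids_placeholder` and `features` from the generic Op example above:

```python3
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://hostname:7687",
                              auth=("user", "pass"))
with driver.session() as session:
    ids = [record["n.id"]
           for record in session.run("MATCH (n:Train) RETURN n.id")]

# Feed the IDs into the rest of the model via a feed dict.
with tf.Session() as sess:
    result = sess.run(features, feed_dict={ids_placeholder: ids})
```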