Kafka - data transform
The transform script is a user-defined script written in Python. The script must be aware of the format of the data in the Kafka messages.
Each Kafka message is length encoded, which means that the first eight bytes of each message contain the length of the data that follows.
More on the message format can be seen here.
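As a small illustration of that framing, the sketch below packs and unpacks one such length prefix. It assumes a native unsigned long header (eight bytes on a typical 64-bit Linux), matching the struct format used in the sample script below; the payload shown is hypothetical.

import struct

# Hypothetical record payload; real records come from the Kafka producer.
payload = b"v\t42"

# Prepend the 8-byte length header (native unsigned long, assumed 8 bytes
# on 64-bit Linux; exact width/byte order follow the linked message format).
framed = struct.pack("L", len(payload)) + payload

# Decoding reverses the framing: read the header, then that many bytes.
assert struct.unpack("L", framed[:8])[0] == len(payload)
assert framed[8:] == payload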
The script will be embedded in our C++ codebase using Python's embedding feature.
Sample code for a streaming transform script could look like this:
import struct
import sys


def get_records():
    """Read length-prefixed records from stdin until EOF."""
    while True:
        # The first eight bytes of each message contain the length of the
        # data that follows.
        message_len = sys.stdin.buffer.read(8)
        if len(message_len) == 8:
            # "L" is a native unsigned long (8 bytes on 64-bit Linux).
            message_len = struct.unpack("L", message_len)[0]
            record = sys.stdin.buffer.read(message_len).decode("utf-8")
            yield record
        else:
            # A clean EOF leaves no partial length header behind.
            assert len(message_len) == 0, message_len
            return


def create_vertex(fields):
    return "CREATE (n:Node {{id: {}}})".format(fields[1])


def create_edge(fields):
    return "MATCH (n:Node {{id: {}}}) " \
           "MATCH (m:Node {{id: {}}}) " \
           "CREATE (n)-[e:Edge {{value: {}}}]->(m)" \
           .format(fields[1], fields[2], fields[3])


for record in get_records():
    fields = record.split("\t")
    if fields[0] == "v":
        print(create_vertex(fields))
    else:
        print(create_edge(fields))
The script should output one openCypher query string per record, based on the type of the record.
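To illustrate the expected behaviour end to end, here is a rough, hypothetical harness (not part of the spec) that frames a few sample tab-separated records and pipes them through the script above, assuming it is saved as transform.py; the record ids and edge value are made up for the example.

import struct
import subprocess

# Hypothetical records: two vertices and one edge, tab-separated as above.
records = [b"v\t1", b"v\t2", b"e\t1\t2\t0.5"]

# Frame each record with the 8-byte length header the script expects
# (native unsigned long, same struct format as in the script).
framed = b"".join(struct.pack("L", len(r)) + r for r in records)

out = subprocess.run(
    ["python3", "transform.py"], input=framed, stdout=subprocess.PIPE
)
print(out.stdout.decode("utf-8"))
# Expected output, one openCypher query per record:
#   CREATE (n:Node {id: 1})
#   CREATE (n:Node {id: 2})
#   MATCH (n:Node {id: 1}) MATCH (m:Node {id: 2}) CREATE (n)-[e:Edge {value: 0.5}]->(m)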