memgraph/docs/feature_spec/kafka/opencypher.md
Marko Budiselic e80c49f856 Add dgp related docs
2018-09-03 12:06:44 +02:00

Kafka - openCypher clause

One must be able to specify the following when importing data from Kafka:

  • Kafka URI
  • Kafka topic
  • Transform script URI

The minimum required syntax is:

CREATE STREAM stream_name AS LOAD DATA KAFKA 'URI'
  WITH TOPIC 'topic'
  WITH TRANSFORM 'URI';

The full openCypher clause for creating a stream is:

CREATE STREAM stream_name AS
  LOAD DATA KAFKA 'URI'
  WITH TOPIC 'topic'
  WITH TRANSFORM 'URI'
  [BATCH_INTERVAL milliseconds]
  [BATCH_SIZE count]
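
As an illustration of the grammar above, the clause can be assembled from its parts. This is a minimal sketch, not part of the specification: the helper name `build_create_stream` and its parameter names are hypothetical, and because the specification does not define how quotes inside string literals are escaped, the sketch simply rejects them.

```python
from typing import Optional


def build_create_stream(
    stream_name: str,
    kafka_uri: str,
    topic: str,
    transform_uri: str,
    batch_interval_ms: Optional[int] = None,
    batch_size: Optional[int] = None,
) -> str:
    """Assemble a CREATE STREAM clause (hypothetical helper, for illustration)."""
    for value in (kafka_uri, topic, transform_uri):
        # Assumption: quote escaping is unspecified, so quotes are rejected outright.
        if "'" in value:
            raise ValueError(f"single quotes are not supported in {value!r}")

    # The three mandatory parts: Kafka URI, topic, and transform script URI.
    lines = [
        f"CREATE STREAM {stream_name} AS",
        f"  LOAD DATA KAFKA '{kafka_uri}'",
        f"  WITH TOPIC '{topic}'",
        f"  WITH TRANSFORM '{transform_uri}'",
    ]
    # The optional parts are emitted only when the caller supplies them.
    if batch_interval_ms is not None:
        lines.append(f"  BATCH_INTERVAL {batch_interval_ms}")
    if batch_size is not None:
        lines.append(f"  BATCH_SIZE {batch_size}")
    return "\n".join(lines) + ";"
```

Calling the helper with only the four mandatory arguments yields the minimum required syntax; passing `batch_interval_ms` or `batch_size` appends the corresponding optional part.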

The CREATE STREAM clause is executed within a transaction.

The WITH TOPIC parameter specifies the Kafka topic from which data is streamed.

The WITH TRANSFORM parameter contains the URI of the transform script.

The BATCH_INTERVAL parameter defines the time, in milliseconds, between two successive stream import operations.

The BATCH_SIZE parameter defines the number of Kafka messages that are batched together before import.

If both BATCH_INTERVAL and BATCH_SIZE are given, whichever condition is satisfied first triggers the batched import.

The default value for BATCH_INTERVAL is 100 milliseconds, and the default value for BATCH_SIZE is 10.
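
The "whichever condition is satisfied first" rule can be sketched as follows. This is an illustrative model only, not the actual implementation: the class `BatchTrigger` and its methods are hypothetical, time is passed in explicitly as milliseconds so the logic is easy to follow, and the sketch assumes the interval is measured from the previous import.

```python
from typing import Any, List


class BatchTrigger:
    """Decides when a batch should be imported (hypothetical sketch)."""

    def __init__(self, now_ms: int = 0, batch_interval_ms: int = 100,
                 batch_size: int = 10) -> None:
        # Defaults mirror the documented ones: 100 ms and 10 messages.
        self.batch_interval_ms = batch_interval_ms
        self.batch_size = batch_size
        self._messages: List[Any] = []
        # Assumption: the interval is counted from the previous import.
        self._last_import_ms = now_ms

    def add(self, message: Any) -> None:
        """Buffer one Kafka message for the next import."""
        self._messages.append(message)

    def ready(self, now_ms: int) -> bool:
        """True as soon as either the size or the interval condition holds."""
        size_hit = len(self._messages) >= self.batch_size
        interval_hit = now_ms - self._last_import_ms >= self.batch_interval_ms
        return size_hit or interval_hit

    def flush(self, now_ms: int) -> List[Any]:
        """Hand over the buffered messages and restart the interval."""
        batch, self._messages = self._messages, []
        self._last_import_ms = now_ms
        return batch
```

With the defaults, ten messages arriving within 50 ms trigger an import on the size condition, while a single message triggers one only after 100 ms have elapsed since the previous import. Whether an elapsed interval with no buffered messages should trigger an (empty) import is not stated in the specification; the sketch reports it as ready.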

The DROP clause deletes a stream:

DROP STREAM stream_name;

The SHOW clause enables you to see all configured streams:

SHOW STREAMS;

You can also start/stop streams with the START and STOP clauses:

START STREAM stream_name [LIMIT count BATCHES];
STOP STREAM stream_name;

A stream must be stopped before it can be started, and started before it can be stopped. Starting an already started stream, or stopping an already stopped one, has no effect on that stream.

There are also convenience clauses to start and stop all streams:

START ALL STREAMS;
STOP ALL STREAMS;
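
The start and stop rules, including the ALL variants, amount to a two-state machine in which redundant transitions are no-ops. The sketch below is illustrative only: the `Stream` class and the `start_all` and `stop_all` helpers are hypothetical names, and it assumes that a newly created stream begins in the stopped state, which the specification does not state.

```python
from typing import Iterable, List


class Stream:
    """Two-state model of a stream (hypothetical sketch)."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Assumption: a newly created stream is stopped.
        self.running = False

    def start(self) -> bool:
        """Start the stream; return False if it was already started (no-op)."""
        if self.running:
            return False
        self.running = True
        return True

    def stop(self) -> bool:
        """Stop the stream; return False if it was already stopped (no-op)."""
        if not self.running:
            return False
        self.running = False
        return True


def start_all(streams: Iterable[Stream]) -> List[str]:
    """Start every stream; return the names of those whose state changed."""
    return [s.name for s in streams if s.start()]


def stop_all(streams: Iterable[Stream]) -> List[str]:
    """Stop every stream; return the names of those whose state changed."""
    return [s.name for s in streams if s.stop()]
```

Returning whether the state changed makes the no-op behaviour observable without treating a redundant START or STOP as an error.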

Before the actual import, you can also test the stream with the TEST STREAM clause:

TEST STREAM stream_name [LIMIT count BATCHES];

When a stream is tested, data extraction and transformation occur, but no output is inserted into the graph.

A stream must be stopped before it can be tested. When the batch limit is omitted, TEST STREAM runs for a single batch.
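
The testing behaviour described above can be modelled as a dry run. This is a sketch under stated assumptions rather than the real implementation: the function `dry_run_stream`, its parameters, and the shape of the transform output (a list of strings per batch) are all hypothetical.

```python
from itertools import islice
from typing import Callable, Iterable, List, Optional


def dry_run_stream(
    batches: Iterable[List[bytes]],
    transform: Callable[[List[bytes]], List[str]],
    is_running: bool,
    limit: Optional[int] = None,
) -> List[List[str]]:
    """Extract and transform batches without writing anything to the graph."""
    if is_running:
        # A stream must be stopped before it can be tested.
        raise RuntimeError("stream must be stopped before it can be tested")

    # When the batch limit is omitted, exactly one batch is processed.
    batch_limit = 1 if limit is None else limit

    # The transformed output is only collected and returned, never inserted.
    return [transform(batch) for batch in islice(batches, batch_limit)]
```

For example, with two available batches and no limit, only the first batch is transformed; passing `limit=2` transforms both.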