# Kafka - openCypher clause

One must be able to specify the following when importing data from Kafka:

* Kafka URI
* Transform [script](transform.md) URI

The Kafka URI is the address of the leader broker; the data
[extractor](extractor.md) requires it.

The minimum required syntax looks like:

```opencypher
CREATE STREAM kafka_stream AS LOAD DATA KAFKA '127.0.0.1'
  WITH TOPIC 'topic'
  WITH TRANSFORM '127.0.0.1/transform.py';
```

The `CREATE STREAM` clause is executed inside a transaction.

The full openCypher clause for creating a stream is:

```opencypher
CREATE STREAM stream_name AS
  LOAD DATA KAFKA 'URI'
  WITH TOPIC 'topic'
  WITH TRANSFORM 'URI'
  [BATCH_INTERVAL milliseconds]
  [BATCH_SIZE count]
```
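As an illustration of how the pieces of the clause fit together, here is a minimal Python sketch that assembles a `CREATE STREAM` statement as a string. The helper name `build_create_stream` and its parameters are hypothetical and not part of any client library; the sketch also assumes the values contain no quote characters, since it does no escaping.

```python
def build_create_stream(name, kafka_uri, topic, transform_uri,
                        batch_interval_ms=None, batch_size=None):
    """Assemble a CREATE STREAM statement (hypothetical helper, no escaping)."""
    parts = [
        f"CREATE STREAM {name} AS",
        f"LOAD DATA KAFKA '{kafka_uri}'",
        f"WITH TOPIC '{topic}'",
        f"WITH TRANSFORM '{transform_uri}'",
    ]
    # The bracketed clauses are optional, so they are emitted only
    # when the caller supplies a value.
    if batch_interval_ms is not None:
        parts.append(f"BATCH_INTERVAL {batch_interval_ms}")
    if batch_size is not None:
        parts.append(f"BATCH_SIZE {batch_size}")
    return " ".join(parts) + ";"


print(build_create_stream("kafka_stream", "127.0.0.1", "topic",
                          "127.0.0.1/transform.py", batch_size=50))
```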

The `WITH TOPIC` parameter specifies the Kafka topic from which data is
streamed.

The `WITH TRANSFORM` parameter specifies the URI of the transform script.

The `BATCH_INTERVAL` parameter defines the time, in milliseconds, between two
successive stream import operations.

The `BATCH_SIZE` parameter defines the number of Kafka messages that are
batched together before import.

If both `BATCH_INTERVAL` and `BATCH_SIZE` are given, whichever condition is
satisfied first triggers the batched import.

The default value for `BATCH_INTERVAL` is 100 milliseconds, and the default
value for `BATCH_SIZE` is 10.
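The "whichever comes first" rule can be sketched as a single predicate. This is only an illustration of the described behaviour, not the actual implementation: the function name `should_import` and its arguments are hypothetical, and the sketch does not model what happens when the interval elapses with no buffered messages, since the text does not specify that case.

```python
DEFAULT_BATCH_INTERVAL_MS = 100  # default stated in the text
DEFAULT_BATCH_SIZE = 10          # default stated in the text


def should_import(buffered_messages, elapsed_ms,
                  batch_interval_ms=DEFAULT_BATCH_INTERVAL_MS,
                  batch_size=DEFAULT_BATCH_SIZE):
    """Return True once either batching condition is satisfied."""
    size_reached = buffered_messages >= batch_size
    interval_elapsed = elapsed_ms >= batch_interval_ms
    return size_reached or interval_elapsed


# Ten messages after 5 ms: the size condition fires first.
print(should_import(buffered_messages=10, elapsed_ms=5))
# Three messages after 100 ms: the interval condition fires first.
print(should_import(buffered_messages=3, elapsed_ms=100))
```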

The `DROP` clause deletes a stream:

```opencypher
DROP STREAM stream_name;
```

The `SHOW` clause lists all configured streams:

```opencypher
SHOW STREAMS;
```

You can also start and stop streams with the `START` and `STOP` clauses:

```opencypher
START STREAM stream_name [LIMIT count BATCHES];
STOP STREAM stream_name;
```

A stream must be stopped before it can be started, and started before it can
be stopped. Starting a stream that is already started, or stopping one that is
already stopped, has no effect on that stream.
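The start/stop rule amounts to a two-state machine in which redundant transitions are no-ops. The sketch below models that under stated assumptions: the `Stream` class is hypothetical, and the initial "stopped" state of a newly created stream is an assumption, since the text does not say which state `CREATE STREAM` leaves a stream in.

```python
class Stream:
    """Toy model of a stream's started/stopped state (hypothetical)."""

    def __init__(self, name):
        self.name = name
        # Assumption: a freshly created stream is stopped.
        self.running = False

    def start(self):
        """Start the stream; return True only if the state changed."""
        if self.running:
            return False  # starting a started stream has no effect
        self.running = True
        return True

    def stop(self):
        """Stop the stream; return True only if the state changed."""
        if not self.running:
            return False  # stopping a stopped stream has no effect
        self.running = False
        return True


stream = Stream("kafka_stream")
print(stream.start(), stream.start())  # second call is a no-op
print(stream.stop(), stream.stop())    # second call is a no-op
```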

There are also convenience clauses to start and stop all streams:

```opencypher
START ALL STREAMS;
STOP ALL STREAMS;
```

Before the actual import, you can also test the stream with the `TEST STREAM`
clause:

```opencypher
TEST STREAM stream_name [LIMIT count BATCHES];
```

When a stream is tested, data extraction and transformation occur, but no
output is inserted into the graph.

A stream must be stopped before it can be tested. When the batch limit is
omitted, `TEST STREAM` runs for only one batch by default.
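The dry-run semantics of `TEST STREAM` can be sketched roughly as follows. Everything here is illustrative: `run_test_stream`, the `batches` list, and the `transform` callable are hypothetical stand-ins for the extractor and the transform script, and the error raised for a running stream is an assumption about how the precondition might be enforced.

```python
def run_test_stream(batches, transform, running=False, limit=None):
    """Transform up to `limit` batches and return the results.

    Nothing is written anywhere, mirroring the rule that a tested
    stream inserts no output into the graph.
    """
    if running:
        # Assumption: testing a running stream is rejected with an error.
        raise RuntimeError("stream must be stopped before it can be tested")
    # When the batch limit is omitted, only one batch is processed.
    if limit is None:
        limit = 1
    return [transform(batch) for batch in batches[:limit]]


def uppercase(batch):
    """Hypothetical transform: upper-case every message in a batch."""
    return [message.upper() for message in batch]


print(run_test_stream([["a", "b"], ["c"]], uppercase))
print(run_test_stream([["a", "b"], ["c"]], uppercase, limit=2))
```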