Add documentation for mg_import_csv
Reviewers: mtomic, buda, florijan, mferencevic Reviewed By: florijan Subscribers: pullbot Differential Revision: https://phabricator.memgraph.io/D1073
This commit is contained in:
parent
86b6f32ce6
commit
6294bf19ec
@ -16,6 +16,7 @@ data structures, multi-version concurrency control and asynchronous IO.
|
||||
* [Drivers](drivers.md)
|
||||
* [Storable Data Types](data-types.md)
|
||||
* [openCypher Query Language](open-cypher.md)
|
||||
* [Import Tools](import-tools.md)
|
||||
* [Upcoming Features](upcoming-features.md)
|
||||
|
||||
[//]: # (Nothing should go below the contents section)
|
||||
|
@ -35,7 +35,7 @@ We have prepared a database snapshot for this example, so you can easily import
|
||||
when starting memgraph using `durability-directory` option:
|
||||
|
||||
```
|
||||
./memgraph --durability-directory /usr/share/memgraph/examples/TEDTalk --durability-enabled=false
|
||||
memgraph --durability-directory /usr/share/memgraph/examples/TEDTalk --durability-enabled=false
|
||||
```
|
||||
|
||||
NOTE: If you modify dataset these changes will stay
|
||||
|
118
docs/user_technical/import-tools.md
Normal file
118
docs/user_technical/import-tools.md
Normal file
@ -0,0 +1,118 @@
|
||||
## Import Tools
|
||||
|
||||
Memgraph comes with tools for importing data into the database. Currently,
|
||||
only import of CSV formatted is supported. We plan to support more formats in
|
||||
the future.
|
||||
|
||||
### CSV Import Tool
|
||||
|
||||
CSV data should be in Neo4j CSV compatible format. Detailed format
|
||||
specification can be found
|
||||
[here](https://neo4j.com/docs/operations-manual/current/tools/import/file-header-format/).
|
||||
|
||||
The import tool is run from the console, using the `mg_import_csv` command.
|
||||
|
||||
If you installed Memgraph using Docker, you will need to run the importer
|
||||
using the following command:
|
||||
|
||||
```
|
||||
docker run -v mg_lib:/var/lib/memgraph -v mg_etc:/etc/memgraph -v mg_import:/import-data \
|
||||
--entrypoint=mg_import_csv memgraph
|
||||
```
|
||||
|
||||
You can pass CSV files containing node data using the `--nodes` option.
|
||||
Multiple files can be specified by repeating the `--nodes` option. At least
|
||||
one node file should be specified. Similarly, graph edges (also known as
|
||||
relationships) are passed via the `--relationships` option. Multiple
|
||||
relationship files are imported by repeating the option. Unlike nodes,
|
||||
relationships are not required.
|
||||
|
||||
After reading the CSV files, the tool will by default search for the installed
|
||||
Memgraph configuration. If the configuration is found, the data will be
|
||||
written in the configured durability directory. If the configuration isn't
|
||||
found, you will need to use the `--out` option to specify the output file. You
|
||||
can use the same option to override the default behaviour.
|
||||
|
||||
Memgraph will recover the imported data on the next startup by looking in the
|
||||
durability directory.
|
||||
|
||||
For information on other options, run:
|
||||
|
||||
```
|
||||
mg_import_csv --help
|
||||
```
|
||||
|
||||
When using Docker, this translates to:
|
||||
|
||||
```
|
||||
docker run --entrypoint=mg_import_csv memgraph --help
|
||||
```
|
||||
|
||||
#### Example
|
||||
|
||||
Let's import a simple dataset.
|
||||
|
||||
Store the following in `comment_nodes.csv`.
|
||||
|
||||
```
|
||||
id:ID(COMMENT_ID),country:string,browser:string,content:string,:LABEL
|
||||
0,Croatia,Chrome,yes,Message;Comment
|
||||
1,United Kingdom,Chrome,thanks,Message;Comment
|
||||
2,Germany,,LOL,Message;Comment
|
||||
3,France,Firefox,I see,Message;Comment
|
||||
4,Italy,Internet Explorer,fine,Message;Comment
|
||||
```
|
||||
|
||||
Now, let's add `forum_nodes.csv`.
|
||||
|
||||
```
|
||||
id:ID(FORUM_ID),title:string,:LABEL
|
||||
0,General,Forum
|
||||
1,Support,Forum
|
||||
2,Music,Forum
|
||||
3,Film,Forum
|
||||
4,Programming,Forum
|
||||
```
|
||||
|
||||
And finally, set relationships between comments and forums in
|
||||
`relationships.csv`.
|
||||
|
||||
```
|
||||
:START_ID(COMMENT_ID),:END_ID(FORUM_ID),:TYPE
|
||||
0,0,POSTED_ON
|
||||
1,1,POSTED_ON
|
||||
2,2,POSTED_ON
|
||||
3,3,POSTED_ON
|
||||
4,4,POSTED_ON
|
||||
```
|
||||
|
||||
Now, you can import the dataset in Memgraph.
|
||||
|
||||
WARNING: Your existing recovery data will be considered obsolete, and Memgraph
|
||||
will load the new dataset.
|
||||
|
||||
Use the following command:
|
||||
|
||||
```
|
||||
mg_import_csv --nodes=comment_nodes.csv --nodes=forum_nodes.csv --relationships=relationships.csv
|
||||
```
|
||||
|
||||
If using Docker, things are a bit more complicated. First you need to move the
|
||||
CSV files where the Docker image can see them:
|
||||
|
||||
```
|
||||
mkdir -p /var/lib/docker/volumes/mg_import/_data
|
||||
cp comment_nodes.csv forum_nodes.csv relationships.csv /var/lib/docker/volumes/mg_import/_data
|
||||
```
|
||||
|
||||
Then, run the importer with the following:
|
||||
|
||||
```
|
||||
docker run -v mg_lib:/var/lib/memgraph -v mg_etc:/etc/memgraph -v mg_import:/import-data \
|
||||
--entrypoint=mg_import_csv memgraph \
|
||||
--nodes=/import-data/comment_nodes.csv --nodes=/import-data/forum_nodes.csv \
|
||||
--relationships=/import-data/relationships.csv
|
||||
```
|
||||
|
||||
Next time you run Memgraph, the dataset will be loaded.
|
||||
|
@ -46,8 +46,22 @@ auto ParseRepeatedFlag(const std::string &flagname, int argc, char *argv[]) {
|
||||
std::vector<std::string> values;
|
||||
for (int i = 1; i < argc; ++i) {
|
||||
std::string flag(argv[i]);
|
||||
if ((flag == "--" + flagname || flag == "-" + flagname) && i + 1 < argc)
|
||||
values.push_back(argv[++i]);
|
||||
int matched_flag_dashes = 0;
|
||||
if (utils::StartsWith(flag, "--" + flagname))
|
||||
matched_flag_dashes = 2;
|
||||
else if (utils::StartsWith(flag, "-" + flagname))
|
||||
matched_flag_dashes = 1;
|
||||
// Get the value if we matched the flag.
|
||||
if (matched_flag_dashes != 0) {
|
||||
std::string value;
|
||||
auto maybe_value = flag.substr(flagname.size() + matched_flag_dashes);
|
||||
if (maybe_value.empty() && i + 1 < argc)
|
||||
value = argv[++i];
|
||||
else if (!maybe_value.empty() && maybe_value.front() == '=')
|
||||
value = maybe_value.substr(1);
|
||||
CHECK(!value.empty()) << "The argument '" << flagname << "' is required";
|
||||
values.push_back(value);
|
||||
}
|
||||
}
|
||||
return values;
|
||||
}
|
||||
@ -385,11 +399,12 @@ std::string GetOutputPath() {
|
||||
// other flags which are defined in this file.
|
||||
LoadConfig();
|
||||
// Without durability_directory, we have to require 'out' flag.
|
||||
if (utils::Trim(FLAGS_durability_directory).empty())
|
||||
auto durability_dir = utils::Trim(FLAGS_durability_directory);
|
||||
if (durability_dir.empty())
|
||||
LOG(FATAL) << "Unable to determine snapshot output location. Please, "
|
||||
"provide the 'out' flag";
|
||||
std::string snapshot_dir = FLAGS_durability_directory + "/snapshots";
|
||||
try {
|
||||
auto snapshot_dir = durability_dir + "/snapshots";
|
||||
if (!std::experimental::filesystem::exists(snapshot_dir) &&
|
||||
!std::experimental::filesystem::create_directories(snapshot_dir)) {
|
||||
LOG(FATAL) << fmt::format("Cannot create snapshot directory '{}'",
|
||||
@ -398,7 +413,7 @@ std::string GetOutputPath() {
|
||||
} catch (const std::experimental::filesystem::filesystem_error &error) {
|
||||
LOG(FATAL) << error.what();
|
||||
}
|
||||
return std::string(durability::MakeSnapshotPath(snapshot_dir));
|
||||
return std::string(durability::MakeSnapshotPath(durability_dir));
|
||||
}
|
||||
|
||||
int main(int argc, char *argv[]) {
|
||||
|
@ -31,7 +31,8 @@ def main():
|
||||
os.makedirs(snapshot_dir, exist_ok=True)
|
||||
out_snapshot = os.path.join(snapshot_dir, 'snapshot')
|
||||
mg_import_csv = [args.mg_import_csv, '--nodes', comment_nodes,
|
||||
'--nodes', forum_nodes, '--relationships', relationships_0,
|
||||
'--nodes={}'.format(forum_nodes),
|
||||
'--relationships={}'.format(relationships_0),
|
||||
'--relationships', relationships_1,
|
||||
'--out', out_snapshot, '--csv-delimiter=|', '--array-delimiter=;']
|
||||
subprocess.check_call(mg_import_csv)
|
||||
|
Loading…
Reference in New Issue
Block a user