A Comparison of NoSQL Databases
================

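Relational databases remain very useful tools, but their decades-long monopoly is coming to an end. There are now countless NoSQL databases able to challenge them, although none can replace relational databases entirely. (In other words, relational databases are still the best way to handle relational data.)

The differences between one NoSQL database and another are far greater than the differences between one SQL database and another. Software architects therefore have to pick the right one at the very start of a project.

With that in mind, this article compares the following NoSQL databases: [Cassandra][], [MongoDB][], [CouchDB][], [Redis][], [Riak][], [Couchbase (ex-Membase)][], [Hypertable][], [ElasticSearch][], [Accumulo][], [VoltDB][], [Kyoto Tycoon][], [Scalaris][], [Neo4j][] and [HBase][]: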

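##The most popular ones

###MongoDB (2.2)

**Written in:** C++

**Main point:** Retains some friendly properties of SQL (queries, indexes, etc.)

**License:** AGPL (drivers: Apache)

**Protocol:** Custom, binary (BSON)

- Master/slave replication (with automatic failover)
- Sharding built in
- Queries are JavaScript expressions
- Full JavaScript scripting support on the server side
- Better update-in-place than CouchDB
- Uses memory-mapped files for data storage
- Rich in features, with good performance
- Journaling (with --journal) is best turned on
- On 32-bit systems, limited to about 2.5 GB
- An empty database takes up 192 MB
- GridFS (not actually a file system) stores big data plus metadata
- Has indexing of data
- Data center aware

**Best used:** When you need dynamic queries; when you would rather define indexes than write map/reduce functions; when you need good performance on a big database; when you wanted CouchDB but your data's I/O throughput is more than CouchDB can handle.

**For example:** Most things you would do with MySQL or PostgreSQL, where having predefined columns holds you back.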
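
To make "dynamic queries without predefined columns" a little more concrete, here is a minimal pure-Python sketch. It does not use MongoDB or any of its drivers; the `matches` and `find` helpers and the sample documents are hypothetical and only mimic querying schemaless documents by example.

```python
# Illustrative only: plain Python, not MongoDB's API.
documents = [
    {"name": "Ann", "city": "Oslo", "age": 31},
    {"name": "Bob", "city": "Oslo"},                 # no "age" field at all
    {"name": "Cid", "age": 45, "tags": ["admin"]},   # extra "tags" field
]

def matches(doc, query):
    """Return True if every key in `query` is present in `doc` with an equal value."""
    return all(key in doc and doc[key] == value for key, value in query.items())

def find(docs, query):
    """Filter documents by example; the query shape is decided at call time."""
    return [doc for doc in docs if matches(doc, query)]

oslo_names = [d["name"] for d in find(documents, {"city": "Oslo"})]
```

Each document carries its own set of fields, so adding a new attribute needs no schema migration; the trade-off is that queries must cope with missing fields.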

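###Riak (1.2)

**Written in:** Erlang and C, plus some JavaScript

**Main point:** Fault tolerance (translator's note: when one copy of the data fails, the service automatically switches to a replica, so it stays online)

**License:** Apache

**Protocol:** HTTP/REST or custom binary

- Stores BLOBs (translator's note: binary large objects, such as an image or a sound file)
- Tunable trade-offs between distribution and replication
- Pre-commit and post-commit hooks in JavaScript or Erlang, for validation and security (translator's note: a hook runs just before or just after data is committed)
- Map/reduce in JavaScript or Erlang
- Links and link walking let you use it as a graph database (translator's note: a link describes a relationship between objects, and link walking is the query that follows those relationships)
- Secondary indices, but only one at a time (translator's note: when writing an object, a developer can tag it with several index values)
- Large object support (Luwak) (translator's note: Luwak is a service layer in Riak that offers a simple, document-oriented abstraction for large objects, making up for the limits of Riak's key/value format when handling them)
- Comes in "open source" and "enterprise" editions
- Full-text search
- In the process of migrating the storage backend from "Bitcask" to Google's "LevelDB"
- Masterless multi-site replication (all sites are equal, with no master/slave hierarchy) and SNMP monitoring are available in the enterprise edition

**Best used:** If you want Dynamo-like data storage without the bloat and complexity; if you need very good single-site scalability, availability and fault tolerance, but are prepared to pay for multi-site replication.

**For example:** Point-of-sale data collection. Factory control systems. Systems that must be online all the time. Web servers that need to be easy to upgrade.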
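
Map/reduce comes up for several of the databases in this comparison, so a small sketch of the programming model may help. This is plain single-process Python, not Riak's JavaScript or Erlang interface; `map_reduce`, `emit_store_amount` and the sample point-of-sale records are made up for illustration.

```python
from collections import defaultdict

def map_reduce(records, map_fn, reduce_fn):
    """Apply map_fn to each record, group the emitted pairs by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(values) for key, values in groups.items()}

# Hypothetical point-of-sale records.
sales = [
    {"store": "north", "amount": 20},
    {"store": "south", "amount": 5},
    {"store": "north", "amount": 7},
]

def emit_store_amount(record):
    """Map step: emit one (store, amount) pair per sale."""
    yield record["store"], record["amount"]

totals = map_reduce(sales, emit_store_amount, sum)
```

In a distributed store the map step would run close to the data on each node and only the grouped results would travel over the network; the sketch keeps everything in one process to show just the shape of the computation.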

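###CouchDB (1.2)

**Written in:** Erlang

**Main point:** Database consistency, ease of use

**License:** Apache

**Protocol:** HTTP/REST

- Bi-directional replication (translator's note: every replica holds its own copy, so data can be modified while a node is offline; when the node comes back online, CouchDB synchronizes the changes across all nodes)
- Replication can be continuous or ad hoc
- With conflict detection
- Thus, master-master replication (translator's note: several databases keep each other in sync, which provides backups and spreads concurrent user load)
- MVCC: write operations do not block reads (no need to lock the database)
- Previous versions of documents are available
- Crash-only (reliable) design (translator's note: after a failure the program is simply restarted and whatever was in memory is discarded, with no complex recovery procedure)
- Needs compacting from time to time
- Views: embedded map/reduce (translator's note: documents are CouchDB's core concept; a view declares how to extract data from documents and how to process the extracted data)
- Formatting views: lists (turn view results into non-JSON formats) and shows (turn documents into non-JSON formats) (translator's note: in CouchDB a web application corresponds to a design document, which can contain special fields; the views field holds permanent view definitions)
- Server-side document validation possible
- Authentication possible
- Real-time updates via the _changes feed
- Attachment handling (translator's note: a CouchDB document can carry attachments, much as an email can)
- Thus, CouchApps (standalone JavaScript applications)

**Best used:** For accumulating, occasionally changing data on which predefined queries are run; places where versioning is important.

**For example:** CRM and CMS systems. Also suits deployments that need master-master or even multi-site replication.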
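
The conflict detection that goes with MVCC can be sketched as a revision check on every write. The example below is a simplified, in-memory illustration of that idea, not CouchDB's HTTP API: `RevisionedStore` and `ConflictError` are hypothetical names, and revisions are plain integers here, whereas CouchDB uses string revision identifiers.

```python
class ConflictError(Exception):
    """Raised when a write is based on an outdated revision."""

class RevisionedStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (revision number, body)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, body, expected_rev=None):
        """Store a new version only if the caller saw the latest revision."""
        current_rev = self._docs.get(doc_id, (None, None))[0]
        if current_rev != expected_rev:
            raise ConflictError(f"expected rev {expected_rev}, found {current_rev}")
        new_rev = 1 if current_rev is None else current_rev + 1
        self._docs[doc_id] = (new_rev, body)
        return new_rev

store = RevisionedStore()
rev1 = store.put("note", {"text": "draft"})
rev2 = store.put("note", {"text": "final"}, expected_rev=rev1)
try:
    # A second writer still holds rev1, so its update must be rejected.
    store.put("note", {"text": "stale edit"}, expected_rev=rev1)
    conflict_detected = False
except ConflictError:
    conflict_detected = True
```

Because a writer never overwrites a version it has not seen, readers need no locks: they always see some complete revision of the document.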

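###Redis (2.4)

**Written in:** C

**Main point:** Blazing fast

**License:** BSD

**Protocol:** Telnet-like

- Disk-backed in-memory database: data is read and written in memory (translator's note: this is the real reason it is so fast) and the disk only provides persistence
- Currently without disk swap: neither virtual memory (VM) nor Diskstore is part of this version (translator's note: Redis has had four persistence approaches: periodic snapshots, an append-only command log, virtual memory and Diskstore; VM was abandoned by the author for performance and stability reasons, and Diskstore is still experimental)
- Master-slave replication
- Simple values or hash tables by key
- But complex operations, such as ZREVRANGEBYSCORE
- INCR and related commands, good for rate limiting and statistics (translator's note: INCR key adds one to the number stored at key)
- Has sets (also union/diff/inter)
- Has lists (also a queue, with blocking pop)
- Has hashes (objects with multiple fields)
- Sorted sets (for example a high-score table; good for range queries)
- Has transactions
- Values can be set to expire (as in a cache)
- Pub/Sub lets you implement messaging

**Best used:** For rapidly changing data with a small, foreseeable size (it should fit entirely in memory).

**For example:** Stock prices. Analytics. Real-time data collection. Real-time communication. And wherever you used memcached before.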
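
Why INCR-style counters combined with expiring values are handy for rate limiting can be shown with a small simulation. The sketch below does not talk to Redis at all: `TinyCounterStore` is a hypothetical in-memory stand-in, and the clock is passed in explicitly (`now`) so that the behaviour is easy to follow.

```python
class TinyCounterStore:
    """In-memory stand-in for a key/value store with INCR-style counters that expire."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp or None)

    def incr(self, key, now):
        value, expires_at = self._data.get(key, (0, None))
        if expires_at is not None and now >= expires_at:
            value, expires_at = 0, None  # the old counter has expired
        value += 1
        self._data[key] = (value, expires_at)
        return value

    def expire(self, key, seconds, now):
        value, _ = self._data[key]
        self._data[key] = (value, now + seconds)

def allow_request(store, client, now, limit=3, window=60):
    """Allow at most `limit` requests per `window` seconds for each client."""
    key = f"rate:{client}"
    count = store.incr(key, now)
    if count == 1:
        store.expire(key, window, now)  # start the window on the first hit
    return count <= limit

store = TinyCounterStore()
decisions = [allow_request(store, "alice", now=t) for t in (0, 1, 2, 3)]
after_window = allow_request(store, "alice", now=61)
```

The fourth request inside the window is refused, and the counter starts again once the window has expired. With a real server the increment and the expiry would each be a single command, which is what makes the pattern cheap.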

##Clones of Google's Bigtable

###HBase (V0.92.0)

**Written in:** Java

**Main point:** Billions of rows X millions of columns

**License:** Apache

**Protocol:** HTTP/REST (also Thrift)

- Modeled after Google's BigTable
- Uses Hadoop's HDFS as storage
- Map/reduce with Hadoop
- Query predicate push down via server side scan and get filters
- Optimizations for real time queries
- A high performance Thrift gateway
- HTTP supports XML, Protobuf, and binary
- JRuby-based (JIRB) shell
- Rolling restart for configuration changes and minor upgrades
- Random access performance is like MySQL
- A cluster consists of several different types of nodes

**Best used:** Hadoop is probably still the best way to run Map/Reduce jobs on huge datasets. Best if you use the Hadoop/HDFS stack already.

**For example:** Search engines. Analysing log data. Any place where scanning huge, two-dimensional join-less tables is a requirement.
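
The "huge, join-less table" model with scans and server-side filters can be sketched in a few lines. This is a toy, single-process illustration, not HBase's client API: `SparseTable`, its methods and the row keys are invented for the example.

```python
import bisect

class SparseTable:
    """Rows sorted by key; each row is a sparse dict of column -> value."""

    def __init__(self):
        self._keys = []   # sorted row keys
        self._rows = {}   # row key -> {column: value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def scan(self, start, stop, row_filter=None):
        """Yield (key, row) for start <= key < stop, applying the filter inside the store."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            row = self._rows[key]
            if row_filter is None or row_filter(row):
                yield key, row

table = SparseTable()
table.put("log#001", "status", 200)
table.put("log#002", "status", 500)
table.put("log#002", "error", "timeout")   # only this row has an "error" column
table.put("log#003", "status", 200)
table.put("user#1", "name", "ann")

# Scan only the "log#" key range and keep rows with a server error status.
errors = [key for key, _ in table.scan("log#", "log$", lambda row: row.get("status", 0) >= 500)]
```

Keeping rows sorted by key makes range scans cheap, and running the filter inside `scan` stands in for predicate push-down: rows that do not match never leave the store.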

###Cassandra (1.2)

**Written in:** Java

**Main point:** Best of BigTable and Dynamo

**License:** Apache

**Protocol:** Thrift & custom binary CQL3

- Tunable trade-offs for distribution and replication (N, R, W)
- Querying by column, range of keys (Requires indices on anything that you want to search on)
- BigTable-like features: columns, column families
- Can be used as a distributed hash-table, with an "SQL-like" language, CQL (but no JOIN!)
- Data can have expiration (set on INSERT)
- Writes can be much faster than reads (when reads are disk-bound)
- Map/reduce possible with Apache Hadoop
- All nodes are similar, as opposed to Hadoop/HBase
- Very good and reliable cross-datacenter replication

**Best used:** When you write more than you read (logging). If every component of the system must be in Java. ("No one gets fired for choosing Apache's stuff.")

**For example:** Banking and the financial industry (not necessarily for financial transactions; these industries are much bigger than that). Writes are faster than reads, so one natural niche is data analysis.
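
The (N, R, W) trade-off mentioned in the feature list follows a simple rule from Dynamo-style systems: with N replicas, a read that contacts R of them is guaranteed to see the latest write acknowledged by W of them whenever R + W > N. A minimal sketch of that arithmetic, with hypothetical helper names:

```python
def is_strongly_consistent(n, r, w):
    """With N replicas, every read set overlaps every write set when R + W > N."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return r + w > n

def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1

settings = {
    "fast writes (W=1, R=1)": is_strongly_consistent(3, 1, 1),
    "quorum reads and writes": is_strongly_consistent(3, quorum(3), quorum(3)),
    "write all, read one": is_strongly_consistent(3, 1, 3),
}
```

Lower values of R and W give faster operations and better availability; higher values buy consistency at the cost of latency.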

###Hypertable (0.9.6.5)

**Written in:** C++

**Main point:** A faster, smaller HBase

**License:** GPL 2.0

**Protocol:** Thrift, C++ library, or HQL shell

- Implements Google's BigTable design
- Runs on Hadoop's HDFS
- Uses its own, "SQL-like" language, HQL
- Can search by key, by cell, or for values in column families.
- Search can be limited to key/column ranges.
- Sponsored by Baidu
- Retains the last N historical values
- Tables are in namespaces
- Map/reduce with Hadoop

**Best used:** If you need a better HBase.

**For example:** Same as HBase, since it's basically a replacement: Search engines. Analysing log data. Any place where scanning huge, two-dimensional join-less tables is a requirement.
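
"Retains the last N historical values" can be pictured as a bounded history per cell. The following is a small in-memory sketch of that behaviour, not Hypertable's API; `VersionedCells` and the sample data are hypothetical.

```python
from collections import defaultdict, deque

class VersionedCells:
    """Keep only the newest `max_versions` values for every (row, column) cell."""

    def __init__(self, max_versions):
        self._cells = defaultdict(lambda: deque(maxlen=max_versions))

    def put(self, row, column, value):
        self._cells[(row, column)].append(value)  # the oldest value drops off when full

    def history(self, row, column):
        """Return the retained values, newest first."""
        return list(reversed(self._cells[(row, column)]))

cells = VersionedCells(max_versions=2)
for price in (10, 11, 12):
    cells.put("AAPL", "price", price)
```

After three writes only the two most recent values remain for the cell.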

###Accumulo (1.4)

**Written in:** Java and C++

**Main point:** A BigTable with Cell-level security

**License:** Apache

**Protocol:** Thrift

- Another BigTable clone, also runs on top of Hadoop
- Cell-level security
- Bigger rows than memory are allowed
- Keeps a memory map outside Java, in C++ STL
- Map/reduce using Hadoop's facilities (ZooKeeper & co)
- Some server-side programming

**Best used:** If you need a different HBase.

**For example:** Same as HBase, since it's basically a replacement: Search engines. Analysing log data. Any place where scanning huge, two-dimensional join-less tables is a requirement.
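
Cell-level security means that every cell carries its own access label and a reader only sees the cells their authorizations cover. The sketch below is a deliberately simplified illustration: it treats a label as a set of required authorizations (AND only), whereas Accumulo's visibility labels are boolean expressions that also allow OR. `visible_cells` and the sample data are made up.

```python
def visible_cells(cells, authorizations):
    """Return only the cells whose required labels are all held by the reader."""
    return {
        key: value
        for key, (value, required_labels) in cells.items()
        if required_labels <= authorizations
    }

# Hypothetical cells: (row, column) -> (value, set of labels required to read it)
cells = {
    ("patient1", "name"): ("Ann", set()),
    ("patient1", "diagnosis"): ("flu", {"medical"}),
    ("patient1", "billing"): ("$120", {"finance", "audit"}),
}
nurse_view = visible_cells(cells, {"medical"})
auditor_view = visible_cells(cells, {"finance", "audit"})
```

Two readers scanning the same row get different results, with no separate tables or application-side filtering.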

##Special-purpose

###Neo4j (V1.5M02)

**Written in:** Java

**Main point:** Graph database - connected data

**License:** GPL, some features AGPL/commercial

**Protocol:** HTTP/REST (or embedding in Java)

- Standalone, or embeddable into Java applications
- Full ACID conformity (including durable data)
- Both nodes and relationships can have metadata
- Integrated pattern-matching-based query language ("Cypher")
- Also the "Gremlin" graph traversal language can be used
- Indexing of nodes and relationships
- Nice self-contained web admin
- Advanced path-finding with multiple algorithms
- Indexing of keys and relationships
- Optimized for reads
- Has transactions (in the Java API)
- Scriptable in Groovy
- Online backup, advanced monitoring and High Availability are AGPL/commercial licensed

**Best used:** For graph-style, rich or complex, interconnected data. Neo4j is quite different from the others in this sense.

**For example:** For searching routes in social relations, public transport links, road maps, or network topologies.
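
Searching routes in social relations is, at its simplest, a shortest-path search over a graph. The sketch below uses a plain breadth-first search in Python on a hypothetical friendship graph; it does not use Neo4j, Cypher or Gremlin, and only illustrates the kind of question a graph database answers natively.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest list of nodes from start to goal, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:   # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None

# Hypothetical directed "knows" relationships.
friends = {
    "ann": ["bob", "cid"],
    "bob": ["dee"],
    "cid": ["dee"],
    "dee": ["eve"],
}
route = shortest_path(friends, "ann", "eve")
```

Doing the same in a relational database would take one self-join per hop, which is the sense in which graph-style data fits this kind of store better.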

###ElasticSearch (0.20.1)

**Written in:** Java

**Main point:** Advanced Search

**License:** Apache

**Protocol:** JSON over HTTP (Plugins: Thrift, memcached)

- Stores JSON documents
- Has versioning
- Parent and children documents
- Documents can time out
- Very versatile and sophisticated querying, scriptable
- Write consistency: one, quorum or all
- Sorting by score (!)
- Geo distance sorting
- Fuzzy searches (approximate date, etc) (!)
- Asynchronous replication
- Atomic, scripted updates (good for counters, etc)
- Can maintain automatic "stats groups" (good for debugging)
- Still depends very much on only one developer (kimchy).

**Best used:** When you have objects with (flexible) fields, and you need "advanced search" functionality.

**For example:** A dating service that handles age difference, geographic location, tastes and dislikes, etc. Or a leaderboard system that depends on many variables.
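
Geo distance sorting, as in the dating-service example, boils down to ordering documents by great-circle distance from a point. Here is a minimal sketch using the haversine formula in plain Python; it is not ElasticSearch's query DSL, and the profiles and coordinates are illustrative.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

profiles = [
    {"name": "ann", "lat": 48.8566, "lon": 2.3522},    # Paris
    {"name": "bob", "lat": 52.5200, "lon": 13.4050},   # Berlin
    {"name": "cid", "lat": 51.5074, "lon": -0.1278},   # London
]
here = (50.8503, 4.3517)  # Brussels

nearest_first = sorted(
    profiles,
    key=lambda p: haversine_km(here[0], here[1], p["lat"], p["lon"]),
)
```

A search engine combines such a distance with other criteria (age range, interests) into one ranked result, which is hard to express efficiently in a key/value store.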

##The "long tail"

(Not widely known, but definitely worthy ones)

###Couchbase (ex-Membase) (2.0)

**Written in:** Erlang & C

**Main point:** Memcache compatible, but with persistence and clustering

**License:** Apache

**Protocol:** memcached + extensions

- Very fast (200k+/sec) access of data by key
- Persistence to disk
- All nodes are identical (master-master replication)
- Provides memcached-style in-memory caching buckets, too
- Write de-duplication to reduce IO
- Friendly cluster-management web GUI
- Connection proxy for connection pooling and multiplexing (Moxi)
- Incremental map/reduce
- Cross-datacenter replication

**Best used:** Any application where low-latency data access, high concurrency support and high availability are requirements.

**For example:** Low-latency use-cases like ad targeting or highly-concurrent web apps like online gaming (e.g. Zynga).
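
Write de-duplication can be understood as coalescing repeated updates to the same key before they reach the disk. The sketch below is a guess at the general idea, written as a tiny in-memory model; `WriteBuffer` is hypothetical and says nothing about how Couchbase implements it internally.

```python
class WriteBuffer:
    """Coalesce repeated writes to the same key so only the latest value hits the disk."""

    def __init__(self):
        self._pending = {}     # key -> latest value not yet persisted
        self.disk = {}         # stand-in for persistent storage
        self.disk_writes = 0   # number of physical writes performed

    def set(self, key, value):
        self._pending[key] = value  # overwrites any earlier pending value

    def flush(self):
        for key, value in self._pending.items():
            self.disk[key] = value
            self.disk_writes += 1
        self._pending.clear()

buffer = WriteBuffer()
for score in range(100):
    buffer.set("player:42:score", score)   # 100 logical writes to one key
buffer.set("player:7:score", 5)
buffer.flush()
```

A hundred and one logical writes turn into two physical ones, which is why the technique helps with rapidly changing values such as game scores.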

###VoltDB (2.8.4.1)

**Written in:** Java

**Main point:** Fast transactions and rapidly changing data

**License:** GPL 3

**Protocol:** Proprietary

- In-memory relational database.
- Can export data into Hadoop
- Supports ANSI SQL
- Stored procedures in Java
- Cross-datacenter replication

**Best used:** Where you need to act fast on massive amounts of incoming data.

**For example:** Point-of-sales data analysis. Factory control systems.

###Scalaris (0.5)

**Written in:** Erlang

**Main point:** Distributed P2P key-value store

**License:** Apache

**Protocol:** Proprietary & JSON-RPC

- In-memory (disk when using Tokyo Cabinet as a backend)
- Uses YAWS as a web server
- Has transactions (an adapted Paxos commit)
- Consistent, distributed write operations
- From CAP, values Consistency over Availability (in case of network partitioning, only the bigger partition works)

**Best used:** If you like Erlang and wanted to use Mnesia or DETS or ETS, but you need something that is accessible from more languages (and scales much better than ETS or DETS).

**For example:** In an Erlang-based system when you want to give access to the DB to Python, Ruby or Java programmers.
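
The "only the bigger partition works" behaviour is the usual majority rule of consistency-first systems. A minimal sketch of that rule, with hypothetical names; the real commit protocol in Scalaris (an adapted Paxos commit) is considerably more involved.

```python
def partition_can_serve(partition_size, cluster_size):
    """A partition keeps working only if it holds a strict majority of the nodes."""
    return partition_size > cluster_size / 2

cluster_size = 5
split = {"side_a": 3, "side_b": 2}   # a network split of a five-node cluster
status = {name: partition_can_serve(size, cluster_size) for name, size in split.items()}
```

Because two disjoint partitions can never both hold a majority, conflicting writes cannot be accepted on both sides; the price is that the smaller side becomes unavailable, and an even split leaves neither side writable.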

###Kyoto Tycoon (0.9.56)

**Written in:** C++

**Main point:** A lightweight network DBM

**License:** GPL

**Protocol:** HTTP (TSV-RPC or REST)

- Based on Kyoto Cabinet, Tokyo Cabinet's successor
- Multitudes of storage backends: Hash, Tree, Dir, etc (everything from Kyoto Cabinet)
- Kyoto Cabinet can do 1M+ insert/select operations per sec (but Tycoon does less because of overhead)
- Lua on the server side
- Language bindings for C, Java, Python, Ruby, Perl, Lua, etc
- Uses the "visitor" pattern
- Hot backup, asynchronous replication
- Background snapshot of in-memory databases
- Auto expiration (can be used as a cache server)

**Best used:** When you want to choose the backend storage algorithm engine very precisely. When speed is of the essence.

**For example:** Caching server. Stock prices. Analytics. Real-time data collection. Real-time communication. And wherever you used memcached before.
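
The "visitor" pattern mentioned in the feature list means that record access goes through a callback object that the store invokes with the current record, and whose return value decides what happens to it. The following is a generic Python sketch of that pattern; the class and method names are assumptions chosen for illustration and should not be read as Kyoto Cabinet's or Kyoto Tycoon's actual binding API.

```python
class KeyValueStore:
    """Tiny dict-backed store whose only access path is accept(key, visitor)."""

    REMOVE = object()   # sentinel: delete the record
    NOP = object()      # sentinel: leave the record untouched

    def __init__(self):
        self._records = {}

    def accept(self, key, visitor):
        if key in self._records:
            result = visitor.visit_full(key, self._records[key])
        else:
            result = visitor.visit_empty(key)
        if result is self.REMOVE:
            self._records.pop(key, None)
        elif result is not self.NOP:
            self._records[key] = result   # any other return value is the new value
        return result

class Increment:
    """Visitor that creates a counter at 1 or adds 1 to an existing one."""

    def visit_full(self, key, value):
        return value + 1

    def visit_empty(self, key):
        return 1

db = KeyValueStore()
first = db.accept("hits", Increment())
second = db.accept("hits", Increment())
```

Read, update, insert and delete all become special cases of one operation, so the store can make each of them atomic by guarding a single code path.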

Of course, all these systems have many more features than what's listed here. I only wanted to list the key points that I base my decisions on. Also, development of all of them is very fast, so things are bound to change.

P.S.: And no, there's no date on this review. There are version numbers, since I update the databases one by one, not at the same time. And believe me, the basic properties of databases don't change that much.

---

via: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

Originally translated by [LCTT][], proudly presented by [Linux中国][]

Translator: [chenjintao][] Proofreader: [校对者ID][]

[LCTT]:https://github.com/LCTT/TranslateProject
[Linux中国]:http://linux.cn/portal.php
[chenjintao]:http://linux.cn/space/chenjintao
[校对者ID]:http://linux.cn/space/校对者ID

[Cassandra]:http://cassandra.apache.org/
[Mongodb]:http://www.mongodb.org/
[CouchDB]:http://couchdb.apache.org/
[Redis]:http://redis.io/
[Riak]:http://basho.com/riak/
[Couchbase (ex-Membase)]:http://www.couchbase.org/membase
[Hypertable]:http://hypertable.org/
[ElasticSearch]:http://www.elasticsearch.org/
[Accumulo]:http://accumulo.apache.org/
[VoltDB]:http://voltdb.com/
[Kyoto Tycoon]:http://fallabs.com/kyototycoon/
[Scalaris]:https://code.google.com/p/scalaris/
[Neo4j]:http://neo4j.org/
[HBase]:http://hbase.apache.org/