translated

This commit is contained in:
KayGuoWhu 2015-03-22 22:26:40 +08:00
parent 3541cca48a
commit 3794ad3136
2 changed files with 86 additions and 82 deletions

View File

@ -1,82 +0,0 @@
[translating by KayGuoWhu]
The future of Linux storage
================================================================================
> **Summary**:Linux software developers are working hard on expanding Linux's file and storage options.
BOSTON - At the [Linux Foundation's][1] new [Vault][2] show, it's all about file systems and storage. You might think that there's nothing new to say about either topic, but you'd be wrong.
![](http://zdnet2.cbsistatic.com/hub/i/r/2015/03/12/c8f92cc2-b963-4238-80a0-d785ec93698c/resize/770x578/08d93a8a393d3f50b2a56e6b0e7a0ca9/btrfs-1.jpg)
Linux file systems, such as Btrfs, and storage support options are constantly evolving. -- Facebook
Storage technology has come a long way from the days of, as Linus Torvalds put it, "[nasty platters of spinning rust][3]" and Linux has had to keep up. In recent years, for example, [flash memory has arrived as enterprise server primary storage][4] and [persistent memory][5] is bringing us storage that works at DRAM speeds. At the same time, Big Data, cloud computing, and containers are all bringing new use cases to Linux.
To deal with this, Linux developers are both expanding their existing file and storage programs and working on new ones.
### Btrfs ###
For instance, Chris Mason, a Facebook software engineer and one of the [Btrfs][6] (pronounced Butter FS) maintainers, explained how Facebook uses this file system. Btrfs has many advantages as a file system such as the ability to handle both numerous small files and single files as large as 16 exabytes; baked in RAID; built-in file-system compression; and integrated multi-storage device support.
Facebook, of course, runs on Linux. To be exact, Facebook runs the 3.10 and 3.18 Linux kernels on an internal distribution, which is based on [CentOS][7]. For Facebook, the real win is that Btrfs is stable and fast under the endless input/output operations per second (IOPS) pounding from Facebook's constantly updating users.
That's the good news. The bad news is that Btrfs is still much too slow for traditional DBMSs such as MySQL. For those, Facebook uses [XFS][8]. To co-ordinate the two file systems, Facebook uses [Gluster][9], the open-source distributed file system.
Facebook, which works hand-in-glove with the upstream Btrfs Linux kernel developers, is working on improving Btrfs's DBMS speed. Mason, and his companions, are doing this by using Btrfs with the [RocksDB][10] database. This is a persistent key-value store for fast storage, which can be used as the foundation for a client-server database.
Btrfs also still has some bugs. For example, if you're foolish enough to fill a disk almost to bursting, Btrfs will stop you from writing to storage before the disk is completely stuffed. For some projects, such as [CoreOS][12], the enterprise Linux that relies on containers, that's a showstopper. [CoreOS has since switched to using xt4 and overlayfs][11].
The Btrfs crew is also working on data deduplication. In this, when a file system has more than one identical file, you automatically delete the duplicate. As Mason said, "Not everyone needs this, but if you need it, you really need it!"
Btrfs isn't the only file system that's both very important and getting worked on. John Spray, a senior software engineer at [Red Hat][13], talked about the distributed [Ceph][14] file system.
### Ceph FS ###
Ceph provides a distributed object store and file system which, in turn, relies on a resilient and scalable storage model (RADOS) using clusters of commodity hardware. Along with the RADOS block device (RBD), and the RADOS object gateway (RGW), Ceph provides a [POSIX][15] file-system interface -- Ceph FS. While RBD and RGW have been in use for production workloads for some time, efforts to make Ceph FS ready for production are now underway.
[Red Hat, after acquiring Inktank][16], Ceph's parent company, in 2014 has been working hard on making CephFS production ready. For better or worse, Spray said, "Some people are already using it in production; we're terrified of this. It's really not ready yet." Still, Spray added, that this "is a mixed blessing because while it's a bit scary, we get really useful feedback and testing from those users."
That's because while Ceph object stores scale out well, Ceph FS, as a POSIX compliant file-system, are hard to scale out. For example, as a distributed file system, Ceph FS has to deal with multiple writes from multiple clients. This can lead to all or nothing situations where one client can write and others must wait. This can result in file-locking situations that are more complicated than those in ordinary file systems.
Still, Ceph FS is worth doing, Spray said, "since POSIX file-systems are an operating system lingua franca." That's not to say that Ceph FS doesn't work. "It's not horribly broken. It works. What's missing is the repair and monitoring tools."
Red Hat is currently hard at work on getting [fsck][17] and journal repair tools, snapshot hardening, better client access control, and cloud and container integration. For now, though, Ceph FS is a file system that only the very brave, or foolish, should use in production.
### File and storage odds and ends ###
As for larger issues of file-systems and storage, Jeff Layton, senior software engineer at [Primary Data][18], explained that there are efforts under way to to create "tests for catastrophic power failure, without actually pulling the plug." These tests will soon be integrated with [xfstests][19], the gold standard for Linux file-system testing.
Rik van Riel, a Red Hat principal software engineer, spoke about the problem of dealing with persistent memory products. You can treat them as storage or as memory. But, you can't currently take snapshots for backups if you use them as memory. The real problem: van Riel is certain that people will try to use persistent memory as both, which will lead to such as situations as "Without back up, how do you deal with a 200GB persistent memory database?" Adding insult to injury, logging systems don't currently work with persistent memory.
What's the right answer? Linux doesn't have one yet, but programmers are working on it.
So, while Linux has many file systems and can use any kind of storage out there that can hold a byte, there's still a lot of work to be done. Technology never stands still. Linux, which runs on everything from devices to desktops to servers to clouds to supercomputers, has to keep up with storage advances no matter where they appear.
--------------------------------------------------------------------------------
via: http://www.zdnet.com/article/linux-storage-futures/
作者:[Steven J. Vaughan-Nichols][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创翻译,[Linux中国](http://linux.cn/) 荣誉推出
[a]:http://www.zdnet.com/meet-the-team/us/sjvn/
[1]:http://www.linuxfoundation.org/
[2]:http://events.linuxfoundation.org/events/vault
[3]:http://www.wired.com/2012/10/linus-torvalds-hard-disks/
[4]:http://www.zdnet.com/article/sandisk-launches-infiniflash-aims-to-bring-flash-array-costs-down/
[5]:http://events.linuxfoundation.org/sites/events/files/eeus13_wheeler.pdf
[6]:https://btrfs.wiki.kernel.org/index.php/Main_Page
[7]:http://www.centos.org/
[8]:http://oss.sgi.com/projects/xfs/
[9]:http://www.gluster.org/
[10]:http://rocksdb.org/
[11]:http://lwn.net/Articles/627232/
[12]:https://coreos.com/
[13]:http://www.redhat.com/
[14]:http://ceph.com/
[15]:http://pubs.opengroup.org/onlinepubs/9699919799/
[16]:http://www.zdnet.com/article/red-hat-acquires-inktank-for-175m/
[17]:http://linux.die.net/man/8/fsck
[18]:http://primarydata.com/
[19]:http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfstests.git;a=summary

View File

@ -0,0 +1,86 @@
Linux存储的未来
================================================================================
> **摘要**Linux系统的软件开发者们正致力于使Linux支持更多种类的文件和存储方案。
波士顿 - 在[Linux基金会][1]最近的[Vault][2]展示会上,全都是关于文件系统和存储方案的讨论。你可以会想关于这两个主题并没有什么展值得讨论的最新进展,但事实并非如此。
![](http://zdnet2.cbsistatic.com/hub/i/r/2015/03/12/c8f92cc2-b963-4238-80a0-d785ec93698c/resize/770x578/08d93a8a393d3f50b2a56e6b0e7a0ca9/btrfs-1.jpg)
对Linux文件系统比如Btrfs和存储方案的支持正在持续发展中。 -- Facebook
自从Linus提出“[讨厌的、生锈的机械磁盘]”的观点以来存储技术已经走过一段长路Linux也始终保持跟进。比如说近几年来[闪存已经逐渐成为企业服务器的主要存储器][4][持久化内存][5]也正给我们带来拥有DRAM一般快速的存储。与此同时大数据、云计算和容器化技术正给Linux引入新的应用场景。
为了应对挑战Linux开发者们一边继续扩展已有的文件系统和存储程序一边致力于开发新的方案。
### Btrfs ###
例如Chris Mason一位来自Facebook的软件工程师也是[Btrfs][6]对外宣称Butter FS的维护者之一说明了Facebook是如何使用这种文件系统。Btrfs拥有文件系统固有的许多优点比如既能处理大量的小文件也能处理大小可达16EB的单个文件支持RAID的baked烦请校正补充内置的文件系统压缩以及集成了对多种存储设备的支持。
当然Facebook的服务器也运行在Linux上。更准确地讲是运行在一个基于[CentOS][7]的内部发行版上它是基于3.10和3.18版的内核。对Facebook来说真正的收获是Btrfs在由Facebook持续的更新用户操作带来的巨大的IOPS每秒钟输入输出的操作数的负载下依旧保持稳定和快速。
这就是好消息但坏消息是对于像MySQL一样的传统DBMS数据库管理系统来说Btrfs还是太慢了。对此Facebook采用了[XFS][8]。为了协同这两种文件系统Facebook又用到了一种叫做[Gluster][9]的开源分布式文件系统。
Facebook一直与上游的负责Btrfs的Linux内核开发者保持密切联系致力于提高Btrfs在DBMS上的速度。Mason和他的同事在[RocksDB][10]数据库上使用Btrfs以达成目标RocksDB是一种为提供快速存储开发的持久化键值存储系统可以作为客户端服务器模式数据库的基础部分。
当然Btrfs也还存在一些问题比如如果有用户傻到用数据把硬盘几乎要撑爆时Btrfs会在硬盘被完全装满前阻止用户继续写入。对某些工程来说比如[CoreOS][12]一款依赖容器化的企业版Linux系统这种问题是致命的。[因此CoreOS已经切换到使用xt4和overlayfs了][11]。
Btrfs的开发人员正致力于数据去重。在这一点上当文件系统中拥有超过一个的相同文件时会自动删除多余文件。正如Mason所说“并非每个人都需要这个功能但如果有人需要那就是真的需要
在正在开展的重要性工作中Btrfs并非是唯一的文件系统。John Spary[Red Hat][13]的一位高级软件工程师,提到了另一款名为[Ceph][14]的分布式文件系统。
### Ceph FS ###
Ceph提供了一种分布式对象存储方案和文件系统反过来它依托于一种使用商用硬件集群的弹性的、可扩展的存储模型RADOS。配合RADOS块设备RBD和RADOS对象网关RGWCeph提供了一种[POSIX][15]接口的文件系统 -- Ceph FS。尽管RBD和RGW已经在生产环境中使用了一段时间但使Ceph FS适用于生产的工作还是进行中。
[Rad Hat在收购Ceph的母公司Inktank后][16]在2014年一直致力于使CephFS适用于生产环境。不管怎样Spray说“有些人已经在生产中使用了它我们对此表示担忧毕竟它还没有准备好。”然而Spray也补充说“这具有两面性因为一方面这是让人担心的另一方面我们又从用户获得了真正有用的反馈和测试。”
这是因为尽管Ceph对象存储很好地支持扩展但Ceph Fs作为一种兼容POSIX的文件系统却很难实现扩展。比如作为一种分布式文件系统Ceph FS必须解决来自多个客户端的多个写操作。这会导致全有或全无的情况即一个客户端可以写入但其它客户端必须等待也会产生文件加锁的情形即相比普通文件系统中更加复杂。
但是Ceph FS仍值得去做正如Spray所说“因为兼容POSIX的文件系统是操作系统通用的。”这并不是说Ceph FS就一无是处。“它并不是支离破碎的相反它奏效了。所缺的是修复和监控工具。”
Red Hat目前正致力于获得[fsck][17]和日志修复工具、快照强化、更好客户端访问控制以及云与容器的集成。尽管Ceph FS到目前为止只是一种有潜力或者没前景的文件系统但仍然值得用在生产环境中。
### 文件与存储的差别与目标 ###
至于文件系统和存储上的更大问题Jeff Layton[Primary Data][18]的一位高级软件工程师,解释说为了“在不断开电源的情况下给灾难性的电源故障提供测试”,大量的相关工作正在进行中。这些测试很快会被集成到[xftests][19]中它是Linux文件系统测试的黄金标准。
Rik van Riel一位Red Hat的主要软件工程师谈到了解决持久化内存产品的问题。你可以把它们作为存储器或者内存。但是如果你现在把它们作为内存来用是不能为备份创建快照。真正的问题是van Riel确信人们会尝试使用持久的内存作这两种用途这会导致出现和“如果不备份你会如何处理一个200GB大小的持久化内存数据库”类似的情形发生。更糟的是现在日志系统也无法和持久化的内存一起发挥作用。
正确的答案是什么呢Linux至今还没有一个但编程人员们正在努力寻找答案。
因此尽管Linux支持很多文件系统可以使用这里以外的任何一种存储器来存储数据但是仍然有很多工作要做。技术从来不会止步不前。Linux正运行在移动设备、桌面电脑、服务器、云端和超级计算机上等几乎所有的主流设备上必须跟紧存储的发展步伐不管它们以何种形式出现。
--------------------------------------------------------------------------------
via: http://www.zdnet.com/article/linux-storage-futures/
作者:[Steven J. Vaughan-Nichols][a]
译者:[KayGuoWhu](https://github.com/KayGuoWhu)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创翻译,[Linux中国](http://linux.cn/) 荣誉推出
[a]:http://www.zdnet.com/meet-the-team/us/sjvn/
[1]:http://www.linuxfoundation.org/
[2]:http://events.linuxfoundation.org/events/vault
[3]:http://www.wired.com/2012/10/linus-torvalds-hard-disks/
[4]:http://www.zdnet.com/article/sandisk-launches-infiniflash-aims-to-bring-flash-array-costs-down/
[5]:http://events.linuxfoundation.org/sites/events/files/eeus13_wheeler.pdf
[6]:https://btrfs.wiki.kernel.org/index.php/Main_Page
[7]:http://www.centos.org/
[8]:http://oss.sgi.com/projects/xfs/
[9]:http://www.gluster.org/
[10]:http://rocksdb.org/
[11]:http://lwn.net/Articles/627232/
[12]:https://coreos.com/
[13]:http://www.redhat.com/
[14]:http://ceph.com/
[15]:http://pubs.opengroup.org/onlinepubs/9699919799/
[16]:http://www.zdnet.com/article/red-hat-acquires-inktank-for-175m/
[17]:http://linux.die.net/man/8/fsck
[18]:http://primarydata.com/
[19]:http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfstests.git;a=summary