mirror of
https://github.com/LCTT/TranslateProject.git
synced 2024-12-26 21:30:55 +08:00
Merge pull request #1435 from KayGuoWhu/master
[Translated]20140724 diff -u--What is New in Kernel Development.md
commit
2301919bfd
@ -1,46 +0,0 @@
[translating by KayGuoWhu]
diff -u: What's New in Kernel Development
================================================================================
Once in a while someone points out a POSIX violation in Linux. Often the answer is to fix the violation, but sometimes Linus Torvalds decides that the POSIX behavior is broken, in which case they keep the Linux behavior, but they might build an additional POSIX compatibility layer, even if that layer is slower and less efficient.

This time, *Michael Kerrisk* reported a POSIX violation that affected file operations. Apparently, multithreaded reads from and writes to a file could hit race conditions and overwrite each other's changes.

There was some discussion over whether this was really a violation of POSIX, but ultimately, who cares? Data clobbering is bad. After Michael posted some code to reproduce the problem, the conversation focused on how to fix it. Michael did argue, however, that "Linux isn't consistent with UNIX since early times. (E.g., page 191 of the 1992 edition of Stevens APUE discusses the sharing of the file offset between the parent and child after fork(). Although Stevens didn't explicitly spell out the atomicity guarantee, the discussion there would be a bit nonsensical without the presumption of that guarantee.)"
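
The shared file offset that the Stevens reference describes can be observed from user space. The following is a minimal sketch, assuming a POSIX system and Python 3. The function name `fork_shared_offset_demo` is made up for illustration, and the code is not Michael's reproducer, which is not shown in this article.

```python
import os
import tempfile


def fork_shared_offset_demo() -> bytes:
    """Write from a child and then its parent through one inherited descriptor."""
    fd, path = tempfile.mkstemp()
    try:
        pid = os.fork()
        if pid == 0:
            # Child: this write advances the offset of the shared open file.
            try:
                os.write(fd, b"child\n")
            finally:
                os._exit(0)  # leave immediately, skipping the parent's cleanup
        os.waitpid(pid, 0)
        # Parent: no seek happens here, yet the data lands after the child's.
        os.write(fd, b"parent\n")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.close(fd)
        os.unlink(path)
```

Because the parent waits for the child before writing, this sketch never races. It only shows that both processes advance the same offset, which is why concurrent updates to that offset matter.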

Al Viro joined Linus in trying to come up with a fix. Linus tried introducing a simple mutex to lock files so that write operations couldn't clobber each other, and Al offered his own refinements that improved on Linus' patch.
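
As a rough user-space analogy of the locking idea (not the kernel patch, whose code is not reproduced in this article), the sketch below guards a shared write position with a mutex so that reading the position, writing the data and advancing the position happen as one step. The names `LockedOffsetWriter` and `run_writers` are hypothetical, and the example assumes a POSIX system where `os.pwrite` is available.

```python
import os
import tempfile
import threading


class LockedOffsetWriter:
    """User-space stand-in for a file whose position is guarded by a mutex."""

    def __init__(self, fd: int) -> None:
        self.fd = fd
        self.pos = 0                  # plays the role of the shared file pointer
        self.lock = threading.Lock()  # plays the role of the per-file mutex

    def write(self, data: bytes) -> int:
        # Read the position, write there, and advance it in one critical section.
        with self.lock:
            start = self.pos
            os.pwrite(self.fd, data, start)
            self.pos = start + len(data)
            return start


def run_writers(threads: int = 8, records: int = 200) -> tuple[int, int]:
    """Return (final position, number of distinct start offsets handed out)."""
    fd, path = tempfile.mkstemp()
    try:
        writer = LockedOffsetWriter(fd)
        starts: list[int] = []

        def worker(tag: int) -> None:
            for _ in range(records):
                # Each record is four copies of one letter identifying the thread.
                starts.append(writer.write(bytes([65 + tag]) * 4))

        pool = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
        for t in pool:
            t.start()
        for t in pool:
            t.join()
        return writer.pos, len(set(starts))
    finally:
        os.close(fd)
        os.unlink(path)
```

With the lock in place every record gets its own start offset, so the final position equals the total number of bytes written. Dropping the `with self.lock:` line would allow two threads to read the same `start` and overwrite each other, which is the kind of clobbering the report describes.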

At one point, Linus explained the history of the bug itself. Apparently, once upon a time the file pointer, which told the system where to write into the file, had been guarded by a semaphore so only one process could do anything to it at a time. But the developers removed that protection in order to accommodate device files and other non-regular files, which ran into race conditions when users were barred from writing to them whenever they pleased.

That was what introduced the bug. At the time, it slipped through undetected, because the actual reading and writing to regular files was still handled atomically by the kernel. It was only the file pointer itself that could get out of sync. And, because high-speed threaded file operations are a pretty rare need, it took a long time for anyone to run into the problem and report it.

An interesting little detail is that, while Linus and Al were hunting for a fix, Al at one point complained that the approach Linus was taking wouldn't support certain architectures, including *ARM* and *PowerPC*. Linus' response was, "I doubt it's worth caring about. [...] If the ARM/PPC people end up caring, they could add the struct-return support to gcc."

It's always interesting to see how corner cases crop up and get dealt with. In some cases, part of the fix has to happen in the kernel, part in GCC and part elsewhere. In this particular instance, Al felt the whole thing could be done in the kernel, and he was inspired to write his own version of the patch, which Linus accepted.

*Andi Kleen* wanted to add low-level CPU event support to *perf*. The problem was that there could be tons of low-level events, and they varied widely from CPU to CPU. Even storing the possible events in memory for all CPUs would significantly increase the kernel's running size. So, hard-coding this information into the kernel would be problematic.

He pointed out that the *OProfile* tool relied on publicly available lists of these events, though he said the OProfile developers didn't always keep their lists up to date with the latest available versions.

To solve these issues, Andi submitted a patch that allowed perf to identify which event-list was needed for the particular CPU on the given system, and automatically download the latest version of that list from its home location. Then perf could interpret the list and analyze the events, without overburdening the kernel.
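
To make the "identify the right list for this CPU" step concrete, here is a small sketch under stated assumptions. It derives a lookup key from `/proc/cpuinfo`-style text on x86 and maps it to a per-user cache file. The function names, the `vendor-family-model` key format and the `~/.cache/pmu-events` layout are all hypothetical and are not taken from Andi's patch; the download step is deliberately left out because the article does not name the list's home location.

```python
import os


def cpu_event_list_key(cpuinfo: str) -> str:
    """Build a 'vendor-family-model' key from /proc/cpuinfo-style text."""
    fields: dict[str, str] = {}
    for line in cpuinfo.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        if sep and name not in fields:
            fields[name] = value.strip()  # keep the values of the first CPU only
    # The model number is rendered in hexadecimal; this format is an assumption.
    return "{}-{}-{:X}".format(
        fields["vendor_id"], fields["cpu family"], int(fields["model"])
    )


def event_list_path(key: str, home: str) -> str:
    """Return a per-user cache location for the list (hypothetical layout)."""
    return os.path.join(home, ".cache", "pmu-events", key + ".json")


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        key = cpu_event_list_key(f.read())
    print(event_list_path(key, os.path.expanduser("~")))
```

Keeping the file under the invoking user's home directory means the lookup and any later download can run without root privileges.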

Andi's code drew a range of feedback, mostly about which directory should house the event-lists and how the files should be named. The behavior of the code itself seemed to get a good reception. One detail that may turn out to be more controversial than the others was Andi's decision to download the lists to a subdirectory of the user's own home directory. Andi said that otherwise users might be encouraged to download the event-lists as the root user, which would be bad security practice.

Sasha Levin recently posted a script to translate the *hexadecimal offsets* from stack dumps into meaningful line numbers that pointed into the kernel's source files. So something like "ffffffff811f0ec8" might be translated into "fs/proc/generic.c:445".

However, it turned out that Linus Torvalds was planning to remove the hex offsets from the stack dumps for exactly the reason that they were unreadable. So Sasha's code was about to go out of date.

They went back and forth a bit on it. At first Sasha decided to rely on data stored in the System.map file to compensate, but Linus pointed out that some people, including him, didn't keep their System.map file around. Linus recommended using /usr/bin/nm to extract the symbols from the compiled kernel files.
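
The symbol lookup that `nm` makes possible can be sketched as follows. This is an illustration rather than Sasha's script: `parse_nm` and `resolve` are hypothetical names, and the sample symbol names and addresses in the usage example are invented. The sketch relies only on `nm`'s default output of one `address type name` triple per line.

```python
import bisect


def parse_nm(nm_output: str) -> list[tuple[int, str]]:
    """Parse 'address type name' lines into sorted (address, name) pairs."""
    symbols = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Keep text (code) symbols only; 't' is local and 'T' is global.
        if len(parts) == 3 and parts[1] in ("t", "T"):
            symbols.append((int(parts[0], 16), parts[2]))
    return sorted(symbols)


def resolve(symbols: list[tuple[int, str]], address: int) -> str:
    """Return 'name+0xoffset' for the last symbol starting at or before address."""
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        raise ValueError("address precedes every known symbol")
    start, name = symbols[index]
    return f"{name}+{address - start:#x}"


# Invented sample data in nm's default format, for illustration only.
SAMPLE_NM = """\
ffffffff811f0e00 T example_func_a
ffffffff811f1000 T example_func_b
ffffffff81a00000 D example_data
"""

if __name__ == "__main__":
    table = parse_nm(SAMPLE_NM)
    print(resolve(table, 0xFFFFFFFF811F0EC8))
```

This only gets as far as a symbol name and an offset. Turning that into a source file and line number additionally needs debug information, for example through a tool such as `addr2line` run against a kernel image built with debug symbols.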

So, it seems as though Sasha's script may actually provide meaningful file and line numbers for debugging stack dumps, assuming the stack dumps provide enough information to do the calculations.

--------------------------------------------------------------------------------

via: http://www.linuxjournal.com/content/diff-u-whats-new-kernel-development-0

Original author: [Zack Brown][a]

Translator: [译者ID](https://github.com/译者ID) Proofreader: [校对者ID](https://github.com/校对者ID)

This article was translated by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](http://linux.cn/)

[a]:http://www.linuxjournal.com/user/801501
@ -0,0 +1,44 @@
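diff -u: What's New in Kernel Development
================================================================================
Every so often someone points out a POSIX violation in Linux (translator's note: "violation" is provisionally rendered as 违规 here; please correct it if that is unsuitable). The usual response is to fix the violation, but sometimes Linus Torvalds decides that the POSIX behavior is broken, in which case the developers keep the Linux behavior. They may then build an additional POSIX compatibility layer, even if that layer is slower and less efficient.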
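
This time, *Michael Kerrisk* reported a POSIX violation affecting file operations. Apparently, reading and writing files from multiple threads could trigger race conditions, with operations overwriting each other's changes.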
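
There was some discussion about whether this really was a POSIX violation, but in the end, who cares? Clobbered data is bad. After Michael posted some code to reproduce the problem, the discussion turned to how to fix it. Michael did argue, however, that "Linux isn't consistent with UNIX since early times. (E.g., page 191 of the 1992 edition of Stevens APUE discusses the sharing of the file offset between the parent and child after fork(). Although Stevens didn't explicitly spell out the atomicity guarantee, the discussion there would be a bit nonsensical without the presumption of that guarantee.)"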
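
Al Viro worked with Linus on a fix. Linus tried introducing a simple mutex to lock the file so that write operations could not clobber each other, and Al proposed refinements of his own that improved Linus' patch.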
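
At one point, Linus explained the history of the bug. Apparently, the file pointer, which tells the system where in the file to write, used to be guarded by a semaphore, so that only one process at a time could do anything with it. That protection was later removed to accommodate device files and other non-regular files, which ran into race conditions when users were blocked from writing to them at will.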
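
That is where the bug came from. At the time it slipped through unnoticed, because the actual reads and writes on regular files were still handled atomically by the kernel. Only the file pointer itself could get out of sync. And because high-speed threaded file operations are a fairly rare need, it took a long time before anyone hit the problem and reported it.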
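
One interesting detail: while Linus and Al were looking for a fix, Al complained at one point that Linus' approach would not support certain architectures, including *ARM* and *PowerPC*. Linus replied, "I doubt it's worth caring about. [...] If the ARM/PPC people end up caring, they could add the struct-return support to gcc."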
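
It is always interesting to see how these corner cases crop up and get handled. Sometimes part of the fix has to happen in the kernel, part in GCC and part elsewhere. In this case, Al felt the whole thing could be handled in the kernel, and he was inspired to write his own version of the patch, which Linus accepted.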
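
*Andi Kleen* wanted to add low-level CPU event support to *perf*. The problem was that there could be a huge number of low-level events, and they differ from CPU to CPU. Even just storing the possible events for every CPU in memory could noticeably increase the kernel's running size, so hard-coding this information into the kernel would be problematic.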
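
He also pointed out that the *OProfile* tool relies on publicly available lists of these events, although he said the OProfile developers do not always keep their lists in step with the latest available versions.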
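
To solve these problems, Andi submitted a patch that lets perf work out which event list is needed for the particular CPU in the given system and automatically download the latest version of that list from its home location. perf can then interpret the list and analyze the events without overburdening the kernel.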
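
Andi's code drew all sorts of feedback, most of it about which directory should hold the event lists and how the files should be named. The behavior of the code itself seemed to be well received. One detail that may prove more controversial than the rest was Andi's decision to download the lists into a subdirectory of the user's home directory. Andi said that otherwise users might be encouraged to download the event lists as root, which would be bad security practice.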
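
Sasha Levin recently posted a script that translates the *hexadecimal offsets* in stack dumps into meaningful line numbers pointing into the kernel source files. So something like "ffffffff811f0ec8" might be translated into "fs/proc/generic.c:445".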
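
However, it turned out that Linus Torvalds was planning to remove the hexadecimal offsets from stack dumps, precisely because they are unreadable. So Sasha's code looked set to become obsolete. [Translator's note: it's tough being a woman programmer!]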
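
They went back and forth on this for a while. At first Sasha planned to compensate by relying on data stored in the System.map file, but Linus pointed out that some people, himself included, do not keep their System.map file around. Linus recommended using /usr/bin/nm to extract the symbols from the compiled kernel files.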
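
So it seems Sasha's script may indeed provide meaningful file and line numbers for debugging stack dumps, assuming the stack dumps provide enough information to do the calculation.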
--------------------------------------------------------------------------------

via: http://www.linuxjournal.com/content/diff-u-whats-new-kernel-development-0

Original author: [Zack Brown][a]

Translator: [KayGuoWhu](https://github.com/KayGuoWhu) Proofreader: [校对者ID](https://github.com/校对者ID)

This article was translated by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](http://linux.cn/)

[a]:http://www.linuxjournal.com/user/801501