mirror of
https://github.com/LCTT/TranslateProject.git
synced 2025-01-25 23:11:02 +08:00
translated
This commit is contained in:
parent
4c13721360
commit
969ed7edee
@ -1,103 +0,0 @@
translating----geekpi

UNIX curiosities
======

Recently I've been doing more UNIXy things in various tools I'm writing, and I hit two interesting issues. Neither of these is a "bug"; both are behaviors that I wasn't expecting.

### Thread-safe printf

I have a C application that reads some images from disk, does some processing, and writes output about these images to STDOUT. Pseudocode:

```
for(imagefilename in images)
{
    results = process(imagefilename);
    printf(results);
}
```

The processing is independent for each image, so naturally I want to distribute this processing between various CPUs to speed things up. I usually use `fork()`, so I wrote this:

```
for(child in children)
{
    pipe = create_pipe();
    worker(pipe);
}

// main parent process
for(imagefilename in images)
{
    write(pipe[i_image % N_children], imagefilename)
}

worker()
{
    while(1)
    {
        imagefilename = read(pipe);
        results = process(imagefilename);
        printf(results);
    }
}
```

This is the normal thing: I make pipes for IPC, and send the child workers image filenames through these pipes. Each worker _could_ write its results back to the main process via another set of pipes, but that's a pain, so here each worker writes to the shared STDOUT directly. This works OK, but as one would expect, the writes to STDOUT clash, so the results for the various images end up interspersed. That's bad. I didn't feel like setting up my own locks, but fortunately GNU libc provides facilities for that: [`flockfile()`][1]. I put those in, and … it didn't work! Why? Because the lock that `flockfile()` takes lives inside the `FILE` object in the process's own memory, and after `fork()` every child has its own copy-on-write copy of that memory, so each child only ever locks its private copy. I.e. the extra isolation provided by `fork()` (compared to threads) actually ends up breaking the locks.

I haven't tried using other locking mechanisms (like pthread mutexes for instance), but I can imagine they'll have similar problems unless the lock is deliberately placed in memory shared between the processes. And I want to keep things simple, so sending the results back to the parent for printing is out of the question: this creates more work for both me the programmer, and for the computer running the program.
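
For comparison, here is a minimal sketch of what a lock shared across `fork()` could look like. It is not what the article ends up doing. It assumes Linux with glibc, and the helper names `create_shared_mutex()` and `run_workers()` are made up for this illustration. The mutex is placed in an anonymous `MAP_SHARED` mapping and marked `PTHREAD_PROCESS_SHARED`, so all children lock the same object. Output goes through `dprintf()` on a file descriptor to stay clear of per-process stdio buffers.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a mutex that fork()ed children can share.
   Returns NULL on failure. */
static pthread_mutex_t *create_shared_mutex(void)
{
    pthread_mutex_t *m = mmap(NULL, sizeof(*m), PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return NULL;

    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0) {
        munmap(m, sizeof(*m));
        return NULL;
    }
    int err = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (err == 0)
        err = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (err != 0) {
        munmap(m, sizeof(*m));
        return NULL;
    }
    return m;
}

/* Hypothetical driver: fork n_workers children; each writes n_records
   two-part records to fd while holding the shared lock.
   Returns 0 if every child ran and exited cleanly, -1 otherwise. */
static int run_workers(int fd, int n_workers, int n_records)
{
    assert(n_workers > 0 && n_records >= 0);

    pthread_mutex_t *lock = create_shared_mutex();
    if (lock == NULL)
        return -1;

    int n_started = 0;
    for (int w = 0; w < n_workers; w++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            for (int r = 0; r < n_records; r++) {
                pthread_mutex_lock(lock);
                /* Two separate writes: without the lock, another
                   worker's output could land between them. */
                dprintf(fd, "worker %d ", w);
                dprintf(fd, "record %d\n", r);
                pthread_mutex_unlock(lock);
            }
            _exit(0);
        }
        n_started++;
    }

    int failures = n_workers - n_started;
    for (int w = 0; w < n_started; w++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    }
    munmap(lock, sizeof(*lock));
    return failures == 0 ? 0 : -1;
}
```

Calling `run_workers(STDOUT_FILENO, 4, 10)` should print 40 intact lines. One caveat of this sketch: if a child dies while holding the lock, the other children block forever, so it is arguably still more machinery than the thread-based version.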

The solution: use threads instead of forks. This has a nice side effect of making the pipes redundant. Final pseudocode:

```
for(children)
{
    pthread_create(worker, child_index);
}
for(children)
{
    pthread_join(child);
}

worker(child_index)
{
    for(i_image = child_index; i_image < N_images; i_image += N_children)
    {
        results = process(images[i_image]);
        flockfile(stdout);
        printf(results);
        funlockfile(stdout);
    }
}
```

Much simpler, and actually works as desired. I guess sometimes threads are better.
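
The final pseudocode can be fleshed out into compilable C roughly as follows. This is a sketch under assumptions, not the author's actual program: `process()` is a stand-in that just returns the length of the filename, `N_WORKERS`, `struct worker_args` and `process_all()` are names invented for the example, and the output stream is passed in as a parameter instead of being hard-wired to `stdout`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define N_WORKERS 4

struct worker_args {
    int index;            /* which worker this is: 0 .. N_WORKERS-1 */
    int n_images;         /* total number of images */
    const char **images;  /* list of image filenames */
    FILE *out;            /* stream shared by all workers */
};

/* Stand-in for the real image processing (hypothetical). */
static int process(const char *imagefilename)
{
    return (int)strlen(imagefilename);
}

static void *worker(void *p)
{
    const struct worker_args *a = p;
    assert(a->out != NULL);

    /* Worker k handles images k, k + N_WORKERS, k + 2*N_WORKERS, ... */
    for (int i = a->index; i < a->n_images; i += N_WORKERS) {
        int result = process(a->images[i]);

        /* All threads share one FILE object, so this lock is
           visible to every worker. */
        flockfile(a->out);
        fprintf(a->out, "%s: ", a->images[i]);
        fprintf(a->out, "%d\n", result);
        funlockfile(a->out);
    }
    return NULL;
}

/* Returns 0 if all workers were started and joined, -1 otherwise. */
static int process_all(const char **images, int n_images, FILE *out)
{
    pthread_t threads[N_WORKERS];
    struct worker_args args[N_WORKERS];
    int n_started = 0;

    for (int w = 0; w < N_WORKERS; w++) {
        args[w] = (struct worker_args){ w, n_images, images, out };
        if (pthread_create(&threads[w], NULL, worker, &args[w]) != 0)
            break;
        n_started++;
    }
    for (int w = 0; w < n_started; w++)
        pthread_join(threads[w], NULL);

    return n_started == N_WORKERS ? 0 : -1;
}
```

A caller would use it as `process_all(filenames, n, stdout)`, compiled with `-pthread`. Each record is deliberately printed with two `fprintf()` calls to show that the `flockfile()`/`funlockfile()` pair keeps multi-call output together.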

### Passing a partly-read file to a child process

For various [vnlog][2] tools I needed to implement this sequence:

1. process opens a file with O_CLOEXEC turned off
2. process reads a part of this file (up to the end of the legend in the case of vnlog)
3. process calls exec to invoke another program to process the rest of the already-opened file

The second program may require a file name on the command line instead of an already-opened file descriptor, because it may be calling open() by itself. If I pass it the filename, this new program will re-open the file and start reading from the beginning, not from the location where the original program left off. It is important for my application that this does not happen, so passing the filename to the second program does not work.

So I really need to pass the already-open file descriptor somehow. I'm using Linux (other OSs may behave differently here), so I can in theory do this by passing /dev/fd/N instead of the filename. But it turns out this does not work either. On Linux (again, maybe this is Linux-specific somehow), for normal files /dev/fd/N is a symlink to the original file. So opening it re-opens the file, which ends up doing exactly the same thing that passing the filename does.

But there's a workaround! If we're reading a pipe instead of a file, then there's nothing to symlink to, and /dev/fd/N ends up passing the original pipe down to the second process, and things then work correctly. And I can fake this by changing the open("filename") above to something like popen("cat filename"). Yuck! Is this really the best we can do? What does this look like on one of the BSDs, say?
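
The re-open behavior for regular files can be observed directly with a small C sketch. The function name `second_fd_offset()` is invented for this illustration, and the expected results assume Linux. It reads a few bytes from a file, obtains a second descriptor either by opening /dev/fd/N or by calling `dup()`, and reports where that second descriptor would start reading.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Open `path`, consume `n_skip` bytes, obtain a second descriptor, and
   return that second descriptor's file offset (-1 on any error).
     via_dev_fd != 0: second descriptor comes from open("/dev/fd/N")
     via_dev_fd == 0: second descriptor comes from dup(N) */
static off_t second_fd_offset(const char *path, size_t n_skip, int via_dev_fd)
{
    char buf[256];
    assert(n_skip <= sizeof(buf));

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (read(fd, buf, n_skip) != (ssize_t)n_skip) {
        close(fd);
        return -1;
    }

    int fd2;
    if (via_dev_fd) {
        char devfd[64];
        snprintf(devfd, sizeof(devfd), "/dev/fd/%d", fd);
        fd2 = open(devfd, O_RDONLY);   /* re-opens the underlying file */
    } else {
        fd2 = dup(fd);                 /* shares the open file description */
    }

    off_t offset = (fd2 < 0) ? -1 : lseek(fd2, 0, SEEK_CUR);
    if (fd2 >= 0)
        close(fd2);
    close(fd);
    return offset;
}
```

On Linux, for a regular file of at least 10 bytes, `second_fd_offset(path, 10, 1)` is expected to return 0 (the read position is lost), while `second_fd_offset(path, 10, 0)` returns 10. This only demonstrates the problem; it does not help a second program that insists on calling open() on a name.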

--------------------------------------------------------------------------------

via: http://notes.secretsauce.net/notes/2018/08/03_unix-curiosities.html

Author: [Dima Kogan][a]
Topic selection: [lujun9972](https://github.com/lujun9972)
Translator: [Translator ID](https://github.com/译者ID)
Proofreader: [Proofreader ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)

[a]:http://notes.secretsauce.net/
[1]:https://www.gnu.org/software/libc/manual/html_node/Streams-and-Threads.html
[2]:http://www.github.com/dkogan/vnlog
102
translated/tech/20180803 UNIX curiosities.md
Normal file
@ -0,0 +1,102 @@
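UNIX curiosities
======

Recently I've been doing more UNIXy things in various tools I'm writing, and I hit two interesting issues. Neither of these is a "bug"; both are behaviors that I wasn't expecting.

### Thread-safe printf

I have a C application that reads some images from disk, does some processing, and writes output about these images to STDOUT. Pseudocode:
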
```
for(imagefilename in images)
{
    results = process(imagefilename);
    printf(results);
}
```

The processing is independent for each image, so naturally I want to distribute this processing between various CPUs to speed things up. I usually use `fork()`, so I wrote this:

```
for(child in children)
{
    pipe = create_pipe();
    worker(pipe);
}

// main parent process
for(imagefilename in images)
{
    write(pipe[i_image % N_children], imagefilename)
}

worker()
{
    while(1)
    {
        imagefilename = read(pipe);
        results = process(imagefilename);
        printf(results);
    }
}
```
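
This is the normal thing: I make pipes for IPC, and send the child workers image filenames through these pipes. Each worker _could_ write its results back to the main process via another set of pipes, but that's a pain, so here each worker writes to the shared STDOUT directly. This works OK, but as one would expect, the writes to STDOUT clash, so the results for the various images end up interspersed. That's bad. I didn't feel like setting up my own locks, but fortunately GNU libc provides facilities for that: [`flockfile()`][1]. I put those in, and … it didn't work! Why? Because whatever `flockfile()` does internally ends up restricted to a single subprocess because of `fork()`'s copy-on-write behavior. I.e. the extra isolation provided by `fork()` (compared to threads) actually ends up breaking the locks.

I haven't tried using other locking mechanisms (like pthread mutexes for instance), but I can imagine they'll have similar problems. And I want to keep things simple, so sending the results back to the parent for printing is out of the question: this creates more work for both me the programmer, and for the computer running the program.

The solution: use threads instead of forks. This has a nice side effect of making the pipes redundant. Final pseudocode:
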
```
for(children)
{
    pthread_create(worker, child_index);
}
for(children)
{
    pthread_join(child);
}

worker(child_index)
{
    for(i_image = child_index; i_image < N_images; i_image += N_children)
    {
        results = process(images[i_image]);
        flockfile(stdout);
        printf(results);
        funlockfile(stdout);
    }
}
```

Much simpler, and actually works as desired. I guess sometimes threads are better.

### Passing a partly-read file to a child process

For various [vnlog][2] tools I needed to implement this sequence:

1. process opens a file with O_CLOEXEC turned off
2. process reads a part of this file (up to the end of the legend in the case of vnlog)
3. process calls exec to invoke another program to process the rest of the already-opened file

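The second program may require a file name on the command line instead of an already-opened file descriptor, because it may be calling open() by itself. If I pass it the filename, this new program will re-open the file and start reading from the beginning, not from the location where the original program left off. It is important for my application that this does not happen, so passing the filename to the second program does not work.

So I really need to pass the already-open file descriptor somehow. I'm using Linux (other OSs may behave differently here), so I can in theory do this by passing /dev/fd/N instead of the filename. But it turns out this does not work either. On Linux (again, maybe this is Linux-specific somehow), for normal files /dev/fd/N is a symlink to the original file. So this ends up doing exactly the same thing that passing the filename does.

But there's a workaround! If we're reading a pipe instead of a file, then there's nothing to symlink to, and /dev/fd/N ends up passing the original pipe down to the second process, and things then work correctly. And I can fake this by changing the open("filename") above to something like popen("cat filename"). Yuck! Is this really the best we can do? What does this look like on one of the BSDs, say?
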
--------------------------------------------------------------------------------

via: http://notes.secretsauce.net/notes/2018/08/03_unix-curiosities.html

Author: [Dima Kogan][a]
Topic selection: [lujun9972](https://github.com/lujun9972)
Translator: [geekpi](https://github.com/geekpi)
Proofreader: [Proofreader ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)

[a]:http://notes.secretsauce.net/
[1]:https://www.gnu.org/software/libc/manual/html_node/Streams-and-Threads.html
[2]:http://www.github.com/dkogan/vnlog