Submit translation

This commit is contained in:
wyxplus 2021-03-30 12:32:20 +08:00 committed by GitHub
parent d684fe94ea
commit 9a5a01eee9
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 289 additions and 285 deletions

View File

@ -1,285 +0,0 @@
[#]: subject: (Learn how file input and output works in C)
[#]: via: (https://opensource.com/article/21/3/file-io-c)
[#]: author: (Jim Hall https://opensource.com/users/jim-hall)
[#]: collector: (lujun9972)
[#]: translator: (wyxplus)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
Learn how file input and output works in C
======
Understanding I/O can help you do things faster.
![4 manilla folders, yellow, green, purple, blue][1]
If you want to learn input and output in C, start by looking at the `stdio.h` include file. As you might guess from the name, that file defines all the standard ("std") input and output ("io") functions.
The first `stdio.h` function that most people learn is the `printf` function to print formatted output. Or the `puts` function to print a simple string. Those are great functions to print information to the user, but if you want to do more than that, you'll need to explore other functions.
You can learn about some of these functions and methods by writing a replica of a common Linux command. The `cp` command will copy one file to another. If you look at the `cp` man page, you'll see that `cp` supports a broad set of command-line parameters and options. But in the simplest case, `cp` supports copying one file to another:
```
cp infile outfile
```
You can write your own version of this `cp` command in C by using only a few basic functions to _read_ and _write_ files.
### Reading and writing one character at a time
You can easily do input and output using the `fgetc` and `fputc` functions. These read and write data one character at a time. The usage is defined in `stdio.h` and is quite straightforward: `fgetc` reads (gets) a single character from a file, and `fputc` puts a single character into a file.
```
int fgetc(FILE *stream);
int fputc(int c, FILE *stream);
```
Writing the `cp` command requires accessing files. In C, you open a file using the `fopen` function, which takes two arguments: the _name_ of the file and the _mode_ you want to use. The mode is usually `r` to read from a file or `w` to write to a file. The mode supports other options too, but for this tutorial, just focus on reading and writing.
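As a minimal sketch of the `fopen` call described above, the helper below opens a file in a given mode and reports a failure. The name `open_or_report` is hypothetical and is not part of the `cp` program in this article; it only illustrates that `fopen` returns `NULL` when the file cannot be opened.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (an assumption for illustration only):
   open `name` with the given mode and report a failure on stderr. */
FILE *
open_or_report(const char *name, const char *mode)
{
  FILE *fp;

  assert(name != NULL && mode != NULL);

  fp = fopen(name, mode);   /* "r" reads; "w" creates or truncates */
  if (fp == NULL) {
    perror(name);           /* prints the system's reason for the failure */
  }
  return fp;
}
```

A caller would still close the returned stream with `fclose` once it is done with the file.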
Copying one file to another then becomes a matter of opening the source and destination files, then _reading one character at a time_ from the first file, then _writing that character_ to the second file. The `fgetc` function returns either the single character read from the input file or the _end of file_ (`EOF`) marker when the file is done. Once you've read `EOF`, you've finished copying and you can close both files. That code looks like this:
```
  do {
    ch = fgetc(infile);
    if (ch != EOF) {
      fputc(ch, outfile);
    }
  } while (ch != EOF);
```
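The same character-by-character copy is often written more compactly by testing the result of `fgetc` directly in the loop condition. The sketch below is one possible variant, not the article's code; the function name `copy_chars` and its return convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy `in` to `out` one character at a time.
   Returns the number of characters copied, or -1 on a write error. */
long
copy_chars(FILE *in, FILE *out)
{
  int ch;          /* int, not char, so EOF stays distinct from every byte value */
  long count = 0;

  assert(in != NULL && out != NULL);

  while ((ch = fgetc(in)) != EOF) {
    if (fputc(ch, out) == EOF) {
      return -1;   /* fputc reports a write error by returning EOF */
    }
    count++;
  }
  return count;
}
```

Declaring `ch` as an `int` matters here: `fgetc` returns an `int` so that `EOF` can be told apart from any valid character.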
You can write your own `cp` program with this loop to read and write one character at a time by using the `fgetc` and `fputc` functions. The `cp.c` source code looks like this:
```
#include <stdio.h>

int
main(int argc, char **argv)
{
  FILE *infile;
  FILE *outfile;
  int ch;

  /* parse the command line */
  /* usage: cp infile outfile */
  if (argc != 3) {
    fprintf(stderr, "Incorrect usage\n");
    fprintf(stderr, "Usage: cp infile outfile\n");
    return 1;
  }

  /* open the input file */
  infile = fopen(argv[1], "r");
  if (infile == NULL) {
    fprintf(stderr, "Cannot open file for reading: %s\n", argv[1]);
    return 2;
  }

  /* open the output file */
  outfile = fopen(argv[2], "w");
  if (outfile == NULL) {
    fprintf(stderr, "Cannot open file for writing: %s\n", argv[2]);
    fclose(infile);
    return 3;
  }

  /* copy one file to the other */
  /* use fgetc and fputc */
  do {
    ch = fgetc(infile);
    if (ch != EOF) {
      fputc(ch, outfile);
    }
  } while (ch != EOF);

  /* done */
  fclose(infile);
  fclose(outfile);
  return 0;
}
```
And you can compile that `cp.c` file into a full executable using the GNU Compiler Collection (GCC):
```
$ gcc -Wall -o cp cp.c
```
The `-o cp` option tells the compiler to save the compiled program into the `cp` program file. The `-Wall` option tells the compiler to turn on a broad set of common warnings. If you don't see any warnings, the compiler found nothing suspicious in the code.
### Reading and writing blocks of data
Programming your own `cp` command by reading and writing data one character at a time does the job, but it's not very fast. You might not notice when copying "everyday" files like documents and text files, but you'll really notice the difference when copying large files or when copying files over a network. Working on one character at a time requires significant overhead.
A better way to write this `cp` command is by reading a chunk of the input into memory (called a _buffer_), then writing that collection of data to the second file. This is much faster because the program can read more of the data at one time, which requires fewer "reads" from the file.
You can read a file into a variable by using the `fread` function. This function takes several arguments: the array or memory buffer to read data into (`ptr`), the size of the smallest thing you want to read (`size`), how many of those things you want to read (`nmemb`), and the file to read from (`stream`):
```
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
```
The different options provide quite a bit of flexibility for more advanced file input and output, such as reading and writing files with a certain data structure. But in the simple case of _reading data from one file_ and _writing data to another file_, you can use a buffer that is an array of characters.
And you can write the buffer to another file using the `fwrite` function. This uses a similar set of options to the `fread` function: the array or memory buffer to read data from, the size of the smallest thing you need to write, how many of those things you need to write, and the file to write to.
```
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);
```
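To illustrate how `size` and `nmemb` support files with a certain data structure, here is a small sketch that writes and reads an array of records. The `struct point` type and the `save_points` and `load_points` helpers are hypothetical names invented for this example; they do not appear in the `cp` program.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record type, used only for illustration. */
struct point {
  int x;
  int y;
};

/* Write `n` records; returns how many were written completely. */
size_t
save_points(FILE *out, const struct point *pts, size_t n)
{
  assert(out != NULL && pts != NULL);
  /* size is one record, nmemb is the number of records */
  return fwrite(pts, sizeof(struct point), n, out);
}

/* Read up to `max` records; returns how many were read completely. */
size_t
load_points(FILE *in, struct point *pts, size_t max)
{
  assert(in != NULL && pts != NULL);
  return fread(pts, sizeof(struct point), max, in);
}
```

Both functions return a count of whole records rather than bytes. A file written this way stores the in-memory layout of the struct, so it is generally only safe to read back on a compatible system and compiler.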
In the case where the program reads a file into a buffer, then writes that buffer to another file, the array (`ptr`) can be an array of a fixed size. For example, you can use a `char` array called `buffer` that is 200 characters long.
With that assumption, you need to change the loop in your `cp` program to _read data from a file into a buffer_ then _write that buffer to another file_:
```
  while (!feof(infile)) {
    buffer_length = fread(buffer, sizeof(char), 200, infile);
    fwrite(buffer, sizeof(char), buffer_length, outfile);
  }
```
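A loop that tests `feof` never ends if a read error occurs, because the end-of-file indicator is never set in that case. One possible alternative, shown here as a sketch rather than as the article's method, drives the loop from the return value of `fread` and checks `ferror` afterward. The function name `copy_blocks` and its return convention are assumptions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy `in` to `out` in 200-character blocks.
   Returns 0 on success, -1 on a read or write error. */
int
copy_blocks(FILE *in, FILE *out)
{
  char buffer[200];
  size_t n;

  assert(in != NULL && out != NULL);

  /* fread returns the number of items read; 0 means end of file or error */
  while ((n = fread(buffer, sizeof(char), sizeof(buffer), in)) > 0) {
    if (fwrite(buffer, sizeof(char), n, out) != n) {
      return -1;              /* short write */
    }
  }
  return ferror(in) ? -1 : 0; /* tell a read error apart from end of file */
}
```

The last block of a file is usually shorter than the buffer, which is why the loop writes `n` characters rather than a fixed 200.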
Here's the full source code to your updated `cp` program, which now uses a buffer to read and write data:
```
#include <stdio.h>

int
main(int argc, char **argv)
{
  FILE *infile;
  FILE *outfile;
  char buffer[200];
  size_t buffer_length;

  /* parse the command line */
  /* usage: cp infile outfile */
  if (argc != 3) {
    fprintf(stderr, "Incorrect usage\n");
    fprintf(stderr, "Usage: cp infile outfile\n");
    return 1;
  }

  /* open the input file */
  infile = fopen(argv[1], "r");
  if (infile == NULL) {
    fprintf(stderr, "Cannot open file for reading: %s\n", argv[1]);
    return 2;
  }

  /* open the output file */
  outfile = fopen(argv[2], "w");
  if (outfile == NULL) {
    fprintf(stderr, "Cannot open file for writing: %s\n", argv[2]);
    fclose(infile);
    return 3;
  }

  /* copy one file to the other */
  /* use fread and fwrite */
  while (!feof(infile)) {
    buffer_length = fread(buffer, sizeof(char), 200, infile);
    fwrite(buffer, sizeof(char), buffer_length, outfile);
  }

  /* done */
  fclose(infile);
  fclose(outfile);
  return 0;
}
```
Since you want to compare this program to the other program, save this source code as `cp2.c`. You can compile that updated program using GCC:
```
$ gcc -Wall -o cp2 cp2.c
```
As before, the `-o cp2` option tells the compiler to save the compiled program into the `cp2` program file. The `-Wall` option tells the compiler to turn on a broad set of common warnings. If you don't see any warnings, the compiler found nothing suspicious in the code.
### Yes, it really is faster
Reading and writing data using buffers is the better way to write this version of the `cp` program. Because it reads chunks of a file into memory at once, the program doesn't need to read data as often. You might not notice a difference in using either method on smaller files, but you'll really see the difference if you need to copy something that's much larger or when copying data on slower media like over a network connection.
I ran a runtime comparison using the Linux `time` command. This command runs another program, then tells you how long that program took to complete. For my test, I wanted to see the difference in time, so I copied a 628MB CD-ROM image file I had on my system.
I first copied the image file using the standard Linux `cp` command to see how long that takes. By running the Linux `cp` command first, I also eliminated the possibility that Linux's built-in file-cache system would give my program a false performance boost. The test with Linux `cp` took much less than one second to run:
```
$ time cp FD13LIVE.iso tmpfile
real    0m0.040s
user    0m0.001s
sys     0m0.003s
```
Copying the same file using my own version of the `cp` command took significantly longer. Reading and writing one character at a time took almost five seconds to copy the file:
```
$ time ./cp FD13LIVE.iso tmpfile
real    0m4.823s
user    0m4.100s
sys     0m0.571s
```
Reading data from an input into a buffer and then writing that buffer to an output file is much faster. Copying the file using this method took less than a second:
```
$ time ./cp2 FD13LIVE.iso tmpfile
real    0m0.944s
user    0m0.224s
sys     0m0.608s
```
My demonstration `cp` program used a buffer that was 200 characters. I'm sure the program would run much faster if I read more of the file into memory at once. But for this comparison, you can already see the huge difference in performance, even with a small, 200 character buffer.
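For readers who want to experiment with larger buffers, here is a sketch of a copy function that takes the buffer size as a parameter. The name `copy_with_buffer` and its return convention are assumptions, and the article reports no timings for other buffer sizes, so any speedup would need to be measured with `time` on your own system.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variant of the copy loop with a caller-chosen buffer size.
   Returns the number of bytes copied, or -1 on failure. */
long
copy_with_buffer(FILE *in, FILE *out, size_t bufsize)
{
  char *buffer;
  size_t n;
  long total = 0;

  assert(in != NULL && out != NULL && bufsize > 0);

  buffer = malloc(bufsize);   /* heap allocation allows large buffers */
  if (buffer == NULL) {
    return -1;
  }

  while ((n = fread(buffer, 1, bufsize, in)) > 0) {
    if (fwrite(buffer, 1, n, out) != n) {
      free(buffer);
      return -1;              /* short write */
    }
    total += (long) n;
  }

  free(buffer);
  return ferror(in) ? -1 : total;
}
```

Calling it with a `bufsize` of 200 copies the data in the same block size as `cp2`; other values can then be compared against that baseline.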
--------------------------------------------------------------------------------
via: https://opensource.com/article/21/3/file-io-c
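Author: [Jim Hall][a]
Topic selection: [lujun9972][b]
Translator: [translator ID](https://github.com/译者ID)
Proofreader: [proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)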
[a]: https://opensource.com/users/jim-hall
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/file_system.jpg?itok=pzCrX1Kc (4 manilla folders, yellow, green, purple, blue)
[2]: http://www.opengroup.org/onlinepubs/009695399/functions/fgetc.html
[3]: http://www.opengroup.org/onlinepubs/009695399/functions/fputc.html
[4]: http://www.opengroup.org/onlinepubs/009695399/functions/fprintf.html
[5]: http://www.opengroup.org/onlinepubs/009695399/functions/fopen.html
[6]: http://www.opengroup.org/onlinepubs/009695399/functions/fclose.html
[7]: http://www.opengroup.org/onlinepubs/009695399/functions/feof.html
[8]: http://www.opengroup.org/onlinepubs/009695399/functions/fread.html
[9]: http://www.opengroup.org/onlinepubs/009695399/functions/fwrite.html

View File

@ -0,0 +1,289 @@
[#]: subject: (Learn how file input and output works in C)
[#]: via: (https://opensource.com/article/21/3/file-io-c)
[#]: author: (Jim Hall https://opensource.com/users/jim-hall)
[#]: collector: (lujun9972)
[#]: translator: (wyxplus)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
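Learn how to do file input and output in C
======
Understanding I/O can help you work more efficiently.
![4 manilla folders, yellow, green, purple, blue][1]
If you want to learn input and output in C, start with the `stdio.h` include file. As you might guess from the name, that file defines all the standard input and output functions.
The first `stdio.h` function most people learn is `printf`, which prints formatted output, or `puts`, which prints a string. These functions are very useful for printing information to the user, but if you want to do more, you need to learn about other functions.
You can learn about some of these functions and methods by writing a replica of a common Linux command. The `cp` command is mainly used to copy files. If you look at the `cp` man page, you'll see that `cp` supports a great many parameters and options. But its simplest function is copying one file to another: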
```
cp infile outfile
```
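Using just a few basic functions to read and write files, you can implement the `cp` command yourself in C.
### Reading and writing one character at a time
You can easily do input and output using the `fgetc` and `fputc` functions. These functions read and write one character at a time. Their usage is defined in `stdio.h` and is easy to understand: `fgetc` reads one character from a file, and `fputc` saves one character to a file.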
```
int fgetc(FILE *stream);
int fputc(int c, FILE *stream);
```
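Writing the `cp` command requires accessing files. In C, you open a file using the `fopen` function, which takes two arguments: the file name and the mode in which to open the file. The mode is usually `r` to read from a file or `w` to write to a file. Other modes are available too, but this tutorial focuses only on reading and writing.
Copying one file to another therefore becomes a matter of opening the source and destination files, then repeatedly reading a character from the first file and writing that character to the second file. The `fgetc` function returns either the single character read from the input file or the end-of-file (`EOF`) marker when the file is done. Once you encounter `EOF`, the copy is complete and you can close both files. The code looks like this: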
```
  do {
    ch = fgetc(infile);
    if (ch != EOF) {
      fputc(ch, outfile);
    }
  } while (ch != EOF);
```
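You can use this loop to write your own `cp` program that reads and writes one character at a time with the `fgetc` and `fputc` functions. The `cp.c` source code looks like this: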
```
#include <stdio.h>

int
main(int argc, char **argv)
{
  FILE *infile;
  FILE *outfile;
  int ch;

  /* parse the command line */
  /* usage: cp infile outfile */
  if (argc != 3) {
    fprintf(stderr, "Incorrect usage\n");
    fprintf(stderr, "Usage: cp infile outfile\n");
    return 1;
  }

  /* open the input file */
  infile = fopen(argv[1], "r");
  if (infile == NULL) {
    fprintf(stderr, "Cannot open file for reading: %s\n", argv[1]);
    return 2;
  }

  /* open the output file */
  outfile = fopen(argv[2], "w");
  if (outfile == NULL) {
    fprintf(stderr, "Cannot open file for writing: %s\n", argv[2]);
    fclose(infile);
    return 3;
  }

  /* copy one file to the other */
  /* use fgetc and fputc */
  do {
    ch = fgetc(infile);
    if (ch != EOF) {
      fputc(ch, outfile);
    }
  } while (ch != EOF);

  /* done */
  fclose(infile);
  fclose(outfile);
  return 0;
}
```
You can use GCC to compile the `cp.c` file into an executable:
```
$ gcc -Wall -o cp cp.c
```
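The `-o cp` option tells the compiler to save the compiled program into the `cp` file. The `-Wall` option tells the compiler to turn on a broad set of common warnings. If you don't see any warnings, the compiler found nothing suspicious in the code.
### Reading and writing blocks of data
Implementing your own `cp` command by reading and writing one character at a time does the job, but it isn't very fast. You might not notice when copying "everyday" files such as documents and text files, but you will notice the difference when copying large files or copying files over a network. Processing one character at a time requires significant overhead.
A better way to implement this `cp` command is to read a chunk of the input into memory (called a buffer), then write that collection of data to the second file. Because the program can read more data at a time, it needs fewer reads from the file and so runs faster.
You can read a file into memory using the `fread` function. This function takes several arguments: a pointer to the array or memory buffer to read data into (`ptr`), the size of the smallest object you want to read (`size`), how many of those objects you want to read (`nmemb`), and the file to read from (`stream`):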
```
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
```
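These options provide a great deal of flexibility for more advanced file input and output, such as reading and writing files with a specific data structure. But in the simple case of reading data from one file and writing it to another, you can use a buffer that is an array of characters.
You can write the data in the buffer to another file using the `fwrite` function. It takes a set of options similar to `fread`: a pointer to the array or memory buffer to read data from, the size of the smallest object to write, how many of those objects to write, and the file to write to.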
```
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);
```
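When the program reads a file into a buffer and then writes that buffer to another file, the array (`ptr`) can be a fixed-size array. For example, you can use a character array 200 characters long as the buffer.
With that assumption, you need to change the loop in your `cp` program to read data from a file into a buffer and then write that buffer to another file: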
```
  while (!feof(infile)) {
    buffer_length = fread(buffer, sizeof(char), 200, infile);
    fwrite(buffer, sizeof(char), buffer_length, outfile);
  }
```
Here is the full source code of the updated `cp` program, which now uses a buffer to read and write data:
```
#include <stdio.h>

int
main(int argc, char **argv)
{
  FILE *infile;
  FILE *outfile;
  char buffer[200];
  size_t buffer_length;

  /* parse the command line */
  /* usage: cp infile outfile */
  if (argc != 3) {
    fprintf(stderr, "Incorrect usage\n");
    fprintf(stderr, "Usage: cp infile outfile\n");
    return 1;
  }

  /* open the input file */
  infile = fopen(argv[1], "r");
  if (infile == NULL) {
    fprintf(stderr, "Cannot open file for reading: %s\n", argv[1]);
    return 2;
  }

  /* open the output file */
  outfile = fopen(argv[2], "w");
  if (outfile == NULL) {
    fprintf(stderr, "Cannot open file for writing: %s\n", argv[2]);
    fclose(infile);
    return 3;
  }

  /* copy one file to the other */
  /* use fread and fwrite */
  while (!feof(infile)) {
    buffer_length = fread(buffer, sizeof(char), 200, infile);
    fwrite(buffer, sizeof(char), buffer_length, outfile);
  }

  /* done */
  fclose(infile);
  fclose(outfile);
  return 0;
}
```
Since you want to compare this program with the other one, save this source code as `cp2.c`. You can compile the program using GCC:
```
$ gcc -Wall -o cp2 cp2.c
```
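As before, the `-o cp2` option tells the compiler to save the compiled program into the `cp2` program file. The `-Wall` option tells the compiler to turn on a broad set of common warnings. If you don't see any warnings, the compiler found nothing suspicious in the code.
### Yes, it really is faster
Reading and writing data using a buffer is the better way to implement this version of the `cp` program. Because it reads a chunk of the file into memory at once, the program doesn't need to read data as often. You might not notice a difference between the two approaches on small files, but you will see a clear gap if you need to copy a large file or copy data on slower media, such as over a network connection.
I ran a comparison using the Linux `time` command. This command runs another program, then tells you how long that program took. For my test, I wanted to see the difference in time, so I copied a 628 MB CD-ROM image file I had on my system.
I first copied the image file using the standard Linux `cp` command to see how long that takes. Running the Linux `cp` command first also eliminated the possibility that Linux's built-in file-cache system would give my program a misleading performance boost. The test with Linux `cp` took well under one second: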
```
$ time cp FD13LIVE.iso tmpfile
real 0m0.040s
user 0m0.001s
sys 0m0.003s
```
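Copying the same file with my own version of the `cp` command took much longer. Reading and writing one character at a time took almost five seconds to copy the file: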
```
$ time ./cp FD13LIVE.iso tmpfile
real 0m4.823s
user 0m4.100s
sys 0m0.571s
```
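Reading data from the input into a buffer and then writing that buffer to the output file is much faster. Copying the file this way took less than a second: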
```
$ time ./cp2 FD13LIVE.iso tmpfile
real 0m0.944s
user 0m0.224s
sys 0m0.608s
```
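My demonstration `cp` program used a 200-character buffer. I'm sure the program would run faster if it read more of the file into memory at once. But even with a buffer of only 200 characters, this comparison already shows a huge difference in performance.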
--------------------------------------------------------------------------------
via: https://opensource.com/article/21/3/file-io-c
Author: [Jim Hall][a]
Topic selection: [lujun9972][b]
Translator: [wyxplus](https://github.com/wyxplus)
Proofreader: [proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)
[a]: https://opensource.com/users/jim-hall
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/file_system.jpg?itok=pzCrX1Kc "4 manilla folders, yellow, green, purple, blue"
[2]: http://www.opengroup.org/onlinepubs/009695399/functions/fgetc.html
[3]: http://www.opengroup.org/onlinepubs/009695399/functions/fputc.html
[4]: http://www.opengroup.org/onlinepubs/009695399/functions/fprintf.html
[5]: http://www.opengroup.org/onlinepubs/009695399/functions/fopen.html
[6]: http://www.opengroup.org/onlinepubs/009695399/functions/fclose.html
[7]: http://www.opengroup.org/onlinepubs/009695399/functions/feof.html
[8]: http://www.opengroup.org/onlinepubs/009695399/functions/fread.html
[9]: http://www.opengroup.org/onlinepubs/009695399/functions/fwrite.html