translated

This commit is contained in:
geekpi 2019-09-24 08:52:10 +08:00
parent 2d0ef68777
commit 5ca483c774
2 changed files with 111 additions and 114 deletions

View File

@ -1,114 +0,0 @@
[#]: collector: (lujun9972)
[#]: translator: (geekpi)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (How to remove carriage returns from text files on Linux)
[#]: via: (https://www.networkworld.com/article/3438857/how-to-remove-carriage-returns-from-text-files-on-linux.html)
[#]: author: (Sandra Henry-Stocker https://www.networkworld.com/author/Sandra-Henry_Stocker/)
How to remove carriage returns from text files on Linux
======
When carriage returns (also referred to as Ctrl+M's) get on your nerves, don't fret. There are several easy ways to show them the door.
[Kim Siever][1]
Carriage returns go back a long way as far back as typewriters on which a mechanism or a lever swung the carriage that held a sheet of paper to the right so that suddenly letters were being typed on the left again. They have persevered in text files on Windows, but were never used on Linux systems. This incompatibility sometimes causes problems when youre trying to process files on Linux that were created on Windows, but it's an issue that is very easily resolved.
The carriage return, also referred to as **Ctrl+M**, character would show up as an octal 15 if you were looking at the file with an **od** octal dump) command. The characters **CRLF** are often used to represent the carriage return and linefeed sequence that ends lines on Windows text files. Those who like to gaze at octal dumps will spot the **\r \n**. Linux text files, by comparison, end with just linefeeds.
**[ Two-Minute Linux Tips: [Learn how to master a host of Linux commands in these 2-minute video tutorials][2] ]**
Here's a sample of **od** output with the lines containing the **CRLF** characters in both octal and character form highlighted.
```
$ od -bc testfile.txt
0000000 124 150 151 163 040 151 163 040 141 040 164 145 163 164 040 146
T h i s i s a t e s t f
0000020 151 154 145 040 146 162 157 155 040 127 151 156 144 157 167 163
i l e f r o m W i n d o w s
0000040 056 015 012 111 164 047 163 040 144 151 146 146 145 162 145 156 <==
. \r \n I t ' s d i f f e r e n <==
0000060 164 040 164 150 141 156 040 141 040 125 156 151 170 040 164 145
t t h a n a U n i x t e
0000100 170 164 040 146 151 154 145 015 012 167 157 165 154 144 040 142 <==
x t f i l e \r \n w o u l d b <==
```
While these characters dont represent a huge problem, they can sometimes interfere when you want to parse the text files in some way and dont want to have to code around their presence or absence.
### 3 ways to remove carriage return characters from text files
Fortunately, there are several ways to easily remove carriage return characters. Here are three options:
#### dos2unix
You might need to go through the trouble of installing it, but **dos2unix** is probably the easiest way to turn Windows text files into Unix/Linux text files. One command with one argument, and youre done. No second file name is required. The file will be changed in place.
```
$ dos2unix testfile.txt
dos2unix: converting file testfile.txt to Unix format...
```
You should see the file length decrease, depending on how many lines it contains. A file with 100 lines would likely shrink by 99 characters, since only the last line will not end with the **CRLF** characters.
Before:
```
-rw-rw-r-- 1 shs shs 121 Sep 14 19:11 testfile.txt
```
After:
```
-rw-rw-r-- 1 shs shs 118 Sep 14 19:12 testfile.txt
```
If you need to convert a large collection of files, don't fix them one at a time. Instead, put them all in a directory by themselves and run a command like this:
```
$ find . -type f -exec dos2unix {} \;
```
In this command, we use find to locate regular files and then run the **dos2unix** command to convert them one at a time. The {} in the command is replaced by the filename. You should be sitting in the directory with the files when you run it. This command could damage other types of files, such as those that contain octal 15 characters in some context other than a text file (e.g., bytes in an image file).
#### sed
You can also use **sed**, the stream editor, to remove carriage returns. You will, however, have to supply a second file name. Heres an example:
```
$ sed -e “s/^M//” before.txt > after.txt
```
One important thing to note is that you DONT type what that command appears to be. You must enter **^M** by typing **Ctrl+V** followed by **Ctrl+M**. The “s” is the substitute command. The slashes separate the text were looking for (the Ctrl+M) and the text (nothing in this case) that were replacing it with.
#### vi
You can even remove carriage return (**Ctrl+M**) characters with **vi**, although this assumes youre not running through hundreds of files and are maybe making some other changes, as well. You would type “**:**” to go to the command line and then type the string shown below. As with **sed**, the **^M** portion of this command requires typing **Ctrl+V** to get the **^** and then **Ctrl+M** to insert the **M**. The **%s** is a substitute operation, the slashes again separate the characters we want to remove and the text (nothing) we want to replace it with. The “**g**” (global) means to do this on every line in the file.
```
:%s/^M//g
```
#### Wrap-up
The **dos2unix** command is probably the easiest to remember and most reliable way to remove carriage returns from text files. Other options are a little trickier to use, but they provide the same basic function.
Join the Network World communities on [Facebook][3] and [LinkedIn][4] to comment on topics that are top of mind.
--------------------------------------------------------------------------------
via: https://www.networkworld.com/article/3438857/how-to-remove-carriage-returns-from-text-files-on-linux.html
作者:[Sandra Henry-Stocker][a]
选题:[lujun9972][b]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://www.networkworld.com/author/Sandra-Henry_Stocker/
[b]: https://github.com/lujun9972
[1]: https://www.flickr.com/photos/kmsiever/5895380540/in/photolist-9YXnf5-cNmpxq-2KEvib-rfecPZ-9snnkJ-2KAcDR-dTxzKW-6WdgaG-6H5i46-2KzTZX-7cnSw7-e3bUdi-a9meh9-Zm3pD-xiFhs-9Hz6YM-ar4DEx-4PXAhw-9wR4jC-cihLcs-asRFJc-9ueXvG-aoWwHq-atwL3T-ai89xS-dgnntH-5en8Te-dMUDd9-aSQVn-dyZqij-cg4SeS-abygkg-f2umXt-Xk129E-4YAeNn-abB6Hb-9313Wk-f9Tot-92Yfva-2KA7Sv-awSCtG-2KDPzb-eoPN6w-FE9oi-5VhaNf-eoQgx7-eoQogA-9ZWoYU-7dTGdG-5B1aSS
[2]: https://www.youtube.com/playlist?list=PL7D2RMSmRO9J8OTpjFECi8DJiTQdd4hua
[3]: https://www.facebook.com/NetworkWorld/
[4]: https://www.linkedin.com/company/network-world

View File

@ -0,0 +1,111 @@
[#]: collector: (lujun9972)
[#]: translator: (geekpi)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (How to remove carriage returns from text files on Linux)
[#]: via: (https://www.networkworld.com/article/3438857/how-to-remove-carriage-returns-from-text-files-on-linux.html)
[#]: author: (Sandra Henry-Stocker https://www.networkworld.com/author/Sandra-Henry_Stocker/)
何在 Linux 中删除文本中的回车
======
当回车(也称为 Ctrl+M让你紧张时别担心。有几种简单的方法消除它们。
[Kim Siever][1]
回车可以往回追溯很长一段时间 - 早在打字机上就有一个机械装置或杠杆将承载纸滚筒的机架移到最后边,以便重新在左侧输入字母。他们在 Windows 的文本上保留了它,但从未在 Linux 系统上使用过。当你尝试在 Linux 上处理在 Windows 上创建的文件时,这种不兼容性有时会导致问题,但这是一个非常容易解决的问题。
如果你使用 **od**octal dump命令查看文件那么回车也称为 **Ctrl+M**)字符将显示为八进制的 15。字符 **CRLF** 通常用于表示在 Windows 文本上结束行的回车符和换行符序列。那些注意看八进制转储的会看到 **\r \n**。相比之下Linux 文本仅以换行符结束。
这有一个 **od** 输出的示例,高亮显示了行中的 **CRLF** 字符,以及它的八进制。
```
$ od -bc testfile.txt
0000000 124 150 151 163 040 151 163 040 141 040 164 145 163 164 040 146
T h i s i s a t e s t f
0000020 151 154 145 040 146 162 157 155 040 127 151 156 144 157 167 163
i l e f r o m W i n d o w s
0000040 056 015 012 111 164 047 163 040 144 151 146 146 145 162 145 156 <==
. \r \n I t ' s d i f f e r e n <==
0000060 164 040 164 150 141 156 040 141 040 125 156 151 170 040 164 145
t t h a n a U n i x t e
0000100 170 164 040 146 151 154 145 015 012 167 157 165 154 144 040 142 <==
x t f i l e \r \n w o u l d b <==
```
虽然这些字符不是大问题,但是当你想要以某种方式解析文本,并且不希望就它们是否存在进行编码时,这有时候会产生干扰。
### 3 种从文本中删除回车符的方法
幸运的是,有几种方法可以轻松删除回车符。这有三个选择:
#### dos2unix
你可能会在安装上遇到麻烦,但 **dos2unix** 可能是将 Windows 文本转换为 Unix/Linux 文本的最简单方法。一个命令带上一个参数就行了。不需要第二个文件名。该文件会被直接更改。
```
$ dos2unix testfile.txt
dos2unix: converting file testfile.txt to Unix format...
```
你应该看到文件长度减少,具体取决于它包含的行数。包含 100 行的文件可能会缩小 99 个字符,因为只有最后一行不会以 **CRLF** 字符结尾。
之前:
```
-rw-rw-r-- 1 shs shs 121 Sep 14 19:11 testfile.txt
```
之后:
```
-rw-rw-r-- 1 shs shs 118 Sep 14 19:12 testfile.txt
```
如果你需要转换大量文件,不用每次修复一个。相反,将它们全部放在一个目录中并运行如下命令:
```
$ find . -type f -exec dos2unix {} \;
```
在此命令中,我们使用 find 查找常规文件,然后运行 **dos2unix** 命令一次转换一个。命令中的 {} 将被替换为文件名。运行时,你应该处于包含文件的目录中。此命令可能会损坏其他类型的文件,例如除了文本文件外在上下文中包含八进制 15 的文件(如,镜像文件中的字节)。
#### sed
你还可以使用流编辑器 **sed** 来删除回车符。但是,你必须提供第二个文件名。以下是例子:
```
$ sed -e “s/^M//” before.txt > after.txt
```
一件需要注意的重要的事情是,请不要输入你看到的字符。你必须按下 **Ctrl+V** 后跟 **Ctrl+M** 来输入 **^M**。 “s” 是替换命令。斜杠将我们要查找的文本Ctrl + M和要替换的文本这里是空分开。
#### vi
你甚至可以使用 **vi** 删除回车符(**Ctrl+M**),但这里假设你没有打开数百个文件,或许也在做一些其他的修改。你可以键入“**:**” 进入命令行,然后输入下面的字符串。与 **sed** 一样,命令中 **^M** 需要通过 **Ctrl+V** 输入 **^**,然后 **Ctrl+M** 插入**M**。 **%s**是替换操作,斜杠再次将我们要删除的字符和我们想要替换它的文本(空)分开。 “**g**”(全局)意味在所有行上执行。
```
:%s/^M//g
```
#### 总结
**dos2unix** 命令可能是最容易记住的,也是最可靠地从文本中删除回车的方法。 其他选择使用起来有点困难,但它们提供相同的基本功能。
在 [Facebook][3] 和 [LinkedIn][4] 上加入 Network World 社区,评论最热主题。
--------------------------------------------------------------------------------
via: https://www.networkworld.com/article/3438857/how-to-remove-carriage-returns-from-text-files-on-linux.html
作者:[Sandra Henry-Stocker][a]
选题:[lujun9972][b]
译者:[geekpi](https://github.com/geekpi)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://www.networkworld.com/author/Sandra-Henry_Stocker/
[b]: https://github.com/lujun9972
[1]: https://www.flickr.com/photos/kmsiever/5895380540/in/photolist-9YXnf5-cNmpxq-2KEvib-rfecPZ-9snnkJ-2KAcDR-dTxzKW-6WdgaG-6H5i46-2KzTZX-7cnSw7-e3bUdi-a9meh9-Zm3pD-xiFhs-9Hz6YM-ar4DEx-4PXAhw-9wR4jC-cihLcs-asRFJc-9ueXvG-aoWwHq-atwL3T-ai89xS-dgnntH-5en8Te-dMUDd9-aSQVn-dyZqij-cg4SeS-abygkg-f2umXt-Xk129E-4YAeNn-abB6Hb-9313Wk-f9Tot-92Yfva-2KA7Sv-awSCtG-2KDPzb-eoPN6w-FE9oi-5VhaNf-eoQgx7-eoQogA-9ZWoYU-7dTGdG-5B1aSS
[3]: https://www.facebook.com/NetworkWorld/
[4]: https://www.linkedin.com/company/network-world