mirror of https://github.com/LCTT/TranslateProject.git
synced 2024-12-26 21:30:55 +08:00
translated 20191024 Get sorted with sort at the command line
parent debc96936b
commit bffb9b6a72
@@ -1,250 +0,0 @@
[#]: collector: (lujun9972)
[#]: translator: (lxbwolf)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Get sorted with sort at the command line)
[#]: via: (https://opensource.com/article/19/10/get-sorted-sort)
[#]: author: (Seth Kenlon https://opensource.com/users/seth)

Get sorted with sort at the command line
======
Reorganize your data in a format that makes sense to you, right from the
Linux, BSD, or Mac terminal, with the sort command.
![Coding on a computer][1]

If you've ever used a spreadsheet application, then you know that rows can be sorted by the contents of a column. For instance, if you have a list of expenses, you might want to sort them by date, by ascending price, or by category. If you're comfortable using a terminal, you may not want to open a big office application just to sort text data. That's exactly what the [**sort**][2] command is for.

### Installing

You don't need to install **sort** because it's included on every [POSIX][3] system. On most Linux systems, the **sort** command is bundled in a collection of utilities from the GNU project. On other POSIX systems, such as BSD and Mac, the default **sort** command is not from GNU, so some options may differ. I'll attempt to account for both GNU and BSD implementations in this article.

### Sort lines alphabetically

The **sort** command, by default, compares lines starting at the first character and outputs them in ascending alphabetic order. If two lines start with the same character, it compares the next character, and so on. For example:

```
$ cat distro.list
Slackware
Fedora
Red Hat Enterprise Linux
Ubuntu
Arch
1337
Mint
Mageia
Debian
$ sort distro.list
1337
Arch
Debian
Fedora
Mageia
Mint
Red Hat Enterprise Linux
Slackware
Ubuntu
```

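What counts as "alphabetic order" depends on the active locale, which the article does not go into. As a minimal sketch with made-up input, forcing the C locale through `LC_ALL=C` makes **sort** compare raw byte values, so digits come first, then uppercase letters, then lowercase letters:

```shell
# Sample input is made up for illustration.
# LC_ALL=C selects byte-value ordering: digits, then A-Z, then a-z.
printf 'b\nA\n1337\na\nB\n' | LC_ALL=C sort
# Output:
# 1337
# A
# B
# a
# b
```

In other locales, such as en_US.UTF-8, uppercase and lowercase letters are typically interleaved instead, so setting the locale explicitly is one way to get the same order on every machine.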
Using **sort** doesn't change the original file. It acts as a filter, so if you want to preserve your data in its sorted form, you must redirect the output using either **>** or **tee**:

```
$ sort distro.list | tee distro.sorted
1337
Arch
Debian
[...]
$ cat distro.sorted
1337
Arch
Debian
[...]
```

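The **>** redirection mentioned in the text saves the result without echoing it to the terminal. The sketch below also uses the POSIX **-o** option, which the article does not cover, as one way to write the sorted result back over the input file. The scratch directory and the sample contents are assumptions for illustration.

```shell
# Work in a throwaway directory so no real files are touched.
cd "$(mktemp -d)" || exit 1
printf 'Slackware\nFedora\nArch\n' > distro.list

# Redirect the sorted output to a new file; distro.list is unchanged.
sort distro.list > distro.sorted

# Avoid "sort distro.list > distro.list": the shell truncates the file
# before sort reads it. The -o option allows output to an input file.
sort -o distro.list distro.list

cat distro.list
```

After the last command, `distro.list` itself holds the sorted lines.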
### Sort by column

Complex data sets sometimes need to be sorted by something other than the first letter of each line. Imagine, for instance, a list of animals with each one's genus and species, where each "field" (a "cell" in a spreadsheet) is separated by a predictable delimiter character. This is such a common data format for spreadsheet exports that the CSV (comma-separated values) file extension exists to identify such files (although a CSV file doesn't have to be comma-separated, nor does a delimited file have to use the CSV extension to be valid and usable). Consider this example data set:

```
Aptenodytes;forsteri;Miller,JF;1778;Emperor
Pygoscelis;papua;Wagler;1832;Gentoo
Eudyptula;minor;Bonaparte;1867;Little Blue
Spheniscus;demersus;Brisson;1760;African
Megadyptes;antipodes;Milne-Edwards;1880;Yellow-eyed
Eudyptes;chrysocome;Viellot;1816;Southern Rockhopper
Torvaldis;linux;Ewing,L;1996;Tux
```

Given this sample data set, you can use the **--field-separator** option (**-t** on BSD and Mac, or on GNU to reduce typing) to set the delimiting character to a semicolon, because this example uses semicolons instead of commas (it could use any character). Use the **--key** option (**-k** on BSD and Mac, or on GNU to reduce typing) to define which field to sort by. Fields are numbered starting at 1, not 0. For example, to sort by the second field of each line:

```
$ sort --field-separator=";" --key=2 penguins.list
Megadyptes;antipodes;Milne-Edwards;1880;Yellow-eyed
Eudyptes;chrysocome;Viellot;1816;Southern Rockhopper
Spheniscus;demersus;Brisson;1760;African
Aptenodytes;forsteri;Miller,JF;1778;Emperor
Torvaldis;linux;Ewing,L;1996;Tux
Eudyptula;minor;Bonaparte;1867;Little Blue
Pygoscelis;papua;Wagler;1832;Gentoo
```

That's somewhat difficult to read, but Unix is famous for its _pipe_ method of constructing commands, so you can use the **column** command to "prettify" the output. Using the **column** command found on most Linux systems (provided by util-linux rather than GNU):

```
$ sort --field-separator=";" \
--key=2 penguins.list | \
column --table --separator ";"
Megadyptes   antipodes   Milne-Edwards  1880  Yellow-eyed
Eudyptes     chrysocome  Viellot        1816  Southern Rockhopper
Spheniscus   demersus    Brisson        1760  African
Aptenodytes  forsteri    Miller,JF      1778  Emperor
Torvaldis    linux       Ewing,L        1996  Tux
Eudyptula    minor       Bonaparte      1867  Little Blue
Pygoscelis   papua       Wagler         1832  Gentoo
```

The short options are slightly more cryptic to the new user but shorter to type, and they also work on BSD and Mac:

```
$ sort -t ";" \
-k2 penguins.list | column -t -s ";"
Megadyptes   antipodes   Milne-Edwards  1880  Yellow-eyed
Eudyptes     chrysocome  Viellot        1816  Southern Rockhopper
Spheniscus   demersus    Brisson        1760  African
Aptenodytes  forsteri    Miller,JF      1778  Emperor
Torvaldis    linux       Ewing,L        1996  Tux
Eudyptula    minor       Bonaparte      1867  Little Blue
Pygoscelis   papua       Wagler         1832  Gentoo
```

The **key** definition doesn't have to be set to **2**, of course. Any existing field may be used as the sorting key.

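As a sketch of sorting on a different key, the following sorts the same penguin data by the year in the fourth field. It relies on two details the article does not mention: a key written as `4,4` stops at the end of field 4 (a bare `-k4` runs from field 4 to the end of the line), and the `n` modifier compares the field numerically. The scratch directory is an assumption for illustration.

```shell
# Work in a throwaway directory and recreate the sample data.
cd "$(mktemp -d)" || exit 1
cat > penguins.list <<'EOF'
Aptenodytes;forsteri;Miller,JF;1778;Emperor
Pygoscelis;papua;Wagler;1832;Gentoo
Eudyptula;minor;Bonaparte;1867;Little Blue
Spheniscus;demersus;Brisson;1760;African
Megadyptes;antipodes;Milne-Edwards;1880;Yellow-eyed
Eudyptes;chrysocome;Viellot;1816;Southern Rockhopper
Torvaldis;linux;Ewing,L;1996;Tux
EOF

# -t ';'   fields are separated by semicolons
# -k4,4n   sort on field 4 only, compared as a number
sort -t ';' -k4,4n penguins.list
```

The output starts with the 1760 entry (Spheniscus) and ends with the 1996 entry (Torvaldis).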
### Reverse sort

You can reverse the order of a sorted list with the **--reverse** option (**-r** on BSD and Mac, or on GNU for brevity):

```
$ sort --reverse alphabet.list
z
y
x
w
[...]
```

You can achieve the same result by piping the output of a normal sort through [tac][4].

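The equivalence with **tac** can be checked with a small sketch. The four-line input is made up, and the example assumes **tac** is installed; it ships with GNU coreutils and may be missing on BSD and Mac.

```shell
# Reverse sort in one step.
printf 'b\nd\na\nc\n' | sort -r

# Same result: sort normally, then reverse the line order with tac.
# Where tac is unavailable, "tail -r" is a common BSD substitute.
printf 'b\nd\na\nc\n' | sort | tac
```

Both pipelines print d, c, b, a, one letter per line.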
### Sorting by month (GNU only)

In a perfect world, everyone would write dates according to the ISO 8601 standard: year, month, day. It's a logical method of specifying a unique date, and it's easy for computers to understand. And yet quite often, humans use other means of identifying dates, including months with pretty arbitrary names.

Fortunately, the GNU **sort** command accounts for this and is able to sort correctly by month name. Use the **--month-sort** (**-M**) option:

```
$ cat month.list
November
October
September
April
[...]
$ sort --month-sort month.list
January
February
March
April
May
[...]
November
December
```

Months may be identified by their full names or by abbreviations such as Jan or Feb.

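Month sorting can also be applied to a single field. The sketch below assumes GNU **sort** and made-up lines that begin with a month abbreviation followed by a day number; `-k1,1M` treats the first field as a month name, and `-k2,2n` breaks ties by day. `LC_ALL=C` is set so that the English month names are the ones recognized.

```shell
# Made-up lines: month abbreviation, then a day number.
# -k1,1M  sort field 1 as a month name
# -k2,2n  within the same month, sort field 2 as a number
printf 'Nov 3\nJan 12\nSep 1\nJan 2\nApr 30\n' | LC_ALL=C sort -k1,1M -k2,2n
# Output:
# Jan 2
# Jan 12
# Apr 30
# Sep 1
# Nov 3
```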
### Human-readable numeric sort (GNU only)

Another common point of confusion between humans and computers is groups of numbers. For instance, humans often write "1024 bytes" as "1KB" because it's easier and quicker for the human brain to parse "1KB" than "1024" (and it gets easier the larger the number becomes). To a computer, though, a string such as 9KB sorts after, for instance, 1MB (even though 9KB is only a fraction of a megabyte). The GNU **sort** command provides the **--human-numeric-sort** (**-h**) option to help parse these values correctly.

```
$ cat sizes.list
2M
12MB
1k
9k
900
7000
$ sort --human-numeric-sort sizes.list
900
7000
1k
9k
2M
12MB
```

There are some inconsistencies. GNU **sort** ranks values by unit suffix first and only then by number; it does not convert between units. For example, 16,000 bytes is greater than 1KB, but **sort** places the unsuffixed 16000 first:

```
$ cat sizes0.list
2M
12MB
16000
1k
$ sort -h sizes0.list
16000
1k
2M
12MB
```

Logically, 16,000 should be written 16KB in this context, so GNU **sort** is not entirely to blame. As long as you are sure that your numbers are consistent, the **--human-numeric-sort** option can help parse human-readable numbers in a computer-friendly way.

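To illustrate the point about consistency, here is a small sketch, assuming GNU **sort** and made-up sizes, in which every value larger than a kilobyte carries a unit suffix. With the values written that way, the order matches the real magnitudes:

```shell
# Made-up sizes, written with consistent unit suffixes.
printf '2M\n16K\n1K\n900\n12M\n' | sort -h
# Output:
# 900
# 1K
# 16K
# 2M
# 12M
```

A common pairing in practice is `du -h | sort -h`, since `du -h` already writes its sizes in this consistent style.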
### Randomized sort (GNU only)

Sometimes utilities provide the option to do the opposite of what they're meant to do. In a way, it makes no sense for a **sort** command to have the ability to "sort" a file randomly. Then again, the workflow of the command makes it a convenient feature to have. You _could_ use a different command, like [**shuf**][5], or you could just add an option to the command you're using. Whether it's bloat or ingenious UX design, the GNU **sort** command provides the means to sort a file arbitrarily.

The purest form of arbitrary sorting is the **--random-sort** or **-R** option (not to be confused with the **-r** option, which is short for **--reverse**).

```
$ sort --random-sort alphabet.list
d
m
p
a
[...]
```

You can run a random sort multiple times on a file for different results each time.

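A quick way to convince yourself that a random sort only reorders lines is to sort the shuffled output again. This is a sketch assuming GNU **sort** and a made-up four-line input:

```shell
# Shuffle four made-up lines; the order varies from run to run.
printf 'a\nb\nc\nd\n' | sort -R

# Sorting the shuffled output again recovers the original order,
# which shows that no lines were added or lost.
printf 'a\nb\nc\nd\n' | sort -R | sort
```

One difference from **shuf** is worth knowing: GNU **sort -R** orders lines by a random hash of their contents, so identical lines end up next to each other, whereas **shuf** shuffles every line independently.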
### Sorted

There are many more features available with the GNU and BSD **sort** commands, so spend some time getting to know the options. You'll be surprised at how flexible **sort** can be, especially when it's combined with other Unix utilities.

--------------------------------------------------------------------------------

via: https://opensource.com/article/19/10/get-sorted-sort

Author: [Seth Kenlon][a]
Topic selection: [lujun9972][b]
Translator: [translator ID](https://github.com/译者ID)
Proofreader: [proofreader ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux.cn (Linux中国)](https://linux.cn/)

[a]: https://opensource.com/users/seth
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/code_computer_laptop_hack_work.png?itok=aSpcWkcl (Coding on a computer)
[2]: https://en.wikipedia.org/wiki/Sort_(Unix)
[3]: https://en.wikipedia.org/wiki/POSIX
[4]: https://opensource.com/article/19/9/tac-command
[5]: https://www.gnu.org/software/coreutils/manual/html_node/shuf-invocation.html
@@ -0,0 +1,249 @@
[#]: collector: (lujun9972)
[#]: translator: (lxbwolf)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Get sorted with sort at the command line)
[#]: via: (https://opensource.com/article/19/10/get-sorted-sort)
[#]: author: (Seth Kenlon https://opensource.com/users/seth)

Get sorted with sort at the command line
======
Reorganize your data to suit your needs, from a Linux, BSD, or Mac terminal, using the sort command.
![Coding on a computer][1]

If you have ever used a spreadsheet application, you know that rows can be sorted by the contents of a column. For example, if you have a list of expenses, you may want to sort it by date, by ascending price, or by category. If you are familiar with the terminal, you won't want to launch a heavyweight office suite just to sort text data. That is exactly what the [**sort**][2] command is for.

### Installing

You don't need to install **sort**, because it is included on every [POSIX][3] system. On most Linux systems, the **sort** command comes bundled in the GNU project's collection of utilities. On other POSIX systems, such as BSD and Mac, the default **sort** command is not provided by GNU, so some options may differ. In this article I try to cover both the GNU and BSD implementations.

### Sorting lines alphabetically

By default, the **sort** command compares lines starting at the first character and outputs them in ascending alphabetical order. When two lines share the same first character, it compares the next character. For example:

```
$ cat distro.list
Slackware
Fedora
Red Hat Enterprise Linux
Ubuntu
Arch
1337
Mint
Mageia
Debian
$ sort distro.list
1337
Arch
Debian
Fedora
Mageia
Mint
Red Hat Enterprise Linux
Slackware
Ubuntu
```

Using **sort** does not modify the original file. It only acts as a filter, so if you want to keep the data in sorted form, you need to redirect the output with **>** or **tee**:

```
$ sort distro.list | tee distro.sorted
1337
Arch
Debian
[...]
$ cat distro.sorted
1337
Arch
Debian
[...]
```

### Sorting by column

Complex data sometimes needs to be sorted by something other than the first character of each line. Suppose, for example, you have a list of animals in which each "field" (a "cell" in a spreadsheet) is separated by a predictable delimiter. This format is very common for spreadsheet exports, and the CSV (comma-separated values) extension identifies such files (although a CSV file need not be comma-separated, and a delimited file need not use the CSV extension). Take the following data as an example:

```
Aptenodytes;forsteri;Miller,JF;1778;Emperor
Pygoscelis;papua;Wagler;1832;Gentoo
Eudyptula;minor;Bonaparte;1867;Little Blue
Spheniscus;demersus;Brisson;1760;African
Megadyptes;antipodes;Milne-Edwards;1880;Yellow-eyed
Eudyptes;chrysocome;Viellot;1816;Southern Rockhopper
Torvaldis;linux;Ewing,L;1996;Tux
```

For this sample data, you can use **--field-separator** (**-t** on BSD and Mac, or as shorthand on GNU) to set the delimiter to a semicolon, because the sample data uses semicolons rather than commas (in principle the delimiter can be any character). Use the **--key** option (**-k** on BSD and Mac, or as shorthand on GNU) to specify which field to sort on. For example, to sort on the second field of each line (fields are numbered from 1, not 0):

```
$ sort --field-separator=";" --key=2 penguins.list
Megadyptes;antipodes;Milne-Edwards;1880;Yellow-eyed
Eudyptes;chrysocome;Viellot;1816;Southern Rockhopper
Spheniscus;demersus;Brisson;1760;African
Aptenodytes;forsteri;Miller,JF;1778;Emperor
Torvaldis;linux;Ewing,L;1996;Tux
Eudyptula;minor;Bonaparte;1867;Little Blue
Pygoscelis;papua;Wagler;1832;Gentoo
```

The result is a little hard to read, but Unix is known for building commands with _pipes_, so you can use the **column** command to tidy up the output. With the **column** command found on most Linux systems (provided by util-linux rather than GNU):

```
$ sort --field-separator=";" \
--key=2 penguins.list | \
column --table --separator ";"
Megadyptes   antipodes   Milne-Edwards  1880  Yellow-eyed
Eudyptes     chrysocome  Viellot        1816  Southern Rockhopper
Spheniscus   demersus    Brisson        1760  African
Aptenodytes  forsteri    Miller,JF      1778  Emperor
Torvaldis    linux       Ewing,L        1996  Tux
Eudyptula    minor       Bonaparte      1867  Little Blue
Pygoscelis   papua       Wagler         1832  Gentoo
```

The short options, shown here as used on BSD and Mac, may be harder for beginners to understand but are quicker to type:

```
$ sort -t ";" \
-k2 penguins.list | column -t -s ";"
Megadyptes   antipodes   Milne-Edwards  1880  Yellow-eyed
Eudyptes     chrysocome  Viellot        1816  Southern Rockhopper
Spheniscus   demersus    Brisson        1760  African
Aptenodytes  forsteri    Miller,JF      1778  Emperor
Torvaldis    linux       Ewing,L        1996  Tux
Eudyptula    minor       Bonaparte      1867  Little Blue
Pygoscelis   papua       Wagler         1832  Gentoo
```

Of course, the **key** does not have to be **2**. Any existing field can be used as the sort key.

### Reverse sort

You can reverse a sorted list with the **--reverse** option (**-r** on BSD and Mac, or as shorthand on GNU):

```
$ sort --reverse alphabet.list
z
y
x
w
[...]
```

You can also pipe the output of a normal sort to the [tac][4] command to get the same result.

### Sorting by month (GNU only)

Ideally, everyone would write dates according to the ISO 8601 standard: year, month, day. It is a logical way to specify an exact date, and computers can parse it easily. In many cases, though, people write dates in other ways, including naming months with fairly arbitrary names.

Fortunately, the GNU **sort** command recognizes this style and can sort correctly by month name. Use the **--month-sort** (**-M**) option:

```
$ cat month.list
November
October
September
April
[...]
$ sort --month-sort month.list
January
February
March
April
May
[...]
November
December
```

Both full month names and abbreviations are recognized.

### Human-readable numeric sort (GNU only)

Another common source of confusion between humans and computers is how numbers are grouped. For example, people usually write "1024 bytes" as "1KB", because "1KB" is easier and quicker for a human to parse than "1024" (and the larger the number, the bigger the difference). To a computer, however, a string such as 9KB sorts after a string such as 1MB (even though 9KB is only a small fraction of a megabyte). The GNU **sort** command provides the **--human-numeric-sort** (**-h**) option to help parse these values correctly.

```
$ cat sizes.list
2M
12MB
1k
9k
900
7000
$ sort --human-numeric-sort sizes.list
900
7000
1k
9k
2M
12MB
```

There are exceptions. For example, 16000 bytes is larger than 1KB, but **sort** does not recognize that:

```
$ cat sizes0.list
2M
12MB
16000
1k
$ sort -h sizes0.list
16000
1k
2M
12MB
```

Logically, 16000 should be written as 16KB in this example, so GNU **sort** is not entirely to blame. As long as your numbers are written consistently, **--human-numeric-sort** can parse human-readable numbers in a computer-friendly way.

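### Random sort (GNU only)

Sometimes tools offer options that run counter to their original purpose. In a sense, it makes no sense for the **sort** command to be able to "sort" a file randomly. Then again, the command's workflow makes this a convenient feature. You could use a different command, such as [**shuf**][5], or you could simply add an option to the command you are already using. Whether you consider it bloat or ingenious UX design, the GNU **sort** command can sort a file in random order.

The purest form of random sorting is the **--random-sort** or **-R** option (not to be confused with **-r**, which is short for **--reverse**).

```
$ sort --random-sort alphabet.list
d
m
p
a
[...]
```

Each random sort of a file produces a different result.
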
### Conclusion

The GNU and BSD **sort** commands have many more features, so take some time to explore the options. You will be surprised by how flexible **sort** is, especially when combined with other Unix tools.

--------------------------------------------------------------------------------

via: https://opensource.com/article/19/10/get-sorted-sort

Author: [Seth Kenlon][a]
Topic selection: [lujun9972][b]
Translator: [lxbwolf](https://github.com/lxbwolf)
Proofreader: [proofreader ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux.cn (Linux中国)](https://linux.cn/)

[a]: https://opensource.com/users/seth
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/code_computer_laptop_hack_work.png?itok=aSpcWkcl "Coding on a computer"
[2]: https://en.wikipedia.org/wiki/Sort_(Unix)
[3]: https://en.wikipedia.org/wiki/POSIX
[4]: https://opensource.com/article/19/9/tac-command
[5]: https://www.gnu.org/software/coreutils/manual/html_node/shuf-invocation.html