Translating by Torival

BASH drivers, start your engines
======
![](http://www.thelinuxrain.com/content/01-articles/201-bash-drivers-start-your-engines/headimage.jpg)

There's always more than one way to do a job in the shell, and there may not be One Best Way to do that job, either.

Nevertheless, different commands with the same output can differ in how long they take, how much memory they use and how hard they make the CPU work.

Out of curiosity I trialled 6 different ways to get the last 5 characters from each line of a text file, which is a simple text-processing task. The 6 commands are explained below and are abbreviated here as awk5, echo5, grep5, rev5, sed5 and tail5. These were also the names of the files generated by the commands.
### Tracking performance
I ran the trial on a 1.6GB UTF-8 text file with 1559391514 characters on 3570866 lines, or an average of 437 characters per line, and no blank lines. The last 5 characters on every line were alphanumeric.

To time the 6 commands I used **time** (the keyword built into BASH, not GNU **time**), and while the commands were running I checked **top** to follow memory and CPU usage. My system is the Dell OptiPlex 9020 Micro described [here][1] and runs Debian 9.

All 6 commands used between 1 and 1.4GB of memory (VIRT in **top**), and awk5, echo5, grep5 and sed5 ran at close to 100% CPU usage. Interestingly, rev5 ran at ca. 30% CPU and tail5 at ca. 15%.

To ensure that all 6 commands had done the same job, I did a **diff** on the 6 output files, each about 21 MB:

![][2]
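The check in the screenshot used **diff**. A minimal sketch of the same idea follows, assuming the six output files sit in one directory. It swaps in `cmp -s` to compare each file silently against awk5, and the scratch directory and two-line file contents are stand-ins so that the block runs on its own.

```bash
#!/bin/bash
# Stand-in setup: fake the six output files in a scratch directory
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for name in awk5 echo5 grep5 rev5 sed5 tail5; do
    printf 'abcde\n12345\n' > "$name"
done

# Compare every other file against awk5; cmp -s prints nothing
# and reports the result through its exit status only
mismatches=0
for name in echo5 grep5 rev5 sed5 tail5; do
    if ! cmp -s awk5 "$name"; then
        echo "$name differs from awk5"
        mismatches=$((mismatches + 1))
    fi
done
echo "files differing from awk5: $mismatches"
```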
### And the winner is...
Here are the elapsed times:

![][3]

Well, AWK (GNU AWK 4.1.4) is really fast. Sure, all 6 commands could process a 100-line file zippety-quick, but for big text-processing jobs, fire up your AWK.
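To repeat this kind of comparison on your own data, something like the following sketch could be used. It assumes BASH (for the **time** keyword and the `TIMEFORMAT` variable), times only two of the six commands, and uses a generated 1000-line file as a hypothetical stand-in for the 1.6GB original.

```bash
#!/bin/bash
# Hypothetical stand-in for the large input file: 1000 generated lines
sample=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 1000; i++) printf "line %05d abcde\n", i }' > "$sample"

# TIMEFORMAT shapes the report of the time keyword; %R is elapsed (real) seconds
TIMEFORMAT='%R seconds elapsed'

echo "awk5:" >&2
time awk '{print substr($0,length($0)-4,5)}' "$sample" > "$sample.awk5"

echo "grep5:" >&2
time grep -o '.....$' "$sample" > "$sample.grep5"
```

The timing report goes to standard error, so it does not end up in the redirected output files.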
### Commands used
```
awk '{print substr($0,length($0)-4,5)}' file > awk5
```
awk5 used AWK's substring function. The function works on the whole line ($0), starts at the 4th character back from the last character (length($0)-4) and returns 5 characters (5).
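As a quick check of the arithmetic, here is a small sketch on a made-up 11-character line: length($0)-4 is 7, so the function returns characters 7 to 11. The wrapper name last5_awk and the sample text are illustrative only.

```bash
# last5_awk is a hypothetical wrapper around the awk5 command,
# reading standard input instead of a named file
last5_awk() {
    awk '{print substr($0,length($0)-4,5)}'
}

# "hello world" has 11 characters, so substr starts at 11-4 = 7
printf 'hello world\n' | last5_awk    # prints: world
```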
```
#!/bin/bash
while read line; do echo "${line: -5}"; done < file > echo5
exit
```
echo5 was run as a script and uses a **while** loop for processing one line at a time. The BASH string function "${line: -5}" returns the last 5 characters in "$line".
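A short sketch of the substring expansion on a sample string follows. The space before -5 matters: without it BASH reads `:-` as the use-a-default-value expansion. The function last5_bash is a hypothetical variant of the loop, not the script that was timed; it adds `IFS=` and `-r` so that **read** keeps leading whitespace and backslashes intact.

```bash
#!/bin/bash
line='hello world'
echo "${line: -5}"    # prints: world
echo "${line:-5}"     # no space: default-value expansion, prints: hello world

# Hypothetical variant of the echo5 loop that reads standard input
last5_bash() {
    local line
    while IFS= read -r line; do
        printf '%s\n' "${line: -5}"
    done
}

printf 'hello world\n' | last5_bash    # prints: world
```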
```
grep -o '.....$' file > grep5
```
In grep5, **grep** searches each line for the last 5 characters (.....$) and returns (with the -o option) just that searched-for string.
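Here is a sketch on sample input, with last5_grep as an illustrative wrapper. One behaviour to keep in mind, although it did not arise in the trial file: a line with fewer than 5 characters does not match the pattern, so `grep -o` prints nothing for it.

```bash
# last5_grep is a hypothetical wrapper around the grep5 command
last5_grep() {
    grep -o '.....$'
}

# The second line is too short to match and is dropped from the output
printf 'hello world\nabc\n' | last5_grep    # prints only: world
```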
```
#!/bin/bash
while read line; do rev <<<"$line" | cut -c1-5 | rev; done < file > rev5
exit
```
The rev5 trick in this script has appeared often in online forums. Each line is first reversed with **rev**, then **cut** is used to return the first 5 characters, then the 5-character string is reversed back with **rev**.
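The three stages can be followed on a sample string in the sketch below. The function last5_rev is a hypothetical variant, not part of the trial: since **rev** and **cut** both work line by line, the same pipeline can also read a whole stream at once rather than being started for every line.

```bash
#!/bin/bash
line='hello world'
rev <<<"$line"                      # prints: dlrow olleh
rev <<<"$line" | cut -c1-5          # prints: dlrow
rev <<<"$line" | cut -c1-5 | rev    # prints: world

# Hypothetical whole-stream variant (not one of the 6 timed commands)
last5_rev() {
    rev | cut -c1-5 | rev
}

printf 'hello world\nshell script\n' | last5_rev    # prints: world, then cript
```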
```
sed 's/.*\(.....\)/\1/' file > sed5
```
sed5 is a simple use of **sed** (GNU sed 4.4) but was surprisingly slow in the trial. In each line, **sed** replaces zero or more characters leading up to the last 5 with just those last 5 (as a backreference).
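Here is a sketch on sample input, with last5_sed as an illustrative name. Because `.*` is greedy it takes as much of the line as it can, leaving exactly the final 5 characters for the group. A line shorter than 5 characters does not match, so **sed** passes it through unchanged.

```bash
# last5_sed is a hypothetical wrapper around the sed5 command
last5_sed() {
    sed 's/.*\(.....\)/\1/'
}

printf 'hello world\nabc\n' | last5_sed    # prints: world, then abc
```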
```
#!/bin/bash
while read line; do tail -c 6 <<<"$line"; done < file > tail5
exit
```
The "-c 6" in the tail5 script means that **tail** captures the last 5 characters in each line plus the newline character at the end, which the here-string (<<<) appends.

Actually, the "-c" option captures bytes, not characters, meaning that if the last 5 characters of a line include multi-byte characters, the output will be short or will start in the middle of a character. But would you really want to use the ultra-slow **tail** for this job in the first place?
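The byte-versus-character point can be seen in a small sketch. The sample line, é followed by abcd, is made up for illustration; é takes two bytes in UTF-8 (c3 a9), written here as octal escapes so the example does not depend on the terminal's encoding. **od** shows the bytes that **tail** returns.

```bash
#!/bin/bash
# Made-up sample: "éabcd" is 5 characters but 6 bytes in UTF-8
line=$(printf '\303\251abcd')

# All bytes of the line plus the newline added by the here-string
od -An -tx1 <<<"$line"                  # bytes: c3 a9 61 62 63 64 0a

# tail -c 6 keeps the last 6 bytes, which cuts é in half
tail -c 6 <<<"$line" | od -An -tx1      # bytes: a9 61 62 63 64 0a

# Same bytes with od's spacing removed, for easy comparison
tail_bytes=$(tail -c 6 <<<"$line" | od -An -tx1 | tr -d ' \n')
echo "$tail_bytes"                      # prints: a9616263640a
```

The output starts with a9, the second half of é, which is not a valid character on its own.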
### About the Author
Bob Mesibov is Tasmanian, retired and a keen Linux tinkerer.

--------------------------------------------------------------------------------
via: http://www.thelinuxrain.com/articles/bash-drivers-start-your-engines

Author: [Bob Mesibov][a]
Translator: [translator ID](https://github.com/译者ID)
Proofreader: [proofreader ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)

[a]:http://www.thelinuxrain.com
[1]:http://www.thelinuxrain.com/articles/debian-9-on-a-dell-optiplex-9020-micro
[2]:http://www.thelinuxrain.com/content/01-articles/201-bash-drivers-start-your-engines/1.png
[3]:http://www.thelinuxrain.com/content/01-articles/201-bash-drivers-start-your-engines/2.png