BASH drivers, start your engines
======

There's always more than one way to do a job in the shell, and there may not be One Best Way to do that job, either.

Nevertheless, different commands with the same output can differ in how long they take, how much memory they use and how hard they make the CPU work.

Out of curiosity I trialled 6 different ways to get the last 5 characters from each line of a text file, which is a simple text-processing task. The 6 commands are explained below and are abbreviated here as awk5, echo5, grep5, rev5, sed5 and tail5. These were also the names of the files generated by the commands.
### Tracking performance

I ran the trial on a 1.6GB UTF-8 text file with 1559391514 characters on 3570866 lines, or an average of 437 characters per line, and no blank lines. The last 5 characters on every line were alphanumeric.

To time the 6 commands I used **time** (the BASH shell built-in, not GNU **time**) and while the commands were running I checked **top** to follow memory and CPU usage. My system is the Dell OptiPlex 9020 Micro described [here][1] and runs Debian 9.

All 6 commands used between 1 and 1.4GB of memory (VIRT in **top**), and awk5, echo5, grep5 and sed5 ran at close to 100% CPU usage. Interestingly, rev5 ran at about 30% CPU and tail5 at about 15%.

To ensure that all 6 commands had done the same job, I did a **diff** on the 6 output files, each about 21 MB:

![][2]
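
One way to script that kind of check, as a minimal sketch: compare every output file against the first with **cmp**, which reports through its exit status whether two files are byte-for-byte identical. The tiny files created here are hypothetical stand-ins for the 6 real outputs, and the variable names are only for illustration.

```shell
#!/bin/bash
# Hypothetical stand-ins for the 6 output files, created in a scratch directory.
workdir=$(mktemp -d)
for name in awk5 echo5 grep5 rev5 sed5 tail5; do
    printf 'abcde\n12345\n' > "$workdir/$name"
done

# Compare each file with awk5. cmp -s prints nothing and only sets its exit status.
mismatches=0
for name in echo5 grep5 rev5 sed5 tail5; do
    if ! cmp -s "$workdir/awk5" "$workdir/$name"; then
        echo "$name differs from awk5"
        mismatches=$((mismatches + 1))
    fi
done
echo "files that differ from awk5: $mismatches"
rm -r "$workdir"
```

If all 5 comparisons succeed, the 6 files are identical, because each one matches the same reference file.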
### And the winner is...

Here are the elapsed times:

![][3]

Well, AWK (GNU AWK 4.1.4) is really fast. Sure, all 6 commands could process a 100-line file zippety-quick, but for big text-processing jobs, fire up your AWK.
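
For anyone wanting to repeat the timing, here is a minimal sketch of the setup described under "Tracking performance". It assumes a small generated sample file rather than the 1.6GB original, and it sets BASH's TIMEFORMAT variable so that the built-in **time** prints only the elapsed seconds. The file and variable names are made up for the example.

```shell
#!/bin/bash
# Build a small sample file (a stand-in for the real 1.6GB file).
sample=$(mktemp)
out=$(mktemp)
for i in $(seq 1 2000); do
    printf 'line %05d of the sample file ends with ABC%02d\n' "$i" "$((i % 100))"
done > "$sample"

# TIMEFORMAT=%R makes the BASH built-in time report elapsed (real) seconds only.
TIMEFORMAT=%R

# time writes its report to stderr, so the braces capture that report while
# the command's own output still goes to the output file.
elapsed=$( { time awk '{print substr($0,length($0)-4,5)}' "$sample" > "$out"; } 2>&1 )

first=$(head -n 1 "$out")
lines=$(wc -l < "$out" | tr -d ' ')
echo "awk5 elapsed seconds: $elapsed"
echo "lines written: $lines, first line: $first"
rm "$sample" "$out"
```

Swapping the **awk** command for any of the other 5 gives a like-for-like comparison on the same file, although on a sample this small the differences will be tiny.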
### Commands used

```
awk '{print substr($0,length($0)-4,5)}' file > awk5
```

awk5 used AWK's substring function. The function works on the whole line ($0), starts at the 4th character back from the last character (length($0)-4) and returns 5 characters (5).
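
A quick way to see what the substr() call does is to feed it a few made-up lines. The sketch below includes a line shorter than 5 characters, where the start position falls below 1; the common AWK implementations return the whole line in that case, but that is an assumption worth checking on your own AWK.

```shell
# Made-up input: a long line, a line of exactly 5 characters and a short line.
result=$(printf 'The quick brown fox\n12345\nabc\n' |
    awk '{print substr($0, length($0)-4, 5)}')
echo "$result"
# Prints:
# n fox
# 12345
# abc
```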
```
#!/bin/bash
while read line; do echo "${line: -5}"; done < file > echo5
exit
```

echo5 was run as a script and uses a **while** loop for processing one line at a time. The BASH string function "${line: -5}" returns the last 5 characters in "$line".
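
One thing to keep in mind with this loop is that a plain **read** removes backslashes and trims leading and trailing whitespace before the substring is taken. That made no difference in the trial, where every line ended in 5 alphanumeric characters, but it can on other data. The sketch below uses made-up input to compare the loop as written with an "IFS= read -r" variant; the variant was not part of the timed trial.

```shell
#!/bin/bash
# Made-up input: a line with backslashes and a line with trailing spaces.
input=$(mktemp)
printf 'path\\to\\file\nends with spaces   \n' > "$input"

# As in echo5: read without -r drops the backslashes and trims the trailing spaces.
plain=$(while read line; do echo "[${line: -5}]"; done < "$input")

# Variant: IFS= keeps leading and trailing whitespace, -r keeps backslashes literal.
raw=$(while IFS= read -r line; do printf '[%s]\n' "${line: -5}"; done < "$input")

echo "$plain"
echo "$raw"
rm "$input"
```

The first loop prints [ofile] and [paces], while the second prints [\file] and [es   ], which are the true last 5 characters of each line.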
```
grep -o '.....$' file > grep5
```

In grep5, **grep** searches each line for the last 5 characters (.....$) and returns (with the -o option) just that searched-for string.
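
Because **grep** only prints lines that match, a line with fewer than 5 characters produces no output at all, so the output can have fewer lines than the input. The test file in the trial had no such lines. A small sketch with made-up input shows the effect:

```shell
# Made-up input: the middle line is shorter than 5 characters.
result=$(printf 'The quick brown fox\nabc\n1234567\n' | grep -o '.....$')
echo "$result"
# Prints:
# n fox
# 34567
```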
```
#!/bin/bash
while read line; do rev <<<"$line" | cut -c1-5 | rev; done < file > rev5
exit
```

The rev5 trick in this script has appeared often in online forums. Each line is first reversed with **rev**, then **cut** is used to return the first 5 characters, then the 5-character string is reversed with **rev**.
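
Both **rev** and **cut** already work line by line, so the same trick can be applied to the whole file in one pipeline instead of starting three processes for every line. The sketch below uses a made-up two-line file. This variant was not timed in the trial, so no claim is made here about how it would rank.

```shell
# Made-up input file.
input=$(mktemp)
printf 'The quick brown fox\n1234567\n' > "$input"

# Reverse every line, keep the first 5 characters, then reverse back.
result=$(rev "$input" | cut -c1-5 | rev)
echo "$result"
rm "$input"
# Prints:
# n fox
# 34567
```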
```
sed 's/.*\(.....\)/\1/' file > sed5
```

sed5 is a simple use of **sed** (GNU sed 4.4) but was surprisingly slow in the trial. In each line, **sed** replaces everything up to and including the last 5 characters with just those last 5 (as a backreference).
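
The substitution relies on ".*" being greedy: it takes as much of the line as it can while still leaving 5 characters for the group. The sketch below runs the original command next to an equivalent extended-regex form, using made-up input. A line shorter than 5 characters does not match and is printed unchanged. Whether the extended form runs any faster was not measured.

```shell
# Made-up input: the second line is too short to match and passes through unchanged.
input='The quick brown fox
abc'

# Original form from sed5.
basic=$(printf '%s\n' "$input" | sed 's/.*\(.....\)/\1/')

# Equivalent extended-regex form: .{5} is five characters, $ anchors at line end.
extended=$(printf '%s\n' "$input" | sed -E 's/.*(.{5})$/\1/')

echo "$basic"
echo "$extended"
```

Both commands print "n fox" followed by "abc".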
```
#!/bin/bash
while read line; do tail -c 6 <<<"$line"; done < file > tail5
exit
```

The "-c 6" in the tail5 script means that **tail** captures the last 5 characters in each line plus the newline character at the end.

Actually, the "-c" option captures bytes, not characters, meaning if the line ends in multi-byte characters the output will be corrupt. But would you really want to use the ultra-slow **tail** for this job in the first place?
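
The byte-versus-character problem is easy to reproduce. In this sketch the made-up test line is "abc€€", written with octal escapes so that the script does not depend on the editor's encoding, and **od** shows the bytes that **tail** returns.

```shell
#!/bin/bash
# "abc€€": in UTF-8 each € is the three bytes e2 82 ac.
line=$(printf 'abc\342\202\254\342\202\254')

# The line is 9 bytes plus the newline added by the here-string.
# tail -c 6 keeps the last 6 bytes, which start in the middle of the first €.
tail_hex=$(tail -c 6 <<<"$line" | od -An -tx1 | tr -d ' \n')
echo "$tail_hex"
# Prints: 82ace282ac0a
```

The output begins with 82 ac, the tail end of a € whose leading byte e2 has been cut off, so the result is not valid UTF-8.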
### About the Author

Bob Mesibov is Tasmanian, retired and a keen Linux tinkerer.

--------------------------------------------------------------------------------

via: http://www.thelinuxrain.com/articles/bash-drivers-start-your-engines

Author: [Bob Mesibov][a]
Translator: [translator ID](https://github.com/译者ID)
Proofreader: [proofreader ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)

[a]:http://www.thelinuxrain.com
[1]:http://www.thelinuxrain.com/articles/debian-9-on-a-dell-optiplex-9020-micro
[2]:http://www.thelinuxrain.com/content/01-articles/201-bash-drivers-start-your-engines/1.png
[3]:http://www.thelinuxrain.com/content/01-articles/201-bash-drivers-start-your-engines/2.png