mirror of https://github.com/LCTT/TranslateProject.git
synced 2025-01-13 22:30:37 +08:00
Submit Translated passage for review
parent a156324237
commit 2d5ab22c31
@ -7,20 +7,20 @@
[#]: via: (https://www.networkworld.com/article/3390204/how-to-identify-same-content-files-on-linux.html#tk.rss_all)
[#]: author: (Sandra Henry-Stocker https://www.networkworld.com/author/Sandra-Henry_Stocker/)

How to identify same-content files on Linux
如何在 Linux 上识别同样内容的文件
======
Copies of files sometimes represent a big waste of disk space and can cause confusion if you want to make updates. Here are six commands to help you identify these files.
有时文件副本代表了对硬盘空间的巨大浪费并会在你想要更新文件时造成困扰。以下是用来识别这些文件的六个命令。
![Vinoth Chandar \(CC BY 2.0\)][1]

In a recent post, we looked at [how to identify and locate files that are hard links][2] (i.e., that point to the same disk content and share inodes). In this post, we'll check out commands for finding files that have the same _content_, but are not otherwise connected.
在最近的帖子中,我们看了[如何识别定位硬链接的文件][2](换句话说,指向同一硬盘内容并共享索引节点)。在本篇帖子中,我们将查看能找到具有相同_内容_,却不相链接的文件的命令。

Hard links are helpful because they allow files to exist in multiple places in the file system while not taking up any additional disk space. Copies of files, on the other hand, sometimes represent a big waste of disk space and risk causing confusion if you want to make updates. In this post, we're going to look at multiple ways to identify these files.
硬链接很有用是因为它们能够使文件存放在文件系统内的多个地方却不会占用额外的硬盘空间。另一方面,有时文件副本代表了对硬盘空间的巨大浪费,在你想要更新文件时也会有造成困扰之虞。在这篇帖子中,我们将看一下多种识别这些文件的方式。

**[ Two-Minute Linux Tips: [Learn how to master a host of Linux commands in these 2-minute video tutorials][3] ]**
**[两分钟 Linux 小贴士:[学习如何通过两分钟视频教程掌握大量 Linux 命令][3]]**

### Comparing files with the diff command
### 用 diff 命令比较文件

Probably the easiest way to compare two files is to use the **diff** command. The output will show you the differences between the two files. The < and > signs indicate whether the extra lines are in the first (<) or second (>) file provided as arguments. In this example, the extra lines are in backup.html.
可能比较两个文件最简单的方法是使用 **diff** 命令。输出会显示你文件的不同之处。< 和 > 符号代表在当参数传过来的第一个(<)或第二个(>)文件中是否有额外的文字行。在这个例子中,在 backup.html 中有额外的文字行。

```
$ diff index.html backup.html
@ -30,18 +30,18 @@ $ diff index.html backup.html
> </pre>
```

If diff shows no output, that means the two files are the same.
如果 diff 没有输出那代表两个文件相同。

```
$ diff home.html index.html
$
```
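
The "no output" check can also be made from a script, because diff reports its result through its exit status as well (0 when it finds no differences). The following is a minimal sketch assuming a POSIX shell; the `same_content` helper and the throwaway demo files are hypothetical names introduced here for illustration, not part of the original article.

```shell
#!/bin/sh
# same_content FILE1 FILE2 (hypothetical helper)
# Succeeds (exit status 0) only when diff reports no differences.
# diff exits 1 when the files differ and 2 on trouble (e.g. a missing file),
# so both of those cases count as "not the same" here.
same_content() {
    diff "$1" "$2" >/dev/null 2>&1
}

# Demo with throwaway files in a temporary directory.
demo_dir=$(mktemp -d)
printf '<html>\n</html>\n' > "$demo_dir/home.html"
cp "$demo_dir/home.html" "$demo_dir/index.html"
printf '<html>\n<pre>\n</pre>\n</html>\n' > "$demo_dir/backup.html"

if same_content "$demo_dir/home.html" "$demo_dir/index.html"; then
    echo "home.html and index.html have the same content"
fi
if ! same_content "$demo_dir/index.html" "$demo_dir/backup.html"; then
    echo "index.html and backup.html differ"
fi

rm -rf "$demo_dir"
```

Discarding the output keeps the helper quiet, so it can be used directly in `if` statements or loops over pairs of files.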

The only drawbacks to diff are that it can only compare two files at a time, and you have to identify the files to compare. Some commands we will look at in this post can find the duplicate files for you.
diff 的唯一缺点是它一次只能比较两个文件并且你必须指定用来比较的文件,这篇帖子中的一些命令可以为你找到多个重复文件。

### Using checksums
### 使用 checksums

The **cksum** (checksum) command computes checksums for files. A checksum is a mathematical reduction of a file's contents to a single large number; cksum prints that number followed by the file's size in bytes (like 2819078353 228029). While not absolutely unique, the chance that files that are not identical in content would result in the same checksum is extremely small.
**cksum**(checksum) 命令计算文件的校验和。校验和是一种将文件内容转化成一个长数字的数学简化;cksum 会输出该数字及文件的字节数(例如 2819078353 228029)。虽然并不是完全独特的,但是文件内容不同校验和却相同的概率微乎其微。

```
$ cksum *.html
@ -50,11 +50,11 @@ $ cksum *.html
4073570409 227985 index.html
```
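
Spotting matching checksums by eye gets tedious with more than a few files. As a rough sketch (not from the original article), the cksum output can be grouped with awk so that only checksum/size pairs shared by two or more files are printed. The `cksum_dupes` helper and the demo files are hypothetical, and the sketch assumes file names without embedded newlines.

```shell
#!/bin/sh
# cksum_dupes FILE... (hypothetical helper)
# Prints each checksum/size pair shared by two or more files,
# followed by the indented names of the files that share it.
cksum_dupes() {
    cksum "$@" | awk '
        {
            key = $1 " " $2                    # checksum and size in bytes
            name = $0
            sub(/^[0-9]+ [0-9]+ /, "", name)   # keep names that contain spaces
            count[key]++
            names[key] = names[key] "  " name "\n"
        }
        END {
            for (key in count)
                if (count[key] > 1)
                    printf "%s\n%s", key, names[key]
        }'
}

# Demo with throwaway files: two identical, one different.
demo_dir=$(mktemp -d)
printf 'same\n' > "$demo_dir/home.html"
printf 'same\n' > "$demo_dir/index.html"
printf 'different\n' > "$demo_dir/backup.html"
(cd "$demo_dir" && cksum_dupes *.html)
rm -rf "$demo_dir"
```

In the demo, only home.html and index.html are listed, under the checksum and size they share. A matching checksum is strong evidence rather than proof, so a final diff of the candidates is a reasonable extra step before deleting anything.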

In the example above, you can see how the second and third files yield the same checksum and can be assumed to be identical.
在上述示例中,你可以看到产生同样校验和的第二个和第三个文件是如何可以被默认为相同的。

### Using the find command
### 使用 find 命令

While the find command doesn't have an option for finding duplicate files, it can be used to search files by name or type and run the cksum command. For example:
虽然 find 命令并没有寻找重复文件的选项,它依然可以被用来通过名字或类型寻找文件并运行 cksum 命令。例如:

```
$ find . -name "*.html" -exec cksum {} \;
@ -63,9 +63,9 @@ $ find . -name "*.html" -exec cksum {} \;
4073570409 227985 ./index.html
```
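
The find output above still has to be scanned by eye. One possible extension, sketched here under the assumption of a POSIX shell and file names without newlines, sorts the cksum lines so that equal checksum/size pairs sit next to each other and prints only those lines. The function name `find_cksum_dupes` and the demo directory layout are hypothetical.

```shell
#!/bin/sh
# find_cksum_dupes DIR PATTERN (hypothetical helper)
# Runs cksum on every regular file under DIR whose name matches PATTERN,
# sorts the result so equal checksum/size pairs are adjacent, and prints
# only the lines that belong to a group of two or more.
find_cksum_dupes() {
    find "$1" -type f -name "$2" -exec cksum {} + |
        LC_ALL=C sort |
        awk '
            {
                key = $1 " " $2
                if (key == prev_key) {
                    if (!printed) print prev_line   # first member of the group
                    print $0
                    printed = 1
                } else {
                    printed = 0
                }
                prev_key = key
                prev_line = $0
            }'
}

# Demo with throwaway files, including a copy in a subdirectory.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/old"
printf 'same\n' > "$demo_dir/index.html"
printf 'same\n' > "$demo_dir/old/home.html"
printf 'different\n' > "$demo_dir/backup.html"
find_cksum_dupes "$demo_dir" "*.html"
rm -rf "$demo_dir"
```

Using `-exec cksum {} +` instead of `\;` passes many files to each cksum invocation, which is usually faster on large trees; the output format is the same.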

### Using the fslint command
### 使用 fslint 命令

The **fslint** command can be used to specifically find duplicate files. Note that we give it a starting location. The command can take quite some time to complete if it needs to run through a large number of files. Here's output from a very modest search. Note how it lists the duplicate files and also looks for other issues, such as empty directories and bad IDs.
**fslint** 命令可以被特地用来寻找重复文件。注意我们给了它一个起始位置。如果它需要遍历相当多的文件,这个命令需要花点时间来完成。注意它是如何列出重复文件并寻找其它问题的,比如空目录和坏ID。

```
$ fslint .
@ -86,15 +86,15 @@ index.html
-------------------------Non Stripped executables
```

You may have to install **fslint** on your system. You will probably have to add it to your search path, as well:
你可能需要在你的系统上安装 **fslint**。 你可能也需要将它加入你的搜索路径:

```
$ export PATH=$PATH:/usr/share/fslint/fslint
```
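
If the export line goes into a shell startup file, it may be worth guarding it so the directory is only appended when it is actually needed. This is a small sketch, not something the article prescribes; the `ensure_on_path` helper is a hypothetical name, and /usr/share/fslint/fslint is simply the location used above, which may differ on your system.

```shell
#!/bin/sh
# ensure_on_path CMD DIR (hypothetical helper)
# Does nothing if CMD is already found on the search path; otherwise
# appends DIR to PATH when DIR contains an executable named CMD.
ensure_on_path() {
    cmd=$1
    dir=$2
    if command -v "$cmd" >/dev/null 2>&1; then
        return 0
    elif [ -x "$dir/$cmd" ]; then
        PATH=$PATH:$dir
        export PATH
        return 0
    fi
    echo "$cmd not found on PATH or in $dir" >&2
    return 1
}

# The directory below is the one used in the article; adjust as needed.
ensure_on_path fslint /usr/share/fslint/fslint ||
    echo "fslint may need to be installed first"
```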

### Using the rdfind command
### 使用 rdfind 命令

The **rdfind** command will also look for duplicate (same content) files. The name stands for "redundant data find," and the command is able to determine, based on file dates, which files are the originals. That is helpful if you choose to delete the duplicates, as it will remove the newer files.
**rdfind** 命令也会寻找重复(相同内容的)文件。它的名字代表“重复数据搜寻”并且它能够基于文件日期判断哪个文件是原件——这在你选择删除副本时很有用因为它会移除较新的文件。

```
$ rdfind ~
@ -111,7 +111,7 @@ Totally, 223 KiB can be reduced.
Now making results file results.txt
```

You can also run this command in "dryrun" mode (i.e., only report the changes that might otherwise be made).
你可以在“dryrun”中运行这个命令 (换句话说,仅仅汇报可能会另外被做出的改动)。

```
$ rdfind -dryrun true ~
@ -128,7 +128,7 @@ Removed 9 files due to unique sizes from list.2 files left.
(DRYRUN MODE) Now making results file results.txt
```

The rdfind command also provides options for things such as ignoring empty files (-ignoreempty) and following symbolic links (-followsymlinks). Check out the man page for explanations.
rdfind 命令同样提供了类似忽略空文档(-ignoreempty)和跟踪符号链接(-followsymlinks)的功能。查看 man 页面获取解释。

```
-ignoreempty ignore empty files
@ -146,7 +146,7 @@ The rdfind command also provides options for things such as ignoring empty files
-n, -dryrun display what would have been done, but don't do it
```

Note that the rdfind command offers an option to delete duplicate files with the **-deleteduplicates true** setting. Hopefully the command's modest problem with grammar won't irritate you. ;-)
注意 rdfind 命令提供了 **-deleteduplicates true** 的设置选项以删除副本。希望这个命令语法上的小问题不会惹恼你。;-)

```
$ rdfind -deleteduplicates true .
@ -154,11 +154,11 @@ $ rdfind -deleteduplicates true .
Deleted 1 files. <==
```
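
Since deletion cannot be undone, one cautious workflow is to chain the two modes shown above: preview with the dry run first, and delete only after an explicit confirmation. The wrapper below is a sketch of that idea, not part of rdfind itself; the function name `dedupe_after_preview` is hypothetical, and it uses only the `-dryrun true` and `-deleteduplicates true` settings from the article.

```shell
#!/bin/sh
# dedupe_after_preview DIR (hypothetical wrapper)
# Shows what rdfind would do in DIR, then deletes duplicates only
# if the user answers "y" or "Y" at the prompt.
dedupe_after_preview() {
    target=$1
    rdfind -dryrun true "$target" || return 1
    printf 'Delete the duplicates reported above? [y/N] '
    read -r answer
    case $answer in
        [yY]) rdfind -deleteduplicates true "$target" ;;
        *)    echo "Nothing deleted." ;;
    esac
}
```

After sourcing the function, it could be called as `dedupe_after_preview .` from the directory you want to clean up.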

You will likely have to install the rdfind command on your system. It's probably a good idea to experiment with it to get comfortable with how it works.
你将可能需要在你的系统上安装 rdfind 命令。试验它以熟悉如何使用它可能是一个好主意。

### Using the fdupes command
### 使用 fdupes 命令

The **fdupes** command also makes it easy to identify duplicate files and provides a large number of useful options, such as **-r** for recursion. In its simplest form, it groups duplicate files together like this:
**fdupes** 命令同样使得识别重复文件变得简单。它同时提供了大量有用的选项——例如用来迭代的**-r**。在这个例子中,它像这样将重复文件分组到一起:

```
$ fdupes ~
@ -173,7 +173,7 @@ $ fdupes ~
/home/shs/hideme.png
```

Here's an example using recursion. Note that many of the duplicate files are important (users' .bashrc and .profile files) and should clearly not be deleted.
这是使用迭代的一个例子,注意许多重复文件是重要的(用户的 .bashrc 和 .profile 文件)并且不应被删除。

```
# fdupes -r /home
@ -204,7 +204,7 @@ Here's an example using recursion. Note that many of the duplicate files are imp
/home/shs/PNGs/Sandra_rotated.png
```
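
A recursive run over /home can produce a long listing. As a rough sketch, assuming fdupes separates each group of duplicates with a blank line (as in the grouped output above), awk's paragraph mode can summarize how many files are in each group. The `summarize_fdupes` helper and the sample paths are hypothetical.

```shell
#!/bin/sh
# summarize_fdupes (hypothetical helper)
# Reads fdupes-style output on standard input: groups of file names,
# one name per line, with a blank line between groups.
# Prints one summary line per group.
summarize_fdupes() {
    awk 'BEGIN { RS = ""; FS = "\n" }      # one record per blank-line-separated group
         { printf "group %d: %d files\n", NR, NF }'
}

# Demo on made-up input shaped like fdupes output.
printf '/tmp/a/one.png\n/tmp/b/one.png\n\n/tmp/a/x\n/tmp/b/x\n/tmp/c/x\n\n' |
    summarize_fdupes
# prints:
#   group 1: 2 files
#   group 2: 3 files
```

With fdupes installed, the same helper could be fed real output, for example `fdupes -r /home | summarize_fdupes`.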

The fdupes command's many options are listed below. Use the **fdupes -h** command, or read the man page for more details.
fdupes 命令的许多选项列在下面。使用 **fdupes -h** 命令或者阅读 man 页面获取详情。

```
-r --recurse recurse
@ -229,15 +229,14 @@ The fdupe command's many options are listed below. Use the **fdupes -h** command
-h --help displays help
```

The fdupes command is another one that you're likely to have to install and work with for a while to become familiar with its many options.
fdupes 命令是另一个你可能需要安装并使用一段时间才能熟悉其众多选项的命令。

### Wrap-up
### 总结

Linux systems provide a good selection of tools for locating and potentially removing duplicate files, along with options for where you want to run your search and what you want to do with duplicate files when you find them.
Linux 系统提供能够定位并(潜在地)能移除重复文件的一系列的好工具附带能让你指定搜索区域及当对你所发现的重复文件时的处理方式的选项。
**[也可在:[解决 Linux 问题时的无价建议和技巧][4]上查看]**

**[ Also see: [Invaluable tips and tricks for troubleshooting Linux][4] ]**

Join the Network World communities on [Facebook][5] and [LinkedIn][6] to comment on topics that are top of mind.
在 [Facebook][5] 和 [LinkedIn][6] 上加入 Network World 社区并对任何弹出的话题评论。

--------------------------------------------------------------------------------

@ -258,3 +257,4 @@ via: https://www.networkworld.com/article/3390204/how-to-identify-same-content-f
[4]: https://www.networkworld.com/article/3242170/linux/invaluable-tips-and-tricks-for-troubleshooting-linux.html
[5]: https://www.facebook.com/NetworkWorld/
[6]: https://www.linkedin.com/company/network-world