mirror of
https://github.com/LCTT/TranslateProject.git
synced 2025-01-13 22:30:37 +08:00
Translation complete
parent 4851150f35
commit 62b9888b34
@@ -1,127 +0,0 @@
[#]: collector: (lujun9972)
[#]: translator: (MjSeven)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (How to identify duplicate files on Linux)
[#]: via: (https://www.networkworld.com/article/3387961/how-to-identify-duplicate-files-on-linux.html#tk.rss_all)
[#]: author: (Sandra Henry-Stocker https://www.networkworld.com/author/Sandra-Henry_Stocker/)

How to identify duplicate files on Linux
======
Some files on a Linux system can appear in more than one location. Follow these instructions to find and identify these "identical twins" and learn why hard links can be so advantageous.
![Archana Jarajapu \(CC BY 2.0\)][1]

Identifying files that share disk space relies on the fact that such files share the same inode, the data structure that stores all the information about a file except its name and content. If two or more files have different names and file system locations, yet share an inode, they also share content, ownership, permissions, etc.

These files are often referred to as "hard links", unlike symbolic links, which simply point to other files by containing their names. Symbolic links are easy to pick out in a file listing by the "l" in the first position and the **->** symbol that points to the file being referenced.

```
$ ls -l my*
-rw-r--r-- 4 shs shs 228 Apr 12 19:37 myfile
lrwxrwxrwx 1 shs shs 6 Apr 15 11:18 myref -> myfile
-rw-r--r-- 4 shs shs 228 Apr 12 19:37 mytwin
```
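
A listing like the one above can be reproduced in a scratch directory. The following is a minimal sketch, assuming GNU `stat` (for the `-c` format option); the directory comes from `mktemp` and the variable names are made up for illustration.

```shell
#!/bin/bash
# Sketch: build one hard link and one symbolic link, then compare inodes.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

echo "some content" > myfile
ln myfile mytwin        # hard link: a second name for the same inode
ln -s myfile myref      # symbolic link: a small file that stores the name "myfile"

ls -li my*              # -i adds the inode number as the first column

# Capture the inode numbers for comparison (GNU stat syntax).
inode_file=$(stat -c '%i' myfile)
inode_twin=$(stat -c '%i' mytwin)
inode_ref=$(stat -c '%i' myref)   # stat reports on the link itself unless -L is given
```

Here `myfile` and `mytwin` report the same inode number, while `myref` has an inode of its own.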

Hard links in a single directory are not as obvious, but they are still easy to find. If you list the files with the **ls -i** command and sort them by inode number, hard links show up as adjacent entries with the same number. In this type of ls output, the first column shows the inode numbers.

```
$ ls -i | sort -n | more
...
788000 myfile <==
788000 mytwin <==
801865 Name_Labels.pdf
786692 never leave home angry
920242 NFCU_Docs
800247 nmap-notes
```

Scan the output for identical inode numbers; any match identifies a set of hard links.

**[ Also see: [Invaluable tips and tricks for troubleshooting Linux][2] ]**

If, on the other hand, you simply want to know whether one particular file is hard-linked to another file, there's an easier way than scanning through a list of what may be hundreds of files. The find command's **-samefile** option will do the work for you.

```
$ find . -samefile myfile
./myfile
./save/mycopy
./mytwin
```

Notice that the starting location provided to the find command determines how much of the file system is scanned for matches. In the above example, we're looking in the current directory and its subdirectories.
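
When only two known names are in question, the shell's own file test can answer without a search. This is a minimal sketch, assuming bash; the helper name `same_file` is hypothetical.

```shell
#!/bin/bash
# Sketch: answer "are these two names the same file?" for a known pair.
same_file() {
    # -ef is true when both paths resolve to the same device and inode.
    [ "$1" -ef "$2" ]
}

# Example call (file names taken from the listing above):
# same_file myfile mytwin && echo "hard links to the same inode"
```

Note that `-ef` follows symbolic links, so a symbolic link and its target also compare as the same file; use `ls -l` to tell the two cases apart.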

Adding output details using find's **-ls** option might be more convincing:

```
$ find . -samefile myfile -ls
788000 4 -rw-r--r-- 4 shs shs 228 Apr 12 19:37 ./myfile
788000 4 -rw-r--r-- 4 shs shs 228 Apr 12 19:37 ./save/mycopy
788000 4 -rw-r--r-- 4 shs shs 228 Apr 12 19:37 ./mytwin
```

The first column shows the inode number and the second the disk usage in blocks. Then we see file permissions, link count, owner, group, file size, date information, and the names of the files that refer to the same disk content. Note that the link field in this case is a "4", not the "3" we might expect, telling us that there's another link to this same inode as well (but outside our search range).
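
The comparison between the link count and the number of names found can also be made directly. The sketch below assumes GNU `stat`; the scratch directory and variable names are illustrative only.

```shell
#!/bin/bash
# Sketch: compare a file's link count with the names find can see.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir save
echo "data" > myfile
ln myfile mytwin
ln myfile save/mycopy

links=$(stat -c '%h' myfile)                            # names pointing at the inode
found=$(find . -samefile myfile | wc -l)                # names under the current directory
found_in_save=$(find ./save -samefile myfile | wc -l)   # names under ./save only

echo "link count: $links, found under .: $found, found under ./save: $found_in_save"
```

Whenever the link count is larger than the number of names found, the remaining names live outside the directory tree that was searched.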

If you want to look for all instances of hard links in a single directory, you could try a script like this that will create the list and look for the duplicates for you:

```
#!/bin/bash

# searches for files sharing inodes

prev=""
shown=""
tmpfile=$(mktemp)    # scratch file outside the directory being listed

# list files by inode
ls -i | sort -n > "$tmpfile"

# search through file for duplicate inode #s
while read -r inode name
do
    if [ "$inode" == "$prev" ] && [ "$inode" != "$shown" ]; then
        # print each group of matching names once
        grep "^ *$inode " "$tmpfile"
        shown=$inode
    fi
    prev=$inode
done < "$tmpfile"

# clean up
rm "$tmpfile"

$ ./findHardLinks
788000 myfile
788000 mytwin
```
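
As an alternative to parsing `ls` output, find can filter on the link count itself. This is a sketch rather than a drop-in replacement: it assumes GNU find (for `-printf` and `-maxdepth`), and the function name `list_hard_links` is hypothetical.

```shell
#!/bin/bash
# Sketch: list regular files in one directory whose link count exceeds 1,
# sorted so that names sharing an inode end up next to each other.
list_hard_links() {
    local dir=${1:-.}
    find "$dir" -maxdepth 1 -type f -links +1 -printf '%i %p\n' | sort -n
}

# Example call:
# list_hard_links .
```

Unlike the script above, this also reports a file whose other links all live in different directories, because it looks at the link count rather than at matching entries in a single listing.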

You can also use the find command to look for files by inode number, as in the command below. However, this search could involve more than one file system, so you may get false results: the same inode number might be used in another file system, where it would not represent the same file. If that's the case, other file details will not be identical.

```
$ find / -inum 788000 -ls 2> /dev/null
788000 4 -rw-r--r-- 4 shs shs 228 Apr 12 19:37 /tmp/mycopy
788000 4 -rw-r--r-- 4 shs shs 228 Apr 12 19:37 /home/shs/myfile
788000 4 -rw-r--r-- 4 shs shs 228 Apr 12 19:37 /home/shs/save/mycopy
788000 4 -rw-r--r-- 4 shs shs 228 Apr 12 19:37 /home/shs/mytwin
```

Note that error output was shunted off to /dev/null so that we didn't have to look at all the "Permission denied" errors that would have otherwise been displayed for other directories that we're not allowed to look through.
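
One way to avoid false matches from other file systems is find's `-xdev` option, which keeps the search on the file system of the starting directory. A minimal sketch, with a hypothetical helper name `find_by_inode`:

```shell
#!/bin/bash
# Sketch: search by inode number without descending into other file systems.
find_by_inode() {
    local start=$1 inum=$2
    # -xdev: do not cross mount points; 2> /dev/null hides permission errors
    find "$start" -xdev -inum "$inum" 2> /dev/null
}

# Example call (GNU stat syntax for reading the inode number):
# find_by_inode /home "$(stat -c '%i' myfile)"
```

The starting directory should be on the same file system as the file of interest, since hard links cannot span file systems.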

Also, scanning for files that contain the same content but don't share inodes (i.e., plain file copies) would take considerably more time and effort.
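
To give a sense of what such a content scan involves, here is one possible approach based on checksums. It is a sketch only, assuming GNU coreutils (`sha256sum`, and `uniq` with `-w` and `-D`); the function name `find_same_content` is hypothetical.

```shell
#!/bin/bash
# Sketch: group files by SHA-256 checksum to spot identical content.
# uniq -w64 compares only the 64-character hash at the start of each line;
# -D prints every line that belongs to a group of duplicates.
find_same_content() {
    local dir=${1:-.}
    find "$dir" -type f -exec sha256sum {} + | sort | uniq -w64 -D
}

# Example call:
# find_same_content .
```

Every file has to be read in full to compute its checksum, which is why this takes far longer than comparing inode numbers. Hard links appear in the output as well, since they share content by definition.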

Join the Network World communities on [Facebook][3] and [LinkedIn][4] to comment on topics that are top of mind.

--------------------------------------------------------------------------------

via: https://www.networkworld.com/article/3387961/how-to-identify-duplicate-files-on-linux.html#tk.rss_all

Author: [Sandra Henry-Stocker][a]
Topic selection: [lujun9972][b]
Translator: [translator ID](https://github.com/译者ID)
Proofreader: [proofreader ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)

[a]: https://www.networkworld.com/author/Sandra-Henry_Stocker/
[b]: https://github.com/lujun9972
[1]: https://images.idgesg.net/images/article/2019/04/reflections-candles-100793651-large.jpg
[2]: https://www.networkworld.com/article/3242170/linux/invaluable-tips-and-tricks-for-troubleshooting-linux.html
[3]: https://www.facebook.com/NetworkWorld/
[4]: https://www.linkedin.com/company/network-world
@@ -0,0 +1,124 @@
[#]: collector: (lujun9972)
[#]: translator: (MjSeven)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (How to identify duplicate files on Linux)
[#]: via: (https://www.networkworld.com/article/3387961/how-to-identify-duplicate-files-on-linux.html#tk.rss_all)
[#]: author: (Sandra Henry-Stocker https://www.networkworld.com/author/Sandra-Henry_Stocker/)

How to identify duplicate files on Linux
======
Some files on a Linux system may appear in more than one location. Follow the instructions in this article to find and identify these "identical twins", and to learn why hard links can be so advantageous.
![Archana Jarajapu \(CC BY 2.0\)][1]

Identifying files that share disk space relies on the fact that the files share the same `inode`. This data structure stores all the information about a file except its name and content. If two or more files have different names and file system locations but share one inode, they also share content, ownership, permissions, and so on.

These files are usually called "hard links". They differ from symbolic links (that is, soft links), which merely point to other files by containing their names. Symbolic links are easy to recognize in a file listing by the "l" in the first position and the **->** symbol that points to the referenced file.

```
$ ls -l my*
-rw-r--r-- 4 shs shs 228 Apr 12 19:37 myfile
lrwxrwxrwx 1 shs shs 6 Apr 15 11:18 myref -> myfile
-rw-r--r-- 4 shs shs 228 Apr 12 19:37 mytwin
```

Hard links within a single directory are less obvious, but they are still very easy to identify. If you list the files with the **ls -i** command and sort them by `inode` number, you can pick out the hard links quite easily. In this type of `ls` output, the first column shows the `inode` number.

```
$ ls -i | sort -n | more
...
788000 myfile <==
788000 mytwin <==
801865 Name_Labels.pdf
786692 never leave home angry
920242 NFCU_Docs
800247 nmap-notes
```

Scan the output for identical `inode` numbers; any match will tell you what you want to know.

**[Also see: [Invaluable tips and tricks for troubleshooting Linux][2]]**

On the other hand, if you only want to know whether one particular file is a hard link to another, there is a simpler way than browsing a list of hundreds of files: the `find` command's **-samefile** option will do the work for you.

```
$ find . -samefile myfile
./myfile
./save/mycopy
./mytwin
```

Note that the starting location given to the `find` command determines how much of the file system is scanned for matches. In the example above, we are looking at the current directory and its subdirectories.
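
Adding output details with find's **-ls** option may be more convincing:

```
$ find . -samefile myfile -ls
788000 4 -rw-r--r-- 4 shs shs 228 Apr 12 19:37 ./myfile
788000 4 -rw-r--r-- 4 shs shs 228 Apr 12 19:37 ./save/mycopy
788000 4 -rw-r--r-- 4 shs shs 228 Apr 12 19:37 ./mytwin
```

The first column shows the `inode` number and the second the disk usage in blocks, followed by the file permissions, link count, owner, group, file size, date information, and the names of the files that refer to the same disk content. Note that the link field in this case is "4" rather than the "3" we might expect. This tells us that there is another link to the same `inode` (but outside our search range).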

If you want to find every instance of hard links in a directory, you can try the following script, which builds the list and looks for the duplicates for you:

```
#!/bin/bash

# searches for files sharing inodes

prev=""
shown=""
tmpfile=$(mktemp)    # scratch file outside the directory being listed

# list files by inode
ls -i | sort -n > "$tmpfile"

# search through file for duplicate inode #s
while read -r inode name
do
    if [ "$inode" == "$prev" ] && [ "$inode" != "$shown" ]; then
        # print each group of matching names once
        grep "^ *$inode " "$tmpfile"
        shown=$inode
    fi
    prev=$inode
done < "$tmpfile"

# clean up
rm "$tmpfile"

$ ./findHardLinks
788000 myfile
788000 mytwin
```
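
You can also use the `find` command to look up files by `inode` number, as in the command below. However, this search may span more than one file system, so you may get false results, because the same `inode` number may be in use in another file system, where it represents a different file. If that is the case, the other file details will not be identical.

```
$ find / -inum 788000 -ls 2> /dev/null
788000 4 -rw-r--r-- 4 shs shs 228 Apr 12 19:37 /tmp/mycopy
788000 4 -rw-r--r-- 4 shs shs 228 Apr 12 19:37 /home/shs/myfile
788000 4 -rw-r--r-- 4 shs shs 228 Apr 12 19:37 /home/shs/save/mycopy
788000 4 -rw-r--r-- 4 shs shs 228 Apr 12 19:37 /home/shs/mytwin
```

Note that error output was redirected to `/dev/null` so that we did not have to look at all the "Permission denied" errors that would otherwise be displayed for other directories we are not allowed to look through.

In addition, scanning for files that contain the same content but do not share an `inode` (that is, plain file copies) would take considerably more time and effort.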

Join the Network World communities on [Facebook][3] and [LinkedIn][4] to comment on topics that are top of mind.

--------------------------------------------------------------------------------

via: https://www.networkworld.com/article/3387961/how-to-identify-duplicate-files-on-linux.html#tk.rss_all

Author: [Sandra Henry-Stocker][a]
Topic selection: [lujun9972][b]
Translator: [MjSeven](https://github.com/MjSeven)
Proofreader: [proofreader ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)

[a]: https://www.networkworld.com/author/Sandra-Henry_Stocker/
[b]: https://github.com/lujun9972
[1]: https://images.idgesg.net/images/article/2019/04/reflections-candles-100793651-large.jpg
[2]: https://www.networkworld.com/article/3242170/linux/invaluable-tips-and-tricks-for-troubleshooting-linux.html
[3]: https://www.facebook.com/NetworkWorld/
[4]: https://www.linkedin.com/company/network-world