Merge pull request #2227 from geekpi/master

translated
This commit is contained in:
geekpi 2015-01-11 11:15:35 +08:00
commit 17e6f0a763
2 changed files with 76 additions and 78 deletions

View File

@ -1,78 +0,0 @@
Translating-----geekpi
How to deduplicate files on Linux with dupeGuru
================================================================================
Recently, I was given the task to clean up my father's files and folders. What made it difficult was the abnormal amount of duplicate files with incorrect names. By keeping a backup on an external drive, simultaneously editing multiple versions of the same file, or even changing the directory structure, the same file can get copied many times, change names, change locations, and just clog disk space. Hunting down every single one of them can become a problem of gigantic proportions. Hopefully, there exists nice little software that can save your precious hours by finding and removing duplicate files on your system: [dupeGuru][1]. Written in Python, this file deduplication software switched to a GPLv3 license a few hours ago. So time to apply your new year's resolutions and clean up your stuff!
### Installation of dupeGuru ###
On Ubuntu, you can add the Hardcoded Software PPA:
$ sudo apt-add-repository ppa:hsoft/ppa
$ sudo apt-get update
And then install with:
$ sudo apt-get install dupeguru-se
On Arch Linux, the package is present in the [AUR][2].
If you prefer compiling it yourself, the sources are on [GitHub][3].
### Basic Usage of dupeGuru ###
DupeGuru is conceived to be fast and safe. Which means that the program is not going to run berserk on your system. It has a very low risk of deleting stuff that you did not intend to delete. However, as we are still talking about file deletion, it is always a good idea to stay vigilant and cautious: a good backup is always necessary.
Once you took your precautions, you can launch dupeGuru via the command:
$ dupeguru_se
You should be greeted by the folder selection screen, where you can add folders to scan for deduplication.
![](https://farm9.staticflickr.com/8596/16199976251_f78b042fba.jpg)
Once you selected your directories and launched the scan, dupeGuru will show its results by grouping duplicate files together in a list.
![](https://farm9.staticflickr.com/8600/16016041367_5ab2834efb_z.jpg)
Note that by default dupeGuru matches files based on their content, and not their name. To be sure that you do not accidentally delete something important, the match column shows you the accuracy of the matching algorithm. From there, you can select the duplicate files that you want to take action on, and click on "Actions" button to see available actions.
![](https://farm8.staticflickr.com/7516/16199976361_c8f919b06e_b.jpg)
The choice of actions is quite extensive. In short, you can delete the duplicates, move them to another location, ignore them, open them, rename them, or even invoke a custom command on them. If you choose to delete a duplicate, you might get as pleasantly surprised as I was by available deletion options.
![](https://farm8.staticflickr.com/7503/16014366568_54f70e3140.jpg)
You can not only send the duplicate files to the trash or delete them permanently, but you can also choose to leave a link to the original file (either using a symlink or a hardlink). In oher words, the duplicates will be erased, and a link to the original will be left instead, saving a lot of disk space. This can be particularly useful if you imported those files into a workspace, or have dependencies based on them.
Another fancy option: you can export the results to a HTML or CSV file. Not really sure why you would do that, but I suppose that it can be useful if you prefer keeping track of duplicates rather than use any of dupeGuru's actions on them.
Finally, last but not least, the preferences menu will make all your dream about duplicate busting come true.
![](https://farm8.staticflickr.com/7493/16015755749_a9f343b943_z.jpg)
There you can select the criterion for the scan, either content based or name based, and a threshold for duplicates to control the number of results. It is also possible to define the custom command that you can select in the actions. Among the myriad of other little options, it is good to notice that by default, dupeGuru ignores files less than 10KB.
For more information, I suggest that you go check out the [official website][4], which is filled with documention, support forums, and other goodies.
To conclude, dupeGuru is my go-to software whenever I have to prepare a backup or to free some space. I find it powerful enough for advanced users, and yet intuitive to use for newcomers. Cherry on the cake: dupeGuru is cross platform, which means that you can also use it for your Mac or Windows PC. If you have specific needs, and want to clean up music or image files, there exists two variations: [dupeguru-me][5] and [dupeguru-pe][6], which respectively find duplicate audio tracks and pictures. The main difference from the regular version is that it compares beyond file formats and takes into account specific media meta-data like quality and bit-rate.
What do you think of dupeGuru? Would you consider using it? Or do you have any alternative deduplication software to suggest? Let us know in the comments.
--------------------------------------------------------------------------------
via: http://xmodulo.com/dupeguru-deduplicate-files-linux.html
作者:[Adrien Brochard][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创翻译,[Linux中国](http://linux.cn/) 荣誉推出
[a]:http://xmodulo.com/author/adrien
[1]:http://www.hardcoded.net/dupeguru/
[2]:https://aur.archlinux.org/packages/dupeguru-se/
[3]:https://github.com/hsoft/dupeguru
[4]:http://www.hardcoded.net/dupeguru/
[5]:http://www.hardcoded.net/dupeguru_me/
[6]:http://www.hardcoded.net/dupeguru_pe/

View File

@ -0,0 +1,76 @@
如何在Linux上使用dupeGuru删除重复文件
================================================================================
最近,我被要求清理我父亲的文件和文件夹。有一个难题是,里面存在很多不正确的名字的重复文件。有移动硬盘的备份,同时还为同一个文件编辑了多个版本,甚至改变的目录结构,同一个文件被复制了好几次,改变名字,改变位置等,这些挤满了磁盘空间。追踪每一个文件成了一个最大的问题。万幸的是,有一个小巧的软件可以帮助你省下很多时间来找到删除你系统中重复的文件:[dupeGuru][1]。它用Python写成这个去重软件几个小时钱前切换到了GPLv3许可证。因此是时候用它来清理你的文件了
### dupeGuru的安装 ###
在Ubuntu上 你可以加入Hardcoded的软件PPA
$ sudo apt-add-repository ppa:hsoft/ppa
$ sudo apt-get update
接着用下面的命令安装:
$ sudo apt-get install dupeguru-se
在ArchLinux中这个包在[AUR][2]中。
如果你想自己编译,源码在[GitHub][3]上。
### dupeGuru的基本使用 ###
DupeGuru的构想是既快又安全。这意味着程序不会在你的系统上疯狂地运行。它很少会删除你不想要删除的文件。然而既然在讨论文件删除保持谨慎和小心总是好的备份总是需要的。
你看完注意事项后你可以用下面的命令运行duprGuru了
$ dupeguru_se
你应该看到要你选择文件夹的欢迎界面,在这里加入你你想要扫描的重复文件夹。
![](https://farm9.staticflickr.com/8596/16199976251_f78b042fba.jpg)
一旦你选择完文件夹并启动扫描后dupeFuru会以列表的形式显示重复文件的组
![](https://farm9.staticflickr.com/8600/16016041367_5ab2834efb_z.jpg)
注意的是默认上dupeGuru基于文件的内容匹配而不是他们的名字。为了防止意外地删除了重要的文件匹配那列列出了使用的匹配算法。在这里你可以选择你想要删除的匹配文件并按下“Action” 按钮来看到可用的操作。
![](https://farm8.staticflickr.com/7516/16199976361_c8f919b06e_b.jpg)
可用的选项是相当广泛的。简而言之,你可以删除重复、移动到另外的位置、忽略它们、打开它们、重命名它们甚至用自定义命令运行它们。如果你选择删除重复文件,你可能会像我一样非常意外竟然还有删除选项。
![](https://farm8.staticflickr.com/7503/16014366568_54f70e3140.jpg)
你不及可以将删除文件移到垃圾箱或者永久删除,还可以选择留下指向原文件的链接(软链接或者硬链接)。也就是说,重复文件按将会删除但是会保留下指向原文件的链接。这将会省下大量的磁盘空间。如果你将这些文件导入到工作空间或者它们有一些依赖时很有用。
还有一个奇特的选项你可以用HTML或者CSV文件导出结果。不确定你会不会这么做但是我假设你想追踪重复文件而不是想让dupeGuru处理它们时会有用。
最后但并不是最不重要的是,偏好菜单可以让你对去重的想法成真。
![](https://farm8.staticflickr.com/7493/16015755749_a9f343b943_z.jpg)
这里你可以选择扫描的标准基于内容还是基于文字并且有一个阈值来控制结果的数量。这里同样可以定义自定义在执行中可以选择的命令。在无数其他小的选项中要注意的是dupeGuru默认忽略小于10KB的文件。
要了解更多的信息,我建议你到[official website][4]官方网站看下,这里有很多文档、论坛支持和其他好东西。
总结一下dupeGuru是我无论何时准备备份或者释放空间时想到的软件。我发现这对高级用户而言也足够强大了对新人而言也很直观。锦上添花的是dupeGuru是跨平台的额这意味着你可以在Mac或者在Windows PC上都可以使用。如果你有特定的需求想要清理音乐或者图片。这里有两个变种[dupeguru-me][5]和 [dupeguru-pe][6] 相应地可以清理音频和图片文件。与常规版本的不同是它不仅比较文件格式还比较特定的媒体数据像质量和码率。
你dupeGuru怎么样你会考虑使用它么或者你有任何可以替代的软件的建议么让我在评论区知道你们的想法。
--------------------------------------------------------------------------------
via: http://xmodulo.com/dupeguru-deduplicate-files-linux.html
作者:[Adrien Brochard][a]
译者:[geekpi](https://github.com/geekpi)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创翻译,[Linux中国](http://linux.cn/) 荣誉推出
[a]:http://xmodulo.com/author/adrien
[1]:http://www.hardcoded.net/dupeguru/
[2]:https://aur.archlinux.org/packages/dupeguru-se/
[3]:https://github.com/hsoft/dupeguru
[4]:http://www.hardcoded.net/dupeguru/
[5]:http://www.hardcoded.net/dupeguru_me/
[6]:http://www.hardcoded.net/dupeguru_pe/