translated

geekpi 2022-11-15 08:36:56 +08:00
parent 78080fbdbd
commit 1f5756beaa
2 changed files with 91 additions and 91 deletions

@@ -1,91 +0,0 @@
[#]: subject: "Fix scanned images with ImageMagick"
[#]: via: "https://opensource.com/article/22/11/fixing-scanned-images-imagemagick"
[#]: author: "Seth Kenlon https://opensource.com/users/seth"
[#]: collector: "lkxed"
[#]: translator: "geekpi"
[#]: reviewer: " "
[#]: publisher: " "
[#]: url: " "
Fix scanned images with ImageMagick
======
It's easy to correct images, even in batches, with this open source tool.
Years ago while rummaging through the contents of a shelf in a used bookstore, I happened upon a booklet titled "UNIX System Command Summary for Berkeley 4.2 & 4.3 BSD," published by Specialized Systems Consultants. I bought it as a curiosity item because it was nearly 20 years old yet still largely applicable to modern Linux and BSD.
That amused me then and now. A booklet written in 1986 was still largely relevant in 2016, while books on the same shelf about a proprietary OS weren't worth the paper they were printed on. (Think about it: What technology do you think is going to survive a zombie apocalypse?) I've had the booklet on my own bookshelf for several years now, but it occurred to me that it's probably worth doing a little digital preservation of this artifact, so I decided to scan the booklet to create a [CBZ ebook][1].
Scanning was easy, albeit time-consuming, with [Skanlite][2]. After I was finished, however, I discovered that some pages weren't quite level.
![A page of text, including a table of contents and a glossary, that is crooked and distorted][3]
In printing, this is called a registration problem, meaning that the position of what's being printed isn't correctly orientated on the page.
### ImageMagick
[ImageMagick][4] is a non-interactive terminal-based graphics editor. It might seem counterintuitive to try to edit a graphic in a graphic-less environment like a text-only terminal, but it's actually very common. For instance, when you upload an image to use as a profile picture to a web application, it's likely that a script on the application's server processes your image using ImageMagick or its libraries. The advantage of a non-interactive editor is that you can formulate what needs to be done to a sample image, then apply those effects to hundreds of other images at the press of a button.
ImageMagick is generally just as capable as any graphics editor, as long as you take the time to uncover its many functions and how to combine them to achieve the desired effects. In this case, I want to rotate pages that are askew. After searching through ImageMagick's documentation, I discovered that the ImageMagick term for the solution I needed is deskew. Aligning your terminology with somebody else's is a challenge in any field you don't already know, so when you approach ImageMagick (or anything), keep in mind that the word you've chosen to describe a problem or solution may not be the word someone else uses.
To deskew an image with crooked text using ImageMagick:
```
$ convert page_0052.webp -deskew 25% fix_0052.webp
```
The value given to the `-deskew` option is the threshold of acceptable skew. Skew is determined by tracing the peaks and valleys of objects that appear to be letters. Depending on how crooked your scan is, you may need a threshold higher or lower than 25%. I've gone as high as 80%, and so far nothing under 25% has had an effect.
Here's the result:
![The same page of text, now with the text properly aligned][5]
Fixed! Applying this to the remaining 55 pages of the document fixed skewed pages while doing nothing to pages that were already straight. In other words, it was safe to run this command on pages that needed no adjustment, thanks to my threshold setting.
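Because the right threshold depends on the scan, it can help to render one page at several thresholds and compare the results side by side. The following is a minimal sketch, not the author's script: the function name `try_thresholds`, the `DRY_RUN` switch, and the `deskewNN_` output naming are all hypothetical, and it assumes the page file sits in the current directory.

```shell
#!/bin/sh
# Sketch: deskew one page at several thresholds so the outputs can be compared.
# try_thresholds, DRY_RUN, and the deskewNN_ prefix are illustrative names only.
# Set DRY_RUN=1 to print the commands instead of running ImageMagick.
try_thresholds() {
    page=$1
    shift
    for pct in "$@"; do
        out="deskew${pct}_${page}"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "convert $page -deskew ${pct}% $out"
        else
            convert "$page" -deskew "${pct}%" "$out"
        fi
    done
}

# Example: try_thresholds page_0052.webp 25 40 80
```

Running it with `DRY_RUN=1` first is a cheap way to check the generated commands before any files are written.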
### Cropping an image with ImageMagick
After correcting for a skew, and because I scanned more of each page than necessary anyway to prevent accidentally cutting off words, I decided that it made sense to crop my corrected pages. I was happy to keep some space around the margins, but not quite as much as I had. I use the `crop` function of ImageMagick often enough for images on this very website, so I was familiar with the option. However, I needed to determine how to crop each page.
First, I needed the size of the image:
```
$ identify fix_0052.webp
WEBP 1128x2593 1128x2593+0+0 8-bit sRGB 114732B 0.020u 0:00.021
```
Knowing the size, I was able to make some estimations about how many pixels I could stand to lose. After a few trial runs, I came up with this:
```
$ convert fix_0052.webp -gravity Center -crop 950x2450+0+0 crop_0052.webp
```
This isn't an exact fit, but the extra margin proved important when I applied the same crop to other images in the booklet. The pages varied in content and scanner placement here and there, so I was happy to give each one a little breathing room.
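To make the estimate concrete: with `-gravity Center` and a `+0+0` offset, the crop window is centered, so the pixels removed are split roughly evenly between opposite edges. A small sketch of that arithmetic follows; `crop_margins` is a hypothetical helper that only does subtraction and does not call ImageMagick.

```shell
#!/bin/sh
# Sketch: how many pixels a crop removes along each axis.
# crop_margins is a hypothetical helper name, not part of ImageMagick.
crop_margins() {
    src_w=$1 src_h=$2 crop_w=$3 crop_h=$4
    echo "width: $((src_w - crop_w)) px total, height: $((src_h - crop_h)) px total"
}

# Example with the numbers from this page:
#   crop_margins 1128 2593 950 2450
# The source size could also be read from the file, for instance:
#   crop_margins $(identify -format '%w %h' fix_0052.webp) 950 2450
```

For the page above this reports 178 pixels removed horizontally and 143 vertically, or roughly 89 pixels from each side and about 71 from the top and bottom.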
Here's the corrected and cropped image:
![The same page of text, with the previous fixes applied and crooked white margins around the page cropped out.][6]
### Batch image editing with open source
The beauty of ImageMagick is that once you've figured out the formula for fixing your image, you can apply that fix to all images requiring the same fix. I do this with [GNU Parallel][7], which uses all my CPU cores to finish image correction across hundreds of pages. It doesn't take long, and the results speak for themselves. More importantly, I've got a digital archive of a fun artifact of UNIX history.
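The article does not show the batch command itself, so the following is only a sketch of what such a run might look like. It combines the deskew and crop settings used earlier into one `convert` call; the function name `fix_page`, the `DRY_RUN` switch, the `fixed/` output directory, and the `page_*.webp` file pattern are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: apply the deskew and crop settings from this article to one page.
# fix_page, DRY_RUN, and the default fixed/ directory are illustrative assumptions.
fix_page() {
    in=$1
    outdir=${2:-fixed}
    out="$outdir/$(basename "$in")"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "convert $in -deskew 25% -gravity Center -crop 950x2450+0+0 $out"
    else
        mkdir -p "$outdir"
        convert "$in" -deskew 25% -gravity Center -crop 950x2450+0+0 "$out"
    fi
}

# Serial version:
#   for f in page_*.webp; do fix_page "$f"; done
#
# With GNU Parallel ({} is the input file, {/} is its basename):
#   mkdir -p fixed
#   parallel convert {} -deskew 25% -gravity Center -crop 950x2450+0+0 fixed/{/} ::: page_*.webp
```

Writing results to a separate directory keeps the original scans untouched, which makes it easy to rerun the batch with a different threshold or crop size.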
--------------------------------------------------------------------------------
via: https://opensource.com/article/22/11/fixing-scanned-images-imagemagick
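Author: [Seth Kenlon][a]
Topic selection: [lkxed][b]
Translator: [Translator ID](https://github.com/译者ID)
Proofreader: [Proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)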
[a]: https://opensource.com/users/seth
[b]: https://github.com/lkxed
[1]: https://opensource.com/article/19/3/comic-book-archive-djvu
[2]: https://opensource.com/article/22/2/scan-documents-skanlite-linux-kde
[3]: https://opensource.com/sites/default/files/2022-10/imagemagick-crook_1.png
[4]: https://opensource.com/article/17/8/imagemagick
[5]: https://opensource.com/sites/default/files/2022-10/imagemagick-deskew-fix.png
[6]: https://opensource.com/sites/default/files/2022-10/imagemagick-deskew-crop.png
[7]: http://LINK-TO-SETH-GNU-PARALLEL-REDHAT.COM/SYSADMIN

@@ -0,0 +1,91 @@
[#]: subject: "Fix scanned images with ImageMagick"
[#]: via: "https://opensource.com/article/22/11/fixing-scanned-images-imagemagick"
[#]: author: "Seth Kenlon https://opensource.com/users/seth"
[#]: collector: "lkxed"
[#]: translator: "geekpi"
[#]: reviewer: " "
[#]: publisher: " "
[#]: url: " "
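Fix scanned images with ImageMagick
======
With this open source tool, correcting images is easy, even in batches.
Years ago, while browsing the shelves of a used bookstore, I happened upon a booklet titled "UNIX System Command Summary for Berkeley 4.2 & 4.3 BSD," published by Specialized Systems Consultants. I bought it out of curiosity, because it was nearly 20 years old yet still largely applicable to modern Linux and BSD.
That amused me then and still does. A booklet written in 1986 was still relevant in 2016, while books on the same shelf about a proprietary OS weren't worth the paper they were printed on. (Think about it: what technology do you think would survive a zombie apocalypse?) The booklet has sat on my own bookshelf for several years, but it occurred to me that this artifact probably deserves a little digital preservation, so I decided to scan it to create a [CBZ ebook][1].
Scanning with [Skanlite][2] was easy, though time-consuming. When I finished, however, I found that some pages weren't quite level.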
![A page of text, including a table of contents and a glossary, that is crooked and distorted][3]
In printing, this is called a registration problem, meaning that the printed content isn't positioned correctly on the page.
### ImageMagick
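[ImageMagick][4] is a non-interactive, terminal-based graphics editor. Editing a graphic in a graphic-less environment such as a text-only terminal may seem counterintuitive, but it's actually very common. For instance, when you upload an image to a web application to use as a profile picture, a script on the application's server probably processes your image using ImageMagick or its libraries. The advantage of a non-interactive editor is that you can work out what needs to be done to a sample image, then apply those effects to hundreds of other images at the press of a button.
ImageMagick is generally just as capable as any other graphics editor, as long as you take the time to learn its many functions and how to combine them to achieve the desired effects. In this case, I want to rotate pages that are askew. After searching ImageMagick's documentation, I found that the ImageMagick term for the solution I needed is deskew. Aligning your terminology with somebody else's is a challenge in anything you don't already know, so when you use ImageMagick (or anything else), keep in mind that the word you use to describe a problem or solution may not be the word someone else uses.
To deskew an image with crooked text using ImageMagick: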
```
$ convert page_0052.webp -deskew 25% fix_0052.webp
```
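The value given to the `-deskew` option is the threshold of acceptable skew. Skew is determined by tracing the peaks and valleys of objects that appear to be letters. Depending on how crooked your scan is, you may need a threshold higher or lower than 25%. I've gone as high as 80%, and so far nothing under 25% has had an effect.
Here's the result: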
![The same page of text, now with the text properly aligned][5]
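Fixed! Applying this to the remaining 55 pages of the document fixed the skewed pages while doing nothing to pages that were already straight. In other words, thanks to my threshold setting, it was safe to run this command on pages that needed no adjustment.
### Cropping an image with ImageMagick
After correcting the skew, and because I had scanned more of each page than necessary anyway to avoid accidentally cutting off words, I decided it made sense to crop my corrected pages. I was happy to keep some space around the margins, but not as much as before. I use ImageMagick's `crop` function often for images on this very website, so I was familiar with the option. However, I still needed to determine how to crop each page.
First, I needed the size of the image: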
```
$ identify fix_0052.webp
WEBP 1128x2593 1128x2593+0+0 8-bit sRGB 114732B 0.020u 0:00.021
```
Knowing the size, I was able to estimate how many pixels I could afford to lose. After a few trial runs, I came up with this:
```
$ convert fix_0052.webp -gravity Center -crop 950x2450+0+0 crop_0052.webp
```
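This isn't an exact fit, but the extra margin proved important when I applied the same crop to other images in the booklet. The pages varied in content and scanner placement, so I was happy to give each one a little breathing room.
Here's the corrected and cropped image: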
![The same page of text, with the previous fixes applied and crooked white margins around the page cropped out.][6]
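### Batch image editing with open source
The beauty of ImageMagick is that once you've worked out the formula for fixing your image, you can apply that fix to every image that needs it. I do this with [GNU Parallel][7], which uses all my CPU cores to finish image correction across hundreds of pages. It doesn't take long, and the results speak for themselves. More importantly, I now have a digital archive of a fun artifact of UNIX history.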
--------------------------------------------------------------------------------
via: https://opensource.com/article/22/11/fixing-scanned-images-imagemagick
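Author: [Seth Kenlon][a]
Topic selection: [lkxed][b]
Translator: [geekpi](https://github.com/geekpi)
Proofreader: [Proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)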
[a]: https://opensource.com/users/seth
[b]: https://github.com/lkxed
[1]: https://opensource.com/article/19/3/comic-book-archive-djvu
[2]: https://opensource.com/article/22/2/scan-documents-skanlite-linux-kde
[3]: https://opensource.com/sites/default/files/2022-10/imagemagick-crook_1.png
[4]: https://opensource.com/article/17/8/imagemagick
[5]: https://opensource.com/sites/default/files/2022-10/imagemagick-deskew-fix.png
[6]: https://opensource.com/sites/default/files/2022-10/imagemagick-deskew-crop.png
[7]: http://LINK-TO-SETH-GNU-PARALLEL-REDHAT.COM/SYSADMIN