translated

2025-03-27 02:30:10 +08:00 · 2021-03-15 09:08:06 +08:00 · 2021-03-15 09:08:06 +08:00 · 388af9c19f
commit 388af9c19f
parent 44df792c04
2 changed files with 105 additions and 105 deletions
--- a/sources/tech/20210309
+++ b/sources/tech/20210309
@ -1,105 +0,0 @@
-[#]: subject: (Use gImageReader to Extract Text From Images and PDFs on Linux)
-[#]: via: (https://itsfoss.com/gimagereader-ocr/)
-[#]: author: (Ankush Das https://itsfoss.com/author/ankush/)
-[#]: collector: (lujun9972)
-[#]: translator: (geekpi)
-[#]: reviewer: ( )
-[#]: publisher: ( )
-[#]: url: ( )
-
-Use gImageReader to Extract Text From Images and PDFs on Linux
-======
-
-_Brief: gImageReader is a GUI tool to utilize tesseract OCR engine for extracting texts from images and PDF files in Linux._
-
-[gImageReader][1] is a front-end for [Tesseract Open Source OCR Engine][2]. _Tesseract_ was originally developed at HP and then was open-sourced in 2006.
-
-Basically, the OCR (Optical Character Recognition) engine lets you scan texts from a picture or a file (PDF). It can detect several languages by default and also supports scanning through Unicode characters.
-
-However, the Tesseract by itself is a command-line tool without any GUI. So, here, gImageReader comes to the rescue to let any user utilize it to extract text from images and files.
-
-Let me highlight a few things about it while mentioning my experience with it for the time I tested it out.
-
-### gImageReader: A Cross-Platform Front-End to Tesseract OCR
-
-![][3]
-
-To simplify things, gImageReader comes in handy to extract text from a PDF file or an image that contains any kind of text.
-
-Whether you need it for spellcheck or translation, it should be useful for a specific group of users.
-
-To sum up the features in a list, here’s what you can do with it:
-
-  * Add PDF documents and images from disk, scanning devices, clipboard and screenshots
-  * Ability to rotate images
-  * Common image controls to adjust brightness, contrast, and resolution
-  * Scan images directly through the app
-  * Ability to process multiple images or files in one go
-  * Manual or automatic recognition area definition
-  * Recognize to plain text or to [hOCR][4] documents
-  * Editor to display the recognized text
-  * Can spellcheck the text extracted
-  * Convert/Export to PDF documents from hOCR document
-  * Export extracted text as a .txt file
-  * Cross-platform (Windows)
-
-
-
-### Installing gImageReader on Linux
-
-**Note**: _You need to explicitly install Tesseract language packs to detect from images/files from your software manager._
-
-![][5]
-
-You can find gImageReader in the default repositories for some Linux distributions like Fedora and Debian.
-
-For Ubuntu, you need to add a PPA and then install it. To do that, here’s what you need to type in the terminal:
-
-```
-sudo add-apt-repository ppa:sandromani/gimagereader
-sudo apt update
-sudo apt install gimagereader
-```
-
-You can also find it for openSUSE from its build service and [AUR][6] will be the place for Arch Linux users.
-
-All the links to the repositories and the packages can be found in their [GitHub page][1].
-
-[gImageReader][1]
-
-### Experience with gImageReader
-
-gImageReader is a quite useful tool for extracting texts from images when you need them. It works great when you try from a PDF file.
-
-For extracting images from a picture shot on a smartphone, the detection was close but a bit inaccurate. Maybe when you scan something, recognition of characters from the file could be better.
-
-So, you’ll have to try it for yourself to see how well it works for your use-case. I tried it on Linux Mint 20.1 (based on Ubuntu 20.04).
-
-I just had an issue to manage languages from the settings and I didn’t get a quick solution for that. If you encounter the issue, you might want to troubleshoot it and explore more about it how to fix it.
-
-![][7]
-
-Other than that, it worked just fine.
-
-Do give it a try and let me know how it worked for you! If you know of something similar (and better), do let me know about it in the comments below.
-
--------------------------------------------------------------------------------
-
-via: https://itsfoss.com/gimagereader-ocr/
-
-作者：[Ankush Das][a]
-选题：[lujun9972][b]
-译者：[译者ID](https://github.com/译者ID)
-校对：[校对者ID](https://github.com/校对者ID)
-
-本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译，[Linux中国](https://linux.cn/) 荣誉推出
-
-[a]: https://itsfoss.com/author/ankush/
-[b]: https://github.com/lujun9972
-[1]: https://github.com/manisandro/gImageReader
-[2]: https://tesseract-ocr.github.io/
-[3]: https://i0.wp.com/itsfoss.com/wp-content/uploads/2021/03/gImageReader.png?resize=800%2C456&ssl=1
-[4]: https://en.wikipedia.org/wiki/HOCR
-[5]: https://i0.wp.com/itsfoss.com/wp-content/uploads/2021/03/tesseract-language-pack.jpg?resize=800%2C620&ssl=1
-[6]: https://itsfoss.com/aur-arch-linux/
-[7]: https://i0.wp.com/itsfoss.com/wp-content/uploads/2021/03/gImageReader-1.jpg?resize=800%2C460&ssl=1
--- a/translated/tech/20210309
+++ b/translated/tech/20210309
@ -0,0 +1,105 @@
+[#]: subject: (Use gImageReader to Extract Text From Images and PDFs on Linux)
+[#]: via: (https://itsfoss.com/gimagereader-ocr/)
+[#]: author: (Ankush Das https://itsfoss.com/author/ankush/)
+[#]: collector: (lujun9972)
+[#]: translator: (geekpi)
+[#]: reviewer: ( )
+[#]: publisher: ( )
+[#]: url: ( )
+
+在 Linux 上使用 gImageReader 从图像和 PDF 中提取文本
+======
+
+_简介：gImageReader 是一个 GUI 工具，用于在 Linux 中利用 Tesseract OCR 引擎从图像和 PDF 文件中提取文本。_
+
+[gImageReader][1] 是 [Tesseract 开源 OCR 引擎][2]的一个前端。_Tesseract_ 最初是由 HP 公司开发的，然后在 2006 年开源。
+
+基本上，OCR（光学字符识别）引擎可以让你从图片或文件（PDF）中扫描文本。默认情况下，它可以检测几种语言，还支持通过 Unicode 字符扫描。
+
+然而，Tesseract 本身是一个没有任何 GUI 的命令行工具。因此，gImageReader 就来解决这点，它可以让任何用户使用它从图像和文件中提取文本。
+
+让我重点介绍一些有关它的内容，同时说下我在测试期间的使用经验。
+
+### gImageReader：一个跨平台的 Tesseract OCR 前端
+
+![][3]
+
+为了简化事情，gImageReader 在从 PDF 文件或包含任何类型文本的图像中提取文本时非常方便。
+
+无论你是需要它来进行拼写检查还是翻译，它都应该对特定的用户群体有用。
+
+在列表总结下功能，这里是你可以用它做的事情：
+
+  * 从磁盘、扫描设备、剪贴板和截图中添加 PDF 文档和图像
+  * 能够旋转图像
+  * 常用的图像控制，用于调整亮度、对比度和分辨率。
+  * 直接通过应用扫描图像
+  * 能够一次性处理多个图像或文件
+  * 手动或自动识别区域定义
+  * 识别纯文本或 [hOCR][4] 文档
+  * 编辑器显示识别的文本
+  * 可对对提取的文本进行拼写检查
+  * 从 hOCR 文件转换/导出为 PDF 文件
+  * 将提取的文本导出为 .txt 文件
+  * 跨平台（Windows）
+
+
+
+### 在 Linux 上安装 gImageReader
+
+**注意**：_你需要安装 Tesseract 语言包，才能从软件管理器中的图像/文件中进行检测。_
+
+![][5]
+
+你可以在一些 Linux 发行版如 Fedora 和 Debian 的默认仓库中找到 gImageReader。
+
+对于 Ubuntu，你需要添加一个 PPA，然后安装它。要做到这点，下面是你需要在终端中输入的内容：
+
+```
+sudo add-apt-repository ppa:sandromani/gimagereader
+sudo apt update
+sudo apt install gimagereader
+```
+
+你也可以从 openSUSE 的构建服务中找到它，Arch Linux 用户可在 [AUR][6] 中找到。
+
+所有的仓库和包的链接都可以在他们的 [GitHub 页面][1]中找到。
+
+[gImageReader][1]
+
+### gImageReader 使用经验
+
+当你需要从图像中提取文本时，gImageReader 是一个相当有用的工具。当你尝试从 PDF 文件中提取文本时，它的效果非常好。
+
+对于从智能手机拍摄的图片中提取，检测很接近，但有点不准确。也许当你进行扫描时，从文件中识别字符可能会更好。
+
+所以，你需要亲自尝试一下，看看它是否对你而言工作良好。我在 Linux Mint 20.1（基于 Ubuntu 20.04）上试过。
+
+我只遇到了一个从设置中管理语言的问题，我没有得到一个快速的解决方案。如果你遇到此问题，那么可能需要对其进行故障排除，并进一步了解如何解决该问题。
+
+![][7]
+
+除此之外，它工作良好。
+
+试试吧，让我知道它是如何为你服务的！如果你知道类似的东西（和更好的），请在下面的评论中告诉我。
+
+--------------------------------------------------------------------------------
+
+via: https://itsfoss.com/gimagereader-ocr/
+
+作者：[Ankush Das][a]
+选题：[lujun9972][b]
+译者：[geekpi](https://github.com/geekpi)
+校对：[校对者ID](https://github.com/校对者ID)
+
+本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译，[Linux中国](https://linux.cn/) 荣誉推出
+
+[a]: https://itsfoss.com/author/ankush/
+[b]: https://github.com/lujun9972
+[1]: https://github.com/manisandro/gImageReader
+[2]: https://tesseract-ocr.github.io/
+[3]: https://i0.wp.com/itsfoss.com/wp-content/uploads/2021/03/gImageReader.png?resize=800%2C456&ssl=1
+[4]: https://en.wikipedia.org/wiki/HOCR
+[5]: https://i0.wp.com/itsfoss.com/wp-content/uploads/2021/03/tesseract-language-pack.jpg?resize=800%2C620&ssl=1
+[6]: https://itsfoss.com/aur-arch-linux/
+[7]: https://i0.wp.com/itsfoss.com/wp-content/uploads/2021/03/gImageReader-1.jpg?resize=800%2C460&ssl=1