From b7ce018c3a2615ce9ae9f819fdd5f2a9e8379dcf Mon Sep 17 00:00:00 2001
From: lkxed <lkxed@outlook.com>
Date: Tue, 24 May 2022 23:50:10 +0800
Subject: [PATCH] =?UTF-8?q?[=E6=89=8B=E5=8A=A8=E9=80=89=E9=A2=98][tech]:?=
 =?UTF-8?q?=2020220524=20pdfgrep-=20Use=20Grep=20Like=20Search=20on=20PDF?=
 =?UTF-8?q?=20Files=20in=20Linux=20Command=20Line.md?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 ...arch on PDF Files in Linux Command Line.md | 234 ++++++++++++++++++
 1 file changed, 234 insertions(+)
 create mode 100644 sources/tech/20220524 pdfgrep- Use Grep Like Search on PDF Files in Linux Command Line.md

diff --git a/sources/tech/20220524 pdfgrep- Use Grep Like Search on PDF Files in Linux Command Line.md b/sources/tech/20220524 pdfgrep- Use Grep Like Search on PDF Files in Linux Command Line.md
new file mode 100644
index 0000000000..1dcbd177c6
--- /dev/null
+++ b/sources/tech/20220524 pdfgrep- Use Grep Like Search on PDF Files in Linux Command Line.md	
@@ -0,0 +1,234 @@
+[#]: subject: "pdfgrep: Use Grep Like Search on PDF Files in Linux Command Line"
+[#]: via: "https://itsfoss.com/pdfgrep/"
+[#]: author: "Pratham Patel https://itsfoss.com/author/pratham/"
+[#]: collector: "lkxed"
+[#]: translator: " "
+[#]: reviewer: " "
+[#]: publisher: " "
+[#]: url: " "
+
+pdfgrep: Use Grep Like Search on PDF Files in Linux Command Line
+======
+
+Even if you use the Linux command line moderately, you must have come across the [grep command][1].
+
+Grep is used to search for a pattern in a text file. It can do crazy powerful things, like find empty lines, lines that contain no uppercase characters, lines that start with a digit, and much, much more. Check out some [common grep command examples][2] if you are interested.
+
+But grep is built for plain text files. It is of little use on PDF files, because the text inside a PDF is usually stored in compressed, binary streams.
+
+This is where pdfgrep comes into the picture. It works like grep, but for PDF files. Let's have a look at it.
+
+### Meet pdfgrep: grep-like regex search for PDF files
+
+[pdfgrep][3] tries to be compatible with GNU grep where it makes sense. Several of your favorite grep options are supported (such as `-r`, `-i`, `-n` or `-c`), and you can use it to search for text inside the contents of PDF files.
+
+Though it doesn’t come pre-installed like grep, it is available in the repositories of most Linux distributions.
+
+You can use your distribution’s [package manager][4] to install this awesome tool.
+
+For users of Ubuntu and Debian-based distributions, use the apt command:
+
+```
+sudo apt install pdfgrep
+```
+
+For Red Hat and Fedora, you can use the dnf command:
+
+```
+sudo dnf install pdfgrep
+```
+
+Btw, do you run Arch? You can [use the pacman command][5]:
+
+```
+sudo pacman -S pdfgrep
+```
+
+### Using the pdfgrep command
+
+Now that pdfgrep is installed, let me show you how to use it in the most common scenarios.
+
+If you have any experience with grep, then most of the options will feel familiar to you.
+
+To demonstrate, I will be using [The Linux Command Line][6] PDF book, written by William Shotts. It’s one of the [few Linux books that are legally available for free][7].
+
+The syntax for pdfgrep is as follows:
+
+```
+pdfgrep [OPTION...] PATTERN [FILE...]
+```
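+
+Because pdfgrep follows grep's conventions, whole directories of PDFs can be searched at once. As a sketch, here is a small shell function that searches every PDF below a directory case-insensitively and prints page numbers, using the `-r`, `-i` and `-n` options mentioned earlier. The function name `pdfsearch` is hypothetical, not something pdfgrep ships with:
+
+```
+# Hypothetical helper, not part of pdfgrep itself.
+# -r searches directories recursively, -i ignores case,
+# -n prints the page number of each match; --include limits
+# the search to file names ending in .pdf.
+pdfsearch() {
+    pdfgrep -r -i -n --include '*.pdf' "$1" "${2:-.}"
+}
+
+# Example usage (searches the current directory if none is given):
+# pdfsearch xdg ~/Documents
+```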
+
+#### Normal search
+
+Let’s try doing a basic search for the text ‘xdg’ in the PDF file.
+
+```
+pdfgrep xdg TLCL-19.01.pdf
+```
+
+![simple search using pdfgrep][8]
+
+This resulted in only one match… But a match nonetheless!
+
+#### Case insensitive search
+
+Most of the time, the term ‘xdg’ is written in capital letters (XDG). So, let’s try a case-insensitive search. For that, I will use the `--ignore-case` option.
+
+You can also use the shorter alternative, which is `-i`.
+
+```
+pdfgrep --ignore-case xdg TLCL-19.01.pdf
+```
+
+![case insensitive search using pdfgrep][9]
+
+As you can see, I got more matches after turning on case insensitive searching.
+
+#### Get a count of all matches
+
+Sometimes, you only want to know how many matches were found. Let’s see how many times the word ‘Linux’ is mentioned (with case-insensitive matching).
+
+The option to use in this scenario is `--count` (or `-c` for short).
+
+```
+pdfgrep --ignore-case linux TLCL-19.01.pdf --count
+```
+
+![getting a count of matches using pdfgrep][10]
+
+Whoa! Linux was mentioned 1200 times in this book… That was unexpected.
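+
+If you pass several PDF files at once, pdfgrep should print one ‘file:count’ line per file, the same way `grep --count` does. To get a single total, you can add those numbers up with awk. A minimal sketch, where the function name `sum_counts` is hypothetical:
+
+```
+# Hypothetical helper: read "file:count" (or plain "count") lines on
+# stdin and print their sum. $NF is the last colon-separated field,
+# so colons inside file names do not affect the result.
+sum_counts() {
+    awk -F: '{ total += $NF } END { print total + 0 }'
+}
+
+# Example usage:
+# pdfgrep --count --ignore-case linux *.pdf | sum_counts
+```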
+
+#### Show page number
+
+Plain text files are one continuous stream of text with no pages, but a PDF file has pages. So pdfgrep can show you not only what matched, but also on which page. Use the `--page-number` option to print the page number of each match. You can also use the `-n` option as a shorter alternative.
+
+Let us see how it works with an example. I want to see the pages where the word ‘awk’ matches. I added a space at the end of the pattern to avoid matching words like ‘awkward’; getting unintentional matches would be *awkward*. (It will still match words that end in ‘awk’, such as ‘gawk’.) Instead of escaping the space with a backslash, you can also enclose the pattern in single quotes: `'awk '`.
+
+```
+pdfgrep --page-number --ignore-case awk\  TLCL-19.01.pdf
+```
+
+![show which pattern was found on which page using pdfgrep][11]
+
+The word ‘awk’ was found twice on page number 333, once on page 515 and once again on page 543 in the PDF file.
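+
+If the backslash in that command looks odd: to the shell, `awk\ ` and `'awk '` are the same argument, four characters long with the trailing space included. You can check this with printf, which prints each argument it receives between brackets:
+
+```
+# Each argument is wrapped in brackets, so the trailing space is visible.
+# Prints:
+# [awk ]
+# [awk ]
+# [awk]
+printf '[%s]\n' awk\  'awk ' awk
+```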
+
+#### Show match count per page
+
+Do you want to know how many matches were found on each page, instead of seeing the matches themselves? If you said yes, well, it is your lucky day!
+
+The `--page-count` option does exactly that. As a shorter alternative, you can use the `-p` option. This option implies `-n`, so you do not need to pass `--page-number` separately.
+
+Let’s take a look at how the output looks. For this example, I will see where the [ln command][12] is used in the book.
+
+```
+pdfgrep --page-count ln\  TLCL-19.01.pdf
+```
+
+![show which page has how many matches using pdfgrep][13]
+
+The output is in the form ‘page number: matches’. This means that on page 4, the command (or rather, the “pattern”) was found only once, while on page 57, pdfgrep found 4 matches.
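+
+Each line of that output is a page number, a colon and a match count, so standard tools can post-process it. For a single file, this sketch sorts the pages by number of matches, highest first, and keeps the top few. The function name `busiest_pages` is hypothetical:
+
+```
+# Hypothetical helper: sort "page:count" lines by the count field
+# (numerically, descending) and print the first N lines (default 5).
+busiest_pages() {
+    sort -t: -k2,2nr | head -n "${1:-5}"
+}
+
+# Example usage:
+# pdfgrep --page-count 'ln ' TLCL-19.01.pdf | busiest_pages 3
+```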
+
+#### Get some context
+
+When there are many matches, it helps to see a little context around each one. For that, pdfgrep provides a few options:
+
+* `--after-context NUM`: print NUM lines after each matching line (or use `-A`)
+* `--before-context NUM`: print NUM lines before each matching line (or use `-B`)
+* `--context NUM`: print NUM lines both before and after each matching line (or use `-C`)
+
+Let’s find ‘XDG’ in the PDF file, but this time, with a little more context ( ͡❛ ͜ʖ ͡❛)
+
+**Context after matches**
+
+Using the `--after-context` option along with a number, I can see which lines come after the line(s) that match. Below is an example of how it looks.
+
+```
+pdfgrep --after-context 2 XDG TLCL-19.01.pdf
+```
+
+![using '--after-context' option in pdfgrep][14]
+
+**Context before matches**
+
+The same can be done when you need to know which lines come before the matching line. In that case, use the `--before-context` option, along with a number. Below is an example demonstrating this option.
+
+```
+pdfgrep --before-context 2 XDG TLCL-19.01.pdf
+```
+
+![using '--before-context' option in pdfgrep][15]
+
+**Context around matches**
+
+If you want to see the lines both before and after the matching line, use the `--context` option and provide a number. Below is an example.
+
+```
+pdfgrep --context 2 XDG TLCL-19.01.pdf
+```
+
+![using '--context' option in pdfgrep][16]
+
+#### Caching
+
+A PDF file can contain images and other media as well as text. With a large PDF file, it can take a while to skip the other media, extract the text and then “grep” it. Doing that often, and waiting every time, can get frustrating.
+
+That is why the `--cache` option exists. It caches the rendered text to speed up later searches of the same file. The difference is especially noticeable on large files.
+
+```
+pdfgrep --cache --ignore-case grep TLCL-19.01.pdf
+```
+
+![getting faster results using the '--cache' option][17]
+
+This is not a rigorous benchmark, but I ran the same search four times: twice with the cache enabled and twice without it. To show the speed difference, I used the time command. Look closely at the ‘real’ value.
+
+As you can see, the commands that included the `--cache` option completed faster than the ones that didn’t.
+
+Additionally, I suppressed the output using the `--quiet` option for faster completion.
+
+#### Password protected PDF files
+
+Yes, pdfgrep can even search password-protected files. All you have to do is use the `--password` option, followed by the password.
+
+I do not have a password-protected file to demonstrate with, but you can use this option in the following manner:
+
+```
+pdfgrep --password [PASSWORD] [PATTERN] [FILE.pdf]
+```
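+
+Typing the password directly on the command line leaves it in your shell history. Below is a sketch of a wrapper that prompts for it instead; the function name `pdfgrep_locked` is hypothetical. Keep in mind that it only keeps the password out of your history: pdfgrep still receives it as a command-line argument, so it may be visible to other users in the process list while the search runs.
+
+```
+# Hypothetical wrapper: ask for the password without echoing it,
+# then run pdfgrep with it plus whatever arguments were given.
+pdfgrep_locked() {
+    printf 'PDF password: ' >&2
+    stty -echo 2>/dev/null || :   # hide typing; harmless if stdin is not a terminal
+    IFS= read -r pdf_pass
+    stty echo 2>/dev/null || :
+    printf '\n' >&2
+    pdfgrep --password "$pdf_pass" "$@"
+    rc=$?
+    unset pdf_pass
+    return "$rc"
+}
+
+# Example usage:
+# pdfgrep_locked --ignore-case linux protected.pdf
+```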
+
+### Conclusion
+
+pdfgrep is a very handy tool if you deal with PDF files and want the functionality of ‘grep’ for them. One reason I like pdfgrep is that it tries to be compatible with GNU grep.
+
+Give it a try and let me know what you think of pdfgrep.
+
+--------------------------------------------------------------------------------
+
+via: https://itsfoss.com/pdfgrep/
+
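+Author: [Pratham Patel][a]
+Topic selection: [lkxed][b]
+Translator: [Translator ID](https://github.com/译者ID)
+Proofreader: [Proofreader ID](https://github.com/校对者ID)
+
+This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and proudly presented by [Linux中国](https://linux.cn/)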
+
+[a]: https://itsfoss.com/author/pratham/
+[b]: https://github.com/lkxed
+[1]: https://linuxhandbook.com/what-is-grep/
+[2]: https://linuxhandbook.com/grep-command-examples/
+[3]: https://pdfgrep.org/
+[4]: https://itsfoss.com/package-manager/
+[5]: https://itsfoss.com/pacman-command/
+[6]: https://www.linuxcommand.org/tlcl.php
+[7]: https://itsfoss.com/learn-linux-for-free/
+[8]: https://itsfoss.com/wp-content/uploads/2022/05/01_pdfgrep_normal_search-1-800x308.webp
+[9]: https://itsfoss.com/wp-content/uploads/2022/05/02_pdfgrep_case_insensitive-800x413.webp
+[10]: https://itsfoss.com/wp-content/uploads/2022/05/03_pdfgrep_count-800x353.webp
+[11]: https://itsfoss.com/wp-content/uploads/2022/05/04_pdfgrep_page_number-800x346.webp
+[12]: https://linuxhandbook.com/ln-command/
+[13]: https://itsfoss.com/wp-content/uploads/2022/05/05_pdfgrep_pg_count-800x280.webp
+[14]: https://itsfoss.com/wp-content/uploads/2022/05/06_pdfgrep_after_context-800x340.webp
+[15]: https://itsfoss.com/wp-content/uploads/2022/05/07_pdfgrep_before_context-800x356.webp
+[16]: https://itsfoss.com/wp-content/uploads/2022/05/08_pdfgrep_context-800x453.webp
+[17]: https://itsfoss.com/wp-content/uploads/2022/05/09_pdfgrep_cache-800x575.webp