mirror of https://github.com/LCTT/TranslateProject.git, synced 2025-01-10 22:21:11 +08:00

Translated tech/20220105 Create bookmarks for your PDF with pdftk.md

commit 5ed28752d7 (parent 1368d9bef2)
@ -1,175 +0,0 @@
[#]: subject: "Create bookmarks for your PDF with pdftk"
[#]: via: "https://opensource.com/article/22/1/pdf-metadata-pdftk"
[#]: author: "Seth Kenlon https://opensource.com/users/seth"
[#]: collector: "lujun9972"
[#]: translator: "toknow-gh"
[#]: reviewer: " "
[#]: publisher: " "
[#]: url: " "

Create bookmarks for your PDF with pdftk
======
Providing bookmarks to your users is helpful and takes advantage of the technology available.
![Business woman on laptop sitting in front of window][1]

In [introducing pdftk-java][2], I explained how I use the `pdftk-java` command to make quick, often scripted, modifications to PDF files.

However, `pdftk-java` is at its most useful when I've downloaded a big PDF file, sometimes with hundreds of pages of reference text, and discovered that the PDF creator didn't include a table of contents. I don't mean a printed table of contents in the front matter of the book; I mean the table of contents you get down the side of your PDF reader, which the PDF format officially calls "bookmarks."

![Screenshot of a sidebar table of contents next to a PDF][3]

(Seth Kenlon, [CC BY-SA 4.0][4])

Without bookmarks, finding the chapter you need to reference is cumbersome and involves either lots of scrolling or frustrating searches for words you think you remember seeing in the general area.

Another minor annoyance of many PDF files is the lack of metadata, such as a proper title and author in the PDF properties. If you've ever opened up a PDF and seen something generic like "Microsoft Word - 04_Classics_Revisited.docx" in the window title bar, you know this issue.

I don't have to deal with either problem anymore because I have `pdftk-java`, which lets me create my own bookmarks and correct the metadata.

### Install pdftk-java on Linux

As its name suggests, pdftk-java is written in Java, so it works on all major operating systems as long as you have Java installed.

Linux and macOS users can install Java from [AdoptOpenJDK.net][5].

Windows users can install [Red Hat's Windows build of OpenJDK][6].

To install pdftk-java on Linux:

  1. Download the [pdftk-all.jar release][7] from its GitLab repository and save it to `~/.local/bin/` or [some other location in your path][8].
  2. Open `~/.bashrc` in your favorite text editor and add this line to it (adjust the path if you saved the file elsewhere): `alias pdftk='java -jar $HOME/.local/bin/pdftk-all.jar'`
  3. Load your new Bash settings: `source ~/.bashrc`
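
Bash does not expand aliases in non-interactive shells by default, so an alias defined in `~/.bashrc` is not available inside scripts. One possible workaround, sketched below, is a small wrapper script in place of the alias. The `BIN_DIR` variable and the wrapper itself are illustrative choices, not part of pdftk-java, and the sketch assumes `pdftk-all.jar` sits in the same directory as the wrapper.

```shell
#!/bin/sh
# Sketch: create a `pdftk` wrapper script so the command also works from scripts.
# BIN_DIR is an assumption; point it at the directory that holds pdftk-all.jar.
BIN_DIR="${BIN_DIR:-$HOME/.local/bin}"
mkdir -p "$BIN_DIR"

# The quoted heredoc keeps $0 and $@ literal, so they are evaluated
# when the wrapper runs rather than when it is written.
cat > "$BIN_DIR/pdftk" <<'EOF'
#!/bin/sh
# Run the jar that sits next to this wrapper, passing all arguments through.
exec java -jar "$(dirname "$0")/pdftk-all.jar" "$@"
EOF

chmod +x "$BIN_DIR/pdftk"
```

With the wrapper directory on your path, `pdftk` resolves to the script in interactive shells and in scripts alike.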

### Data dump

The first step in correcting the metadata of a PDF is to extract the metadata that the PDF currently contains into a text file.

There's probably not much in that file (that's the problem!), but it gives you a good starting place.

```
$ pdftk mybigfile.pdf \
  data_dump \
  output bookmarks.txt
```

This produces a file called `bookmarks.txt`, and it contains all the metadata assigned to the input file (in this example, `mybigfile.pdf`), plus a lot of bloat.

### Editing metadata

To edit the metadata of the PDF, open your `bookmarks.txt` file in your favorite text editor, such as [Atom][9] or [Gedit][10].

The format is mostly intuitive, and the data contained within it is predictably neglected:

```
InfoBegin
InfoKey: Creator
InfoValue: Word
InfoBegin
InfoKey: ModDate
InfoValue: D:20151221203353Z00'00'
InfoBegin
InfoKey: CreationDate
InfoValue: D:20151221203353Z00'00'
InfoBegin
InfoKey: Producer
InfoValue: Mac OS X 10.10.4 Quartz PDFContext
InfoBegin
InfoKey: Title
InfoValue: Microsoft Word - 04_UA_Classics_Revisited.docx
PdfID0: f049e63eaf3b4061ddad16b455ca780f
PdfID1: f049e63eaf3b4061ddad16b455ca780f
NumberOfPages: 42
PageMediaBegin
PageMediaNumber: 1
PageMediaRotation: 0
PageMediaRect: 0 0 612 792
PageMediaDimensions: 612 792
[...]
```

You can edit the `InfoValue` fields to contain data that makes sense for the PDF you're repairing. For instance, instead of setting the `Creator` key to the value `Word`, you could set it to the actual author's name or the publishing house releasing the PDF file. Rather than giving the document the default export string of the application that produced it, give it the book's actual title.

There's also some cleanup work you can do. Everything below the `NumberOfPages` line (in this example, the `PageMedia` entries) is unnecessary, so remove those lines.
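
The same cleanup can be scripted. The following is a minimal sketch, assuming the dump has the shape shown above, with each `InfoKey` line immediately followed by its `InfoValue` line. The output file name and the replacement title are placeholders, and the script writes a shortened sample dump first so it can be tried without a PDF.

```shell
#!/bin/sh
# Stand-in for the file produced by the data dump (shortened sample).
cat > bookmarks.txt <<'EOF'
InfoBegin
InfoKey: Creator
InfoValue: Word
InfoBegin
InfoKey: Title
InfoValue: Microsoft Word - 04_UA_Classics_Revisited.docx
NumberOfPages: 42
PageMediaBegin
PageMediaNumber: 1
PageMediaRotation: 0
EOF

# "Classics Revisited" is a placeholder title; substitute the real one.
# - The line after "InfoKey: Title" is replaced with the new title.
# - Output stops right after the NumberOfPages line, dropping the rest.
awk -v title="Classics Revisited" '
    set_title          { print "InfoValue: " title; set_title = 0; next }
    /^InfoKey: Title$/ { set_title = 1 }
                       { print }
    /^NumberOfPages:/  { exit }
' bookmarks.txt > bookmarks.clean.txt

cat bookmarks.clean.txt
```

Writing to a second file keeps the original dump intact, so a bad edit can be undone by starting over from `bookmarks.txt`.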

### Adding bookmarks

PDF bookmarks follow this format:

```
BookmarkBegin
BookmarkTitle: My first bookmark
BookmarkLevel: 1
BookmarkPageNumber: 2
```

  * `BookmarkBegin` indicates that you're creating a new bookmark.
  * `BookmarkTitle` contains the text that's visible in the PDF viewer.
  * `BookmarkLevel` sets the nesting level of this bookmark. A bookmark at level 1 sits at the top of the outline. A bookmark at level 2 appears within a disclosure triangle of the preceding level 1 bookmark, and a bookmark at level 3 appears within a disclosure triangle of the preceding level 2 bookmark. This setting gives you the ability to bookmark, for example, a chapter title as well as section headings within that chapter.
  * `BookmarkPageNumber` determines what PDF page the user is taken to when they click the bookmark.

Add a bookmark entry to `bookmarks.txt` for each section of the book that you think is important, then save the file.
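
As an illustration of nesting, the sketch below appends a two-level outline to the metadata file. The titles and page numbers are invented for the example; only the four field names come from the format described above.

```shell
#!/bin/sh
# Sketch: append two chapters and one nested section to the metadata file.
# Titles and page numbers are made up for illustration.
cat >> bookmarks.txt <<'EOF'
BookmarkBegin
BookmarkTitle: Chapter 1
BookmarkLevel: 1
BookmarkPageNumber: 5
BookmarkBegin
BookmarkTitle: Section 1.1
BookmarkLevel: 2
BookmarkPageNumber: 6
BookmarkBegin
BookmarkTitle: Chapter 2
BookmarkLevel: 1
BookmarkPageNumber: 12
EOF
```

Here "Section 1.1" is at level 2 and follows "Chapter 1", so a viewer should show it nested under that chapter, while "Chapter 2" returns to the top level.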

### Updating bookmark info

Now that you have your metadata and bookmarks set, you can apply them to your PDF. More precisely, you apply them to a new PDF that contains the same content as the old PDF:

```
$ pdftk mybigfile.pdf \
  update_info bookmarks.txt \
  output mynewfile.pdf
```

This produces a file called `mynewfile.pdf`, containing all of your metadata and bookmarks.

### Professional publishing

The difference between a PDF with generic metadata and no bookmarks and a PDF with personalized metadata values and useful bookmarks probably isn't going to make or break a sale.

However, paying attention to the small details like metadata shows that you value quality assurance, and providing bookmarks to your users is helpful and takes advantage of the technology available.

Use `pdftk-java` to make this process easy, and your users will thank you.

--------------------------------------------------------------------------------

via: https://opensource.com/article/22/1/pdf-metadata-pdftk

Author: [Seth Kenlon][a]
Topic selection: [lujun9972][b]
Translator: [translator ID](https://github.com/译者ID)
Proofreader: [proofreader ID](https://github.com/校对者ID)

This article was originally translated by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)

[a]: https://opensource.com/users/seth
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/lenovo-thinkpad-laptop-concentration-focus-windows-office.png?itok=-8E2ihcF (Woman using laptop concentrating)
[2]: https://opensource.com/article/21/12/edit-pdf-linux-pdftk
[3]: https://opensource.com/sites/default/files/uploads/pdtfk_update.jpeg (table of contents)
[4]: https://creativecommons.org/licenses/by-sa/4.0/
[5]: https://adoptopenjdk.net/releases.html
[6]: https://developers.redhat.com/products/openjdk/download
[7]: https://gitlab.com/pdftk-java/pdftk/-/jobs/1527259628/artifacts/raw/build/libs/pdftk-all.jar
[8]: https://opensource.com/article/17/6/set-path-linux
[9]: https://opensource.com/article/20/12/atom
[10]: https://opensource.com/article/20/12/gedit
@ -0,0 +1,173 @@
[#]: subject: "Create bookmarks for your PDF with pdftk"
[#]: via: "https://opensource.com/article/22/1/pdf-metadata-pdftk"
[#]: author: "Seth Kenlon https://opensource.com/users/seth"
[#]: collector: "lujun9972"
[#]: translator: "toknow-gh"
[#]: reviewer: " "
[#]: publisher: " "
[#]: url: " "

Create bookmarks for PDF documents with pdftk
======
Make full use of the available technology and provide bookmarks to help your users.
![Business woman on laptop sitting in front of window][1]

In [Introducing pdftk-java][2], I showed how to use `pdftk-java` in scripts to modify PDF files quickly.
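
However, `pdftk-java` is most useful for handling large PDF files, often hundreds of pages long, that have no table of contents. The table of contents here does not mean the printed one at the front of the document, but the one shown in the sidebar of a PDF reader, which the PDF format officially calls "bookmarks."

![Screenshot of a sidebar table of contents next to a PDF][3]

(Seth Kenlon, [CC BY-SA 4.0][4])

Without bookmarks, the only way to locate the chapter you want is to scroll up and down or run a full-text search, which is very tedious.

Another annoying little problem with PDF files is missing metadata, such as the title and author. If you have ever opened a PDF file whose title bar shows something like "Microsoft Word - 04_Classics_Revisited.docx", you know the feeling.

`pdftk-java` lets me create my own bookmarks, so I no longer have to face these problems.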

### Install pdftk-java on Linux

As the name `pdftk-java` suggests, it is written in Java. It runs on all major operating systems, as long as you have Java installed.

Linux and macOS users can install Java from [AdoptOpenJDK.net][5] (LCTT translator's note: the original says Linux, which is presumably a slip).

Windows users can install [Red Hat's Windows build of OpenJDK][6].

To install pdftk-java on Linux:

  1. Download the [pdftk-all.jar release][7] from the GitLab repository and save it to `~/.local/bin/` or [some other path][8].
  2. Open `~/.bashrc` in a text editor and add `alias pdftk='java -jar $HOME/.local/bin/pdftk-all.jar'`
  3. Run `source ~/.bashrc` to apply the new Bash settings.

### Data dump

The first step in modifying the metadata is to extract the PDF's current data file.

The current data file may not contain much, but it is a good starting point.

```
$ pdftk mybigfile.pdf \
  data_dump \
  output bookmarks.txt
```

The generated `bookmarks.txt` file contains all the metadata of the input PDF file `mybigfile.pdf`, plus a pile of useless data.

### Editing metadata

Open `bookmarks.txt` in a text editor, such as [Atom][9] or [Gedit][10], to edit the PDF metadata.

The metadata format and its entries are intuitive and easy to understand:

```
InfoBegin
InfoKey: Creator
InfoValue: Word
InfoBegin
InfoKey: ModDate
InfoValue: D:20151221203353Z00'00'
InfoBegin
InfoKey: CreationDate
InfoValue: D:20151221203353Z00'00'
InfoBegin
InfoKey: Producer
InfoValue: Mac OS X 10.10.4 Quartz PDFContext
InfoBegin
InfoKey: Title
InfoValue: Microsoft Word - 04_UA_Classics_Revisited.docx
PdfID0: f049e63eaf3b4061ddad16b455ca780f
PdfID1: f049e63eaf3b4061ddad16b455ca780f
NumberOfPages: 42
PageMediaBegin
PageMediaNumber: 1
PageMediaRotation: 0
PageMediaRect: 0 0 612 792
PageMediaDimensions: 612 792
[...]
```

You can change the `InfoValue` values to something meaningful for the current PDF. For example, you can change the `Creator` field from `Word` to the name of the actual author or publisher. Using the book's actual title is better than the title automatically generated by the exporting program.

You can also do some cleanup. The lines after `NumberOfPages` are not required, so you can delete them.

### Adding bookmarks

PDF bookmarks have the following format:

```
BookmarkBegin
BookmarkTitle: My first bookmark
BookmarkLevel: 1
BookmarkPageNumber: 2
```

  * `BookmarkBegin` indicates that this is a bookmark.
  * `BookmarkTitle` is the text the bookmark displays in the PDF reader.
  * `BookmarkLevel` is the bookmark level. A bookmark at level 2 appears under the small triangle of the previous bookmark. A bookmark at level 3 appears under the small triangle of the previous level 2 bookmark. This lets you set bookmarks for chapters and for the sections within them.
  * `BookmarkPageNumber` is the page the reader jumps to when the bookmark is clicked.

Create bookmarks for the chapters and sections you need, then save the file.
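
For a long document, typing every record by hand gets tedious. One option is to keep the outline as a simple table and generate the records from it. The sketch below assumes a hypothetical tab-separated file, `toc.tsv`, with the columns level, page, and title; neither the file nor its layout is part of pdftk.

```shell
#!/bin/sh
# Hypothetical outline: level<TAB>page<TAB>title, one bookmark per line.
printf '1\t5\tChapter 1\n2\t6\tSection 1.1\n1\t12\tChapter 2\n' > toc.tsv

# Turn each row into one bookmark record in the format shown above.
awk -F '\t' '{
    print "BookmarkBegin"
    print "BookmarkTitle: " $3
    print "BookmarkLevel: " $1
    print "BookmarkPageNumber: " $2
}' toc.tsv > toc-bookmarks.txt

cat toc-bookmarks.txt
```

The generated records can then be appended to the metadata file, for example with `cat toc-bookmarks.txt >> bookmarks.txt`.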

### Updating bookmark info

Now that the metadata and bookmarks are ready, you can import them into the PDF file. In practice, they are imported into a new PDF file whose content is the same as the original PDF file:

```
$ pdftk mybigfile.pdf \
  update_info bookmarks.txt \
  output mynewfile.pdf
```

The generated `mynewfile.pdf` contains all the metadata and bookmarks you set.
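
Before running the command above, it can be worth checking that no bookmark points past the end of the document. The following is a rough sketch of such a check, assuming the `NumberOfPages` line appears before the bookmark records. The file names `check.txt` and `check-report.txt` are placeholders, and the sample data deliberately contains one bad entry.

```shell
#!/bin/sh
# Sample metadata file; the last bookmark deliberately points past the end.
cat > check.txt <<'EOF'
NumberOfPages: 42
BookmarkBegin
BookmarkTitle: Chapter 1
BookmarkLevel: 1
BookmarkPageNumber: 5
BookmarkBegin
BookmarkTitle: Appendix
BookmarkLevel: 1
BookmarkPageNumber: 50
EOF

# Report every bookmark whose page number exceeds NumberOfPages.
# Splitting on ": " means a title that itself contains ": " is shown truncated.
awk -F ': ' '
    $1 == "NumberOfPages" { pages = $2 }
    $1 == "BookmarkTitle" { title = $2 }
    $1 == "BookmarkPageNumber" && $2 + 0 > pages + 0 {
        print "out of range: " title " (page " $2 ")"
        bad = 1
    }
    END { exit bad }
' check.txt > check-report.txt || echo "found bookmarks outside the document"

cat check-report.txt
```

The `awk` program exits with a non-zero status when it finds a problem, so the same check could gate the `update_info` step in a script.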

### Showing professionalism

Whether a PDF file contains customized metadata and bookmarks probably won't affect sales.

However, paying attention to metadata shows users that you value quality assurance. Adding bookmarks is a convenience for users and also makes full use of the available technology.

Use `pdftk-java` to simplify this process, and your users will be very grateful.

--------------------------------------------------------------------------------

via: https://opensource.com/article/22/1/pdf-metadata-pdftk

Author: [Seth Kenlon][a]
Topic selection: [lujun9972][b]
Translator: [toknow-gh](https://github.com/toknow-gh)
Proofreader: [proofreader ID](https://github.com/校对者ID)

This article was originally translated by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)

[a]: https://opensource.com/users/seth
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/lenovo-thinkpad-laptop-concentration-focus-windows-office.png?itok=-8E2ihcF (Woman using laptop concentrating)
[2]: https://opensource.com/article/21/12/edit-pdf-linux-pdftk
[3]: https://opensource.com/sites/default/files/uploads/pdtfk_update.jpeg (table of contents)
[4]: https://creativecommons.org/licenses/by-sa/4.0/
[5]: https://adoptopenjdk.net/releases.html
[6]: https://developers.redhat.com/products/openjdk/download
[7]: https://gitlab.com/pdftk-java/pdftk/-/jobs/1527259628/artifacts/raw/build/libs/pdftk-all.jar
[8]: https://opensource.com/article/17/6/set-path-linux
[9]: https://opensource.com/article/20/12/atom
[10]: https://opensource.com/article/20/12/gedit