translated

yzuowei 2023-01-21 13:03:43 +08:00 committed by GitHub
parent e112a1dc14
commit 27837a313a


@ -7,127 +7,129 @@
[#]: publisher: " "
[#]: url: " "
Battle of the Texts and the Unicode Savior
文字间的战斗与其救世主 Unicode
======
We all know how to type text on the keyboard. Don't we?
我们都知道如何从键盘输入文字,不是吗?
So, may I challenge you to type that text in your favorite text editor:
那么,允许我挑战你在你最爱的文本编辑器中输入这段文字:
![«Ayumi moved to Tokyo in 1993 to pursue her career» said Dmitrii][1]
This text is challenging to type since it contains:
这段文字难以被输入因为它包含着:
- typographical signs not directly available on the keyboard,
- hiragana Japanese characters,
- the name of the Japanese capital written with a macron on top of the two letters “o” to comply with the Hepburn romanization standard,
- and finally, the first name Dmitrii written using the Cyrillic alphabet.
- 键盘上没有的印刷符号,
- 平假名日文字符,
- 为符合平文式罗马字标准,日本首都的名字中的头顶长音符号两个字母 "o"
- 以及最后,用西里尔字母拼写的名字德米特里。
No doubt, writing such a sentence on early computers would have been simply impossible, because computers used limited character sets that could not let several writing systems coexist. But today such limitations are lifted, as we will see in this article.
毫无疑问,想要在早期的电脑中输入这样的句子是不可能的。这是因为早期电脑所使用的字符集有限,无法兼容多种书写系统。而如今类似的限制已不复存在,马上我们就能在文中看到。
### How do computers store text?
### 电脑是如何储存文字的?
Computers store characters as numbers. And they use tables to map those numbers to the glyphs used to represent them.
计算机将字符作为数字储存。它们再通过表格将这些数字与含有意义的字形一一对应。
For a long time, computers stored each character as a number between 0 and 255 (which fits exactly one byte). But that was far from being sufficient to represent the whole set of characters used in human writing. So, the trick was to use a different correspondence table depending on where in the world you lived.
在很长一段时间里,计算机将每个字符作为 0 到 255 之间的数字储存(这正好是一个字节的长度)。但这用来代表人类书写所用到的全部字符是远远不够的。而解决这个问题的诀窍在于,取决于你住在地球上的哪一块区域,系统会分别使用不同的对照表。
Here is the [ISO 8859-15][2] correspondence table commonly used in France:
这里有一张在法国常被广泛使用的对照表 [ISO 8859-15][2]
![The ISO 8859-15 encoding][3]
But if you lived in Russia, your computer would probably have used the [KOI8-R][4] or [Windows-1251][5] encoding instead. Let's assume the latter was used:
如果你住在俄罗斯,你的电脑大概会使用 [KOI8-R][4] 或是 [Windows-1251][5] 来进行编码。现在让我们假设我们在使用后者:
![The Windows-1251 encoding is a popular choice to store text written using the Cyrillic alphabets][6]
For numbers lower than 128, the two tables are identical. This range corresponds to the [US-ASCII][7] standard, a kind of minimum common set shared between character tables. But from 128 upward, the two tables are completely different.
对于 128 之前的数字,两张表格是一样的。这个范围与 [US-ASCII][7] 相对应,这是不同字符表格之间的最低兼容性。而对于 128 之后的数字,这两张表格则完全不同了。
For example, according to Windows-1251, the string _“said Дмитрий”_ is stored as:
比如,依据 Windows-1251字符串 _“said Дмитрий”_ 会被储存为:
```
115 97 105 100 32 196 236 232 242 240 232 233
```
To follow a common practice in computer science, those twelve numbers can be rewritten using the more compact hexadecimal notation:
按照计算机科学的常规方法,这十二个数字可被写成更加紧凑的十六进制:
```
73 61 69 64 20 c4 ec e8 f2 f0 e8 e9
```
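As a quick check of the two listings above, here is a minimal Python sketch that encodes the same string. It assumes Python 3.8 or later and uses `cp1251`, which is Python's codec name for Windows-1251; the variable names are only illustrative.

```python
# Encode the article's sample string with the Windows-1251 table.
# "cp1251" is Python's codec name for Windows-1251.
text = "said Дмитрий"
data = text.encode("cp1251")

# One number per character, first in decimal, then in hexadecimal.
decimal = " ".join(str(byte) for byte in data)
hexadecimal = data.hex(" ")  # bytes.hex() with a separator needs Python 3.8+

print(decimal)
print(hexadecimal)
```

Both printed lines should match the decimal and hexadecimal listings shown above.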
If Dmitrii sends me that file and I open it, I might end up seeing this:
如果德米特里发给我这份文件,我在打开后可能会看到:
```
said Äìèòðèé
```
The file _appears_ to be corrupted. But it isn't. The data, that is, the _numbers_ stored in that file, have not changed. As I live in France, my computer has _assumed_ the file to be encoded as ISO8859-15. So it displayed the characters _of that table_ corresponding to the data, and not the characters of the encoding table used when the text was originally written.
这份文件_看起来_被损坏了实则不然。这些储存在文件里的数据即数字并没有发生改变。被显示出的字符与_另一张表格_中的数据相对应而非文字最初被写出来时所用的编码表。
To give you an example, take the character Д. It has the numeric code 196 (c4) according to Windows-1251. The only thing stored in the file is the number 196. But that same number corresponds to Ä according to ISO8859-15. So my computer wrongly believed it was the glyph intended to be displayed.
让我们来举一个例子,就以字符 Д 为例。按照 Windows-1251Д 的数字编码为 196 (c4)。储存在文件里的只有数字 196。而正是这同样的数字在 ISO8859-15 中与 Ä 相对应。这就是为什么我的电脑错误地认为字形 Ä 就是应该被显示的字形。
![When the same text file is written then read again but using a different encoding][8]
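The misreading described above can be reproduced with a few lines of Python. This is only a sketch: the byte values come from the listing earlier in the article, while `cp1251` and `iso8859_15` are Python's codec names for the two tables.

```python
# The twelve numbers stored in the file; they never change.
raw = bytes([115, 97, 105, 100, 32, 196, 236, 232, 242, 240, 232, 233])

# Reading them with the table the author used gives the intended text.
as_intended = raw.decode("cp1251")

# Reading the very same numbers with ISO 8859-15 gives mojibake.
as_misread = raw.decode("iso8859_15")

print(as_intended)
print(as_misread)
```

Only the table used for display differs between the two results; the bytes themselves are identical.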
As a side note, you can still occasionally see an illustration of those issues on ill-configured websites or in emails sent by [mail user agents][9] making false assumptions about the character encoding used on the recipient's computer. Such glitches are sometimes nicknamed [mojibake][10]. Fortunately, this is less and less frequent today.
多提一句,你依然可以时不时地看到一些错误配置的网站展示,或由[用户邮箱代理][9]发出的对收件人电脑所使用的字符编码做出错误假设的邮件。这样的故障有时被称为乱码LCTT译注原文用词为 [mojibake][10] 源自日语 _文字化け_)。好在这种情况在今天已经越来越少见了。
![Example of Mojibake on the website of a French movie distributor. The website name has been changed to preserve the innocent.][11]
### Unicode comes to save the day
### Unicode 拯救了世界
I explained encoding issues when exchanging files between different countries. But things were even worse, since the encodings used by different manufacturers for the same country were not always the same. You can understand what I mean if you had to exchange files between Mac and PC in the '80s.
我解释了不同国家间交换文件时会遇到的编码问题。但事情还能更糟,同一个国家的不同生产商未必会使用相同的编码。如果你在 80 年代用 Mac 和 PC 互传过文件你就懂我是什么意思了。
Coincidence or not, the [Unicode][12] project started in 1987, led by people from Xerox and … Apple.
也不知道是不是巧合,[Unicode][12] 项目始于 1987 年,主导者来自<ruby>施乐<rt>Xerox</rt></ruby>和……<ruby>苹果<rt>Apple</rt></ruby>
The goal of the project was to define a universal character set allowing any characters used in human writing to be used _simultaneously_ within the same text. The original Unicode project was limited to 65536 different characters (each character being represented using 16 bits, that is, two bytes per character), a number that has proven to be insufficient.
这个项目的目标是定义一套通用字符集来允许同一段文字中_同时_出现人类书写会用到的任何文字。最初的 Unicode 项目被限制在 65536 个不同字符(每个字符用 16 位表示,即每个字符两字节)。这个数字已被证实是远远不够的。
So, in 1996, Unicode was extended to support over 1 million different [code points][13]. Roughly speaking, a “code point” is a number that identifies an entry in the Unicode character table. And one core job of the Unicode project is to make an inventory of all letters, symbols, punctuation marks and other characters that are (or were) used worldwide, and to assign to each of them a code point that will uniquely identify that character.
于是,在 1996 年 Unicode 被扩展以支持高达 100 万不同的[代码点][13]。粗略来说一个“代码点”可被用来识别字符表中的一个条目。Unicode 项目的一个核心工作就是将世界上正在被使用(或曾被使用)的字母,符号,标点符号以及其他文字仓管起来,并给每一项条目分配一个代码点用以准确分辨对应的字符。
This is a huge project: to give you some idea, version 10 of Unicode, published in 2017, defines over 136,000 characters covering 139 modern and historic scripts.
这是一个庞大的项目:让你有个大致了解,发布于 2017 年的 Unicode 版本 10 定义了超过 136,000 个字符覆盖了 139 种现代和历史上的语言文字。
With such a large number of possibilities, a basic encoding would require 32 bits (that is 4 bytes) per character. But for text using mainly the characters in the US-ASCII range, 4 bytes per character means 4 times more storage required to save the data and 4 times more bandwidth to transmit them.
随着如此庞大数量的可能性,一个基本的编码会需要每个字符 32 位(即 4 字节)。但对于主要使用 US-ASCII 范围内字符的文字,每个字符 4 字节意味着 4 倍多的储存需求以及 4 倍多的带宽用以传输这些文字。
![Encoding text as UTF-32 requires 4 bytes per character][14]
So besides the [UTF-32][15] encoding, the Unicode consortium defined the more space-efficient [UTF-16][16] and [UTF-8][17] encodings, based on 16-bit and 8-bit code values respectively. But how can you store over 100,000 different values in only 8 bits? Well, you can't. The trick is to use one code value (8 bits in UTF-8, 16 in UTF-16) to store the most frequently used characters, and to use several code values for the less commonly used characters. So UTF-8 and UTF-16 are _variable length_ encodings. Even if this has drawbacks, UTF-8 is a good compromise between space and time efficiency. Not to mention that it is backward compatible with the US-ASCII range shared by most 1-byte pre-Unicode encodings, since UTF-8 was specifically designed so that any valid US-ASCII file is also a valid UTF-8 file. In a sense, UTF-8 is a superset of US-ASCII. And today, there is no reason not to use the UTF-8 encoding. Unless, of course, you write mostly in languages requiring multi-byte encodings or you have to deal with legacy systems.
所以除了 [UTF-32][15]Unicode 联盟还定义了更加节约空间的 [UTF-16][16] 和 [UTF-8][17] 编码,分别使用了 16 位和 8 位。但只有 8 位该如何储存超过 100,000 个不同的值呢事实是你不能。但这其中窍门在于用一个代码值UTF-8 中的 8 位数以及 UTF-16 中的 16 位数)来储存最常用的一些字符。再用几个代码值储存最不常用的一些字符。所以说 UTF-8 和 UTF-16 是_可变长度_编码。尽管这样也有缺陷UTF-8 是空间与时间效率之间一个不赖的妥协。更不用提 UTF-8 可以向后兼容大部分 Unicode 之前的 1 字节编码,因为 UTF-8 被特别设计成任何有效的 US-ASCII 文件就是有效的 UTF-8 文件。你也可以说UTF-8 是 US-ASCII 的超集。而在今天已经找不到不用 UTF-8 编码的理由了。当然除非你书写主要用的语言需要多字节编码,或是你不得不与一些保留系统打交道。
I will let you compare the UTF-16 and UTF-8 encodings of the same string in the illustrations below. Pay special attention to the UTF-8 encoding, which uses one byte to store the characters of the Latin alphabet but two bytes to store the characters of the Cyrillic alphabet. That is twice as much space as when storing the same characters using the Windows-1251 Cyrillic encoding.
我让你来亲自比较一下同一字符串在下面两张图案中分别使用 UTF-16 和 UTF-8 编码。特别注意 UTF-8 使用了一字节来储存拉丁字母表中的字符,但它使用了两字节来存储西里尔字母表中的字符。这是 Windows-1251 西里尔编码储存同样字符所需空间的两倍。
![UTF-16 is a variable length encoding requiring 2 bytes to encode most characters. Some character still requires 4 bytes though (for example][18]
![UTF-8 is a variable length encoding requiring 1, 2, 3 or 4 bytes per character][19]
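To put numbers on that comparison, here is a small Python sketch measuring how many bytes the sample string takes in each encoding. The codec names are Python's own; the `-le` variants are chosen here so that no byte order mark is added to the count.

```python
text = "said Дмитрий"  # 5 US-ASCII characters followed by 7 Cyrillic letters

# Number of bytes needed to store the same 12 characters in each encoding.
sizes = {
    name: len(text.encode(name))
    for name in ("cp1251", "utf-8", "utf-16-le", "utf-32-le")
}

for name, size in sizes.items():
    print(f"{name:10} {size:2} bytes")

# Text limited to the US-ASCII range is byte-for-byte identical in UTF-8.
ascii_is_utf8 = "said".encode("ascii") == "said".encode("utf-8")
print(ascii_is_utf8)
```

In UTF-8 the five US-ASCII characters take one byte each and the seven Cyrillic letters two bytes each, which is where the total of 19 bytes comes from.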
### And how does that help for typing text?
### 而这些对于打字有什么用呢?
Well… It doesn't hurt to have some knowledge of the underlying mechanism to understand the capabilities and limitations of your computer, especially since we will talk about Unicode and hexadecimal a little later. But for now… a little bit more history. Just a little bit, I promise…
啊……知道一些你的电脑的能力与局限以及其底层机制也无伤大雅嘛。特别是我们马上就要说到 Unicode 和十六进制。现在嘛……我们再在聊点历史。真的就一点,我保证…
…just enough to say that, starting in the '80s, computer keyboards used to have a [compose key][20] (sometimes labeled the “multi” key) next to the shift key. By pressing that key, you entered “compose” mode. And once in that mode, you were able to enter characters not directly available on your keyboard by entering mnemonics instead. For example, in compose mode, typing RO produced the ® character (which is easy to remember as an R inside an O).
…就说从 80 年代起,电脑键盘曾经有过 [compose 键][20](有时候也被标为 "multi" 键)就在 shift 键的下边。当按下这个键时,你会进入“<ruby>组合<rt>compose</rt></ruby>”模式。一旦在这个模式下,你便可以通过输入助记符来输入你键盘上没有的字符。比如说,在组合模式下,输入 RO 便可生成字符 ®(当作是 O 里面有一个 R 就能很容易记住)。
![compose key on lk201 keyboard][21]
It is now rare to see a compose key on modern keyboards, probably because of the domination of PCs, which don't make use of it. But on Linux (and possibly on other systems?) you can emulate the compose key. This can be configured in the GUI on many desktop environments using the “keyboard” control panel, but the exact procedure varies depending on your desktop environment or even its version. If you changed that setting, don't hesitate to use the comment section to share the specific steps you've followed on your computer.
现在很难在现代键盘上看到 compose 键了。这大概是因为占据主导地位的 PC 不再用它了。但是在 Linux 上(可能还有其他系统)你可以模拟 compose 键。这项设置可以通过 GUI 开启,在大多数桌面环境下调用“键盘”控制面板:但具体的步骤取决于你的桌面环境以及版本。如果你成功启用了那项设置,不要犹豫在评论区分享你在你电脑上所采取的具体步骤。
As for myself, for now, I will assume you use the default Shift+AltGr combination to emulate the compose key.
LCTT 译注:如果有读者想要尝试,建议将 compose 键设为大写锁定键或是别的不常用的键Ctrl 和 Alt 会被大部分 GUI 程序优先识别为功能键。还有一些我自己试验时遇到过的问题,在开启 compose 键前要确认大写锁定是关闭的,输入法要切换成英文,组合模式下输入大小写敏感。我试验的系统是 Ubuntu 22.04 LTS。
So, as a practical example, to enter the LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, you can type Shift+AltGr<< (you don't have to keep Shift+AltGr pressed when entering the mnemonic). If you managed to do that, I think you should be able to guess by yourself how to enter the _RIGHT-POINTING_ DOUBLE ANGLE QUOTATION MARK.
至于我自己嘛,我现在先假设你用的就是默认的 Shift+AltGr 组合来模拟 compose 键。
As another example, try Shift+AltGr--- to produce an EM DASH. For that to work, you have to press the [hyphen-minus][22] key on the main keyboard, not the one you will find on your numeric keypad.
那么,作为一个实际例子,尝试输入 <ruby>LEFT-POINTING DOUBLE ANGLE QUOTATION MARK<rt>指左双角引号</rt></ruby>(LCTT 译注:Guillemet,是法语和一些欧洲语言中的引号,与中文的书名号不同),你可以输入 Shift+AltGr<<(你在敲助记符时不需要一直按着 Shift+AltGr)。如果你成功输入了这个符号,你自己应该也能猜到要怎么输入 _<ruby>RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK<rt>指右双角引号</rt></ruby>_ 了。
It is worth mentioning that the “compose” key works in a non-GUI environment too. But depending on whether you use X11 or a text-only console, the supported compose key sequences are not the same.
来看看另一个例子,试试 Shift+AltGr--- 来生成一个 <ruby>EM DASH<rt>长破折号</rt></ruby>(LCTT 译注:中文输入法的长破折号由两个 EM DASH 组成)。要做到这个,你需要按下主键盘上的[连字符减号][22]键而非数字键盘上的那个。
On the console, you can check the list of supported compose key sequences by using the `dumpkeys` command:
值得注意的是 "compose" 键在非 GUI 环境下也能工作。但是取决于你使用的是 X11 控制台还是只显示文字的控制台,它们所支持的 compose 按键顺序并不相同。
在控制台上,你可以通过命令 `dumpkeys` 来查看支持的 compose 按键列表LCTT 译注:可能需要 root 权限):
```
dumpkeys --compose-only
```
On the GUI, the compose key is implemented at the Gtk/X11 level. For a list of all mnemonics supported by Gtk, take a look at this page: [https://help.ubuntu.com/community/GtkComposeTable][23]
在 GUI 下compose 键是在 Gtk/X11 层被实现的。想要知道 Gtk 所支持的助记符,可以查看页面:[https://help.ubuntu.com/community/GtkComposeTable][23]
### Is there a way to avoid relying on Gtk for character composition?
### 我们可以避免对 Gtk 字符组合的依赖吗?
Maybe I'm a purist, but I find it somewhat unfortunate that compose key support is hard-coded in Gtk. After all, not all GUI applications use that library. And I cannot add my own mnemonics without recompiling Gtk.
或许我是个纯粹主义者,但是我为 Gtk 这种对 compose 键进行硬编码的方式感到悲哀。毕竟,不是所有 GUI 应用都会使用 Gtk 库。而且我如果想要添加我自己的助记符的话就只能重新编译 Gtk 了。
Fortunately, there is support for character composition at the X11 level too. Formerly, through the venerable [X Input Method (XIM)][24].
幸好在 X11 层也有对字符组合的支持。在以前则是通过令人尊敬的 [X 输入法 (XIM)][24]。
This works at a lower level than Gtk-based character composition, but it allows a great amount of flexibility and works with many X11 applications.
这个方法在比起基于 Gtk 的字符组合能够在更加底层的地方工作,同时具备优秀的灵活性并兼容很多 X11 应用。
For example, let's imagine I just want to add the --> composition to enter the → character (U+2192 RIGHTWARDS ARROW). I would create a `~/.XCompose` file containing these lines:
比如说,假设我只是想要添加 --> 组合来输入字符 → (U+2192 <ruby>RIGHTWARDS ARROW<rt>朝右箭头</rt></ruby>),我只需要新建 `~/.XCompose` 文件并写入以下代码:
```
cat > ~/.XCompose << EOT
@ -139,78 +141,78 @@ include "%L"
EOT
```
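The diff elides the body of that file, so the following is only a sketch of what an entry for the --> sequence could look like, assuming the `EVENT … : "STRING" KEYSYM # COMMENT` layout described in `man 5 compose`. The helper `compose_entry` and the `KEYSYMS` table are hypothetical names made up for this illustration, not part of any library.

```python
import unicodedata

# Hypothetical lookup table: X11 keysym names for the keys used below.
KEYSYMS = {"-": "minus", ">": "greater", "<": "less"}


def compose_entry(sequence: str, char: str) -> str:
    """Build one compose-file line mapping a key sequence to a character."""
    keys = " ".join(f"<{KEYSYMS[key]}>" for key in sequence)
    code_point = f"U{ord(char):04X}"
    return f'<Multi_key> {keys} : "{char}" {code_point} # {unicodedata.name(char)}'


entry = compose_entry("-->", "→")

# The include line visible in the diff context pulls in the locale's defaults.
print('include "%L"')
print(entry)
```

Generating the line from the character itself keeps the code point and the official Unicode name in the comment consistent with each other.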
Then you can test by starting a new X11 application, forcing libraries to use XIM as input method:
然后你就可以启动一个新的 X11 应用,强制函数库使用 XIM 作为输入法,并开始测试:
```
GTK_IM_MODULE="xim" QT_IM_MODULE="xim" xterm
```
The new compose sequence should be available in the application you launched. I encourage you to learn more about the compose file format by typing `man 5 compose`.
新的组合排序应该可以在你刚启动的应用里被输入了。我鼓励你通过 `man 5 compose` 来进一步学习组合文件格式。
To make XIM the default input method for all your applications, just add the following two lines to your `~/.profile` file. That change will be effective the next time you open a session on your computer:
在你的 `~/.profile` 中加入以下两行来将 XIM 设为你所有应用的默认输入法。这些改动会在下一次你登陆电脑时生效:
```
export GTK_IM_MODULE="xim"
export QT_IM_MODULE="xim"
```
It's pretty cool, isn't it? That way you can add all the compose sequences you might want. And there are already a couple of funny ones in the default XIM settings. Try, for example, to press composeLLAP.
这挺酷的,不是吗?这样你就可以随意的加入你想要的组合排序。而且在默认的 XIM 设置中已经有几个有意思的了。试一下输入 composeLLAP。
Well, I must mention two drawbacks though. First, XIM is relatively old and is probably only suitable for those of us who don't regularly need multi-byte input methods. Second, when using XIM as your input method, you can no longer enter Unicode characters by their code point using the Ctrl+Shift+u sequence. What? Wait a minute? I didn't talk about that yet? So let's do it now:
但我不得不提到两个缺陷。XIM 已经比较老了而且只适合我们这些不太需要多字节输入法的人。其次,当你用 XIM 作为输入法的时候,你就不能利用 Ctrl+Shift+u 加上代码点来输入 Unicode 字符了。什么?等一下?我还没聊过那个?让我们现在来聊一下吧:
### What if there is no compose key sequence for the character I need?
### 如果我需要的字符没有对应的 compose 键排序该怎么办?
The compose key is a nice tool to type some characters not available on the keyboard. But the default set of combinations is limited, and switching to XIM and defining a new compose sequence for a character you will need only once in a lifetime can be cumbersome.
Compose 键是一个不错的工具,它可以用来输入一些键盘上没有的字符。但默认的组合集有限,而切换 XIM 并为一个你一生仅用一次的字符来定义一个新的组合排序十分麻烦。
Does that prevent you from mixing Japanese, Latin and Cyrillic characters in the same text? Certainly not, thanks to Unicode. For example, the name あゆみ is made of:
但这能阻止你在同一段文字里混用日语,拉丁语,还有西里尔字符吗?显然不能,这多亏了 Unicode。比如说名字 あゆみ 由三个字母组成:
- the [HIRAGANA LETTER A (U+3042)][25]
- the [HIRAGANA LETTER YU (U+3086)][26]
- and the [HIRAGANA LETTER MI (U+307F)][27]
- [<ruby>HIRAGANA LETTER A<rt>平假名字母 あ</rt></ruby> (U+3042)][25]
- [<ruby>HIRAGANA LETTER YU<rt>平假名字母 ゆ</rt></ruby> (U+3086)][26]
- 以及 [<ruby>HIRAGANA LETTER MI<rt>平假名字母 み</rt></ruby> (U+307F)][27]
I mentioned above the official Unicode character names, following the convention of writing them in all uppercase. After their name, you will find their Unicode code point, written between parentheses, as a 16-bit hexadecimal number. Does that remind you of something?
我在上文提及了 Unicode 字符的正式名称,并遵循了全部用大写拼写的规范。在它们的名字后面,你可以找到它们的 Unicode 代码点,位于括号之间并写作 16 位的十六进制数字。这让你想到什么了吗?
Anyway, once you know the code point of a character, you can enter it using the following combination:
不管怎样,一旦你知道了的一个字符的代码点,你就可以按照以下组合输入:
- Ctrl+Shift+u, then XXXX (the _hexadecimal_ code point of the character you want) and finally Enter.
- Ctrl+Shift+u,然后 XXXX你想要的字符的_十六进制_代码点然后回车。
As a shorthand, if you don't release Ctrl+Shift while entering the code point, you won't have to press Enter.
作为一种简写方式,如果你在输入代码点时不松开 Ctrl+Shift你就不用敲回车。
Unfortunately, that feature is implemented at the software library level rather than at the X11 level. So the support may vary among different applications. In LibreOffice, for example, you have to type the code point using the main keyboard, whereas Gtk-based applications will accept entry from the numeric keypad as well.
不幸的是,这项功能的实现是在软件库层而非 X11 层,所以对其支持在不同应用间并不统一。以 LibreOffice 为例,你必须使用主键盘来输入代码点。而在基于 Gtk 的应用则接受来自数字键盘的输入。
Finally, when working at the console on my Debian system, there is a similar feature, but it requires pressing Alt+XXXXX instead, where XXXXX is the code point of the character you want, written in _decimal_ this time. I wonder if this is Debian-specific or related to the fact that I'm using the en_US.UTF-8 locale. If you have more information about that, I would be curious to read about it in the comment section!
最后,当我跟在我的 Debian 系统上的控制台打交道时,我发现了一个类似的功能,但它需要你按下 Alt+XXXXX 而 XXXXX 是你想要的字符的代码点写作_十进制_。我很好奇这究竟是 Debian 独有的功能还是因为我使用的 locale 是 en_US.UTF-8。如果你对此有更多信息我会很愿意在评论区读到它们的
| GUI | Console | Character |
| GUI | 控制台 | 字符 |
| :- | :- | :- |
| Ctrl+Shift+u3042Enter | Alt+12354 | あ |
| Ctrl+Shift+u3086Enter | Alt+12422 | ゆ |
| Ctrl+Shift+u307FEnter | Alt+12415 | み |
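The hexadecimal and decimal columns of this table are the same code points written in two bases. A short Python sketch, using only the standard `unicodedata` module, can look them up for any character (the variable names are illustrative):

```python
import unicodedata

# For each character: (hexadecimal code point, decimal code point, official name).
rows = [
    (f"{ord(char):04X}", ord(char), unicodedata.name(char))
    for char in "あゆみ"
]

for hex_code, dec_code, name in rows:
    # The hexadecimal form follows Ctrl+Shift+u in the GUI,
    # the decimal form follows Alt on the console.
    print(f"U+{hex_code}  {dec_code}  {name}")
```

Going the other way, `chr(0x3042)` or `chr(12354)` gives back あ.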
### Dead keys
### 死键
Last but not least, there is a simpler method to enter key combinations that do not rely (necessarily) on the compose key.
最后值得一提的是,想要不(必须)依赖 compose 键来输入键组合还有一个更简单的方法。
Some keys on your keyboard were specifically designed to create a combination of characters. Those are called [dead keys][28], because when you press them once, nothing seems to happen. But they will silently modify the character produced by the next key you press. This behavior is inspired by mechanical typewriters: on them, pressing a dead key imprinted a character but did not move the carriage. So the next keystroke imprinted another character at the same position, visually resulting in a combination of the two pressed keys.
你的键盘上的某些键是专门用来创造字符组合的。这些键叫做[死键][28]。这是因为当你按下它们一次,看起来什么都没有发生。但它们会悄悄地改变你下一次按键所产生的字符。这个行为的灵感来自于机械打字机:在使用机械打字机时,按下一个死键会印下一个字符,但不会移动字盘。于是下一次按键则会在同一个地方印下另一个字符。视觉效果就是两次按键的组合。
We use that a lot in French. For example, to enter the letter “ë” I have to press the ¨ dead key followed by the e key. Similarly, Spanish people have the ~ dead key on their keyboard. And on the keyboard layout for Nordic languages, you can find the ° key. And I could continue that list for a very long time.
我们在法语里经常用到这个。举例来说,想要输入字母 “ë” 我必须按下死键 ¨ 然后再按下 e 键。同样地,西班牙人的键盘上有着死键 ~。而在北欧语系下的键盘布局,你可以找到 ° 键。我可以念很久这份清单。
![hungary dead keys][29]
Obviously, not all dead keys are available on all keyboards. In fact, most dead keys are NOT available on your keyboard. For example, I assume very few of you, if any, have a dead key ¯ to enter the macron (“flat accent”) used to write Tōkyō.
显然,不是所有键盘都有所有死键。实际上,你的键盘上是找不到大部分死键的。比如说,我猜在你们当中只有小部分人——如果真的有的话——有死键 ¯ 来输入 Tōkyō 所需要的长音符号(“平变音符”)。
For those dead keys that are not directly available on your keyboard, you need to resort to other solutions. The good news is we've already used those techniques. But this time we will use them to emulate dead keys, not “ordinary” keys.
对于那些你键盘上没有的死键,你需要寻找别的解决方案。好消息是,我们已经用过那些技术了。但这一次我们要用它们来模拟死键,而非“普通”键。
So, a first option could be to generate the macron dead key by using Compose- (the hyphen-minus key available on your keyboard). Nothing appears. But if after that you press the o key it will finally produce “ō”.
那么,我们的第一个选择是利用 Compose- 来生成长音符号(你键盘上有的连字符减号)。按下时屏幕上什么都不会出现,但当你接着按下 o 键你就能看到 “ō”。
The list of dead keys that Gtk can produce using the compose mode can be found [here][30].
Gtk 在组合模式下可以生成的一系列死键都能在[这里][30]找到。
A different solution would use the letter o combined with the Unicode COMBINING MACRON (U+0304) character (a combining mark applies to the character that precedes it). I will leave the details up to you. But if you're curious, you may discover this leads to a very subtly different result, rather than really producing a LATIN SMALL LETTER O WITH MACRON. And if I wrote the end of the previous sentence in all uppercase, this is a hint guiding you toward a method to enter ō with fewer keystrokes than by using a Unicode combining character… But I leave that to your sagacity.
另一个解决方法则是利用 Unicode 字符 <ruby>COMBINING MACRON<rt>组合长音符号</rt></ruby> (U+0304),然后字母 o。我把细节都留给你。但如果你好奇的话你会发现你打出的结果有着微妙的不同你并没有真地打出 <ruby>LATIN SMALL LETTER O WITH MACRON<rt>小写拉丁字母 O 带长音符号</rt></ruby>。如果我在上一句的结尾用了大写拼写,这应该可以提示你寻找通过 Unicode 组合字符按更少的键输入 ō 的方法……现在我将这些留给你的聪明才智去解决了。
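For the curious, the subtle difference between the two results can be observed with Python's standard `unicodedata` module. This is only an illustrative sketch with made-up variable names; it compares the combining sequence with the precomposed letter.

```python
import unicodedata

combined = "o\u0304"    # LATIN SMALL LETTER O followed by COMBINING MACRON
precomposed = "\u014d"  # LATIN SMALL LETTER O WITH MACRON

# Both render as ō, yet they are different sequences of code points.
print(len(combined), len(precomposed))
print(combined == precomposed)

# Unicode normalization (here the NFC form) maps one onto the other.
normalized = unicodedata.normalize("NFC", combined)
print(normalized == precomposed)
print(unicodedata.name(precomposed))
```

This difference matters when comparing or searching text, so programs that handle user input often normalize strings first.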
### Your turn to practice!
### 轮到你来练习了!
So, did you get it all? Does that work on your computer? It's your turn to try: using the clues given above, and a little bit of practice, you can now enter the text of the challenge given at the beginning of this article. Do it, then copy-paste your text in the comment section below as proof of your success.
所以,你都学会了吗?这些在你的电脑上工作吗?现在轮到你来尝试了:根据上面提出的线索,加上一点练习,现在你可以完成文章开头给出的挑战了。挑战一下吧,然后把成果复制到评论区作为你成功的证明。
There is nothing to win, except maybe the satisfaction of impressing your peers!
赢了也没有奖励,或许来自同伴的惊叹能够满足你!
--------------------------------------------------------------------------------
@ -218,7 +220,7 @@ via: https://itsfoss.com/unicode-linux/
作者:[Sylvain Leroux][a]
选题:[lkxed][b]
译者:[译者ID](https://github.com/译者ID)
译者:[yzuowei](https://github.com/yzuowei)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出