Request to translate: A gawk script to convert smart quotes

darksun 2018-08-16 12:41:22 +08:00
parent e184a06cbe
commit 40fea64130

@@ -17,175 +17,96 @@ You can use different methods to convert quotes. Greg Pittman wrote a [Python sc
To start, I wrote a simple gawk function to evaluate a single character. If that character is a quote, the function decides whether to output a left or a right smart quote; any other character is printed as-is. The function looks at the previous character: if it is whitespace (a space, a tab, or the start of a line), the function outputs a left smart quote. Otherwise, the function outputs a right smart quote. The same rule applies to single and double quotes.
```
function smartquote (char, prevchar) {
        # print smart quotes depending on the previous character
        # otherwise just print the character as-is
        if (prevchar ~ /\s/) {
                # prev char is whitespace: print an opening (left) quote
                if (char == "'") {
                        printf("&lsquo;");
                }
                else if (char == "\"") {
                        printf("&ldquo;");
                }
                else {
                        printf("%c", char);
                }
        }
        else {
                # prev char is not whitespace: print a closing (right) quote
                if (char == "'") {
                        printf("&rsquo;");
                }
                else if (char == "\"") {
                        printf("&rdquo;");
                }
                else {
                        printf("%c", char);
                }
        }
}
```
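As a quick check of that rule, here is a minimal sketch that applies the same decision to a few (character, previous character) pairs. The `quote_for` name and the return-a-string style are assumptions made for this illustration, not part of the original script; the sketch returns HTML entities for the smart quotes and matches whitespace with `[ \t\n]` instead of gawk's `\s`, so it should also run under other awk implementations.
```shell
# Hypothetical variant of smartquote(): returns the text instead of printing it
prog=$(cat <<'EOF'
function quote_for(char, prevchar) {
        # anything that is not a quote passes through unchanged
        if (char != "'" && char != "\"") {
                return char;
        }
        # after whitespace: opening (left) quote
        if (prevchar ~ /[ \t\n]/) {
                return (char == "'") ? "&lsquo;" : "&ldquo;";
        }
        # after anything else: closing (right) quote
        return (char == "'") ? "&rsquo;" : "&rdquo;";
}
BEGIN {
        printf("%s %s %s %s\n",
                quote_for("\"", " "),
                quote_for("\"", "!"),
                quote_for("'", "t"),
                quote_for("x", " "));
}
EOF
)
result=$(awk "$prog")
echo "$result"
```
Run as-is, this prints `&ldquo; &rdquo; &rsquo; x`: a double quote after a space opens, a double quote after `!` closes, an apostrophe after `t` becomes a right single quote, and a plain letter is left alone.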
With that function, the body of the gawk script processes the HTML input file character by character. The script prints all text verbatim when inside an HTML tag (for example, `<html lang="en">`). Outside HTML tags, the script passes each character to the `smartquote()` function, which decides whether to print the character as-is or as a smart quote.
```
function smartquote (char, prevchar) {
        ...
}
BEGIN {htmltag = 0}
{
        # for each line, scan one letter at a time:
        linelen = length($0);
        prev = "\n";
        for (i = 1; i <= linelen; i++) {
                char = substr($0, i, 1);
                if (char == "<") {
                        htmltag = 1;
                }
                if (htmltag == 1) {
                        printf("%c", char);
                }
                else {
                        smartquote(char, prev);
                        prev = char;
                }
                if (char == ">") {
                        htmltag = 0;
                }
        }
        # add trailing newline at end of each line
        printf ("\n");
}
```
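To see the tag tracking on its own, the sketch below keeps only the `htmltag` flag from the script above and collects the characters that fall outside any tag, which are the ones the full script hands to `smartquote()`. The `outside` variable and the one-line test input are assumptions for this illustration only.
```shell
# Same htmltag state flag as the script above, with smartquote() left out
prog=$(cat <<'EOF'
BEGIN { htmltag = 0 }
{
        outside = "";
        linelen = length($0);
        for (i = 1; i <= linelen; i++) {
                char = substr($0, i, 1);
                if (char == "<") {
                        htmltag = 1;
                }
                if (htmltag == 0) {
                        # the full script would call smartquote() here
                        outside = outside char;
                }
                if (char == ">") {
                        htmltag = 0;
                }
        }
        print outside;
}
EOF
)
result=$(printf '%s\n' '<p class="note">"Hi there!"</p>' | awk "$prog")
echo "$result"
```
For this input, the sketch prints `"Hi there!"`. The quotes around the `class` attribute value never reach the quote logic because they sit between `<` and `>`, which is why attributes such as `lang="en"` stay untouched.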
Here's an example of running the script on an HTML file:
```
gawk -f quotes.awk test.html > test2.html
```
Sample input:
```
<!DOCTYPE html>
<html lang="en">
<head>
  <title>Test page</title>
  <link rel="stylesheet" type="text/css" href="/test.css" />
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width" />
</head>
<body>
  <h1><a href="/"><img src="logo.png" alt="Website logo" /></a></h1>
  <p>"Hi there!"</p>
  <p>It's and its.</p>
</body>
</html>
```
@@ -193,33 +114,19 @@ Sample input:
Sample output:
```
<!DOCTYPE html>
<html lang="en">
<head>
  <title>Test page</title>
  <link rel="stylesheet" type="text/css" href="/test.css" />
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width" />
</head>
<body>
  <h1><a href="/"><img src="logo.png" alt="Website logo" /></a></h1>
  <p>&ldquo;Hi there!&rdquo;</p>
  <p>It&rsquo;s and its.</p>
</body>
</html>
```
--------------------------------------------------------------------------------