Finish Translation

This commit is contained in:
GitFuture 2018-01-30 23:06:18 +08:00
parent f7d4a4c636
commit 179b5b5f8a
2 changed files with 336 additions and 342 deletions

View File

@ -1,342 +0,0 @@
BriFuture is Translating this article
Lets Build A Simple Interpreter. Part 3.
======
I woke up this morning and I thought to myself: "Why do we find it so difficult to learn a new skill?"
I don't think it's just because of the hard work. I think that one of the reasons might be that we spend a lot of time and hard work acquiring knowledge by reading and watching and not enough time translating that knowledge into a skill by practicing it. Take swimming, for example. You can spend a lot of time reading hundreds of books about swimming, talk for hours with experienced swimmers and coaches, watch all the training videos available, and you still will sink like a rock the first time you jump in the pool.
The bottom line is: it doesn't matter how well you think you know the subject - you have to put that knowledge into practice to turn it into a skill. To help you with the practice part I put exercises into [Part 1][1] and [Part 2][2] of the series. And yes, you will see more exercises in today's article and in future articles, I promise :)
Okay, let's get started with today's material, shall we?
So far, you've learned how to interpret arithmetic expressions that add or subtract two integers like "7 + 3" or "12 - 9". Today I'm going to talk about how to parse (recognize) and interpret arithmetic expressions that have any number of plus or minus operators in it, for example "7 - 3 + 2 - 1".
Graphically, the arithmetic expressions in this article can be represented with the following syntax diagram:
![][3]
What is a syntax diagram? A **syntax diagram** is a graphical representation of a programming language 's syntax rules. Basically, a syntax diagram visually shows you which statements are allowed in your programming language and which are not.
Syntax diagrams are pretty easy to read: just follow the paths indicated by the arrows. Some paths indicate choices. And some paths indicate loops.
You can read the above syntax diagram as following: a term optionally followed by a plus or minus sign, followed by another term, which in turn is optionally followed by a plus or minus sign followed by another term and so on. You get the picture, literally. You might wonder what a "term" is. For the purpose of this article a "term" is just an integer.
Syntax diagrams serve two main purposes:
* They graphically represent the specification (grammar) of a programming language.
* They can be used to help you write your parser - you can map a diagram to code by following simple rules.
You've learned that the process of recognizing a phrase in the stream of tokens is called **parsing**. And the part of an interpreter or compiler that performs that job is called a **parser**. Parsing is also called **syntax analysis** , and the parser is also aptly called, you guessed it right, a **syntax analyzer**.
According to the syntax diagram above, all of the following arithmetic expressions are valid:
* 3
* 3 + 4
* 7 - 3 + 2 - 1
Because syntax rules for arithmetic expressions in different programming languages are very similar we can use a Python shell to "test" our syntax diagram. Launch your Python shell and see for yourself:
```
>>> 3
3
>>> 3 + 4
7
>>> 7 - 3 + 2 - 1
5
```
No surprises here.
The expression "3 + " is not a valid arithmetic expression though because according to the syntax diagram the plus sign must be followed by a term (integer), otherwise it's a syntax error. Again, try it with a Python shell and see for yourself:
```
>>> 3 +
File "<stdin>", line 1
3 +
^
SyntaxError: invalid syntax
```
It's great to be able to use a Python shell to do some testing but let's map the above syntax diagram to code and use our own interpreter for testing, all right?
You know from the previous articles ([Part 1][1] and [Part 2][2]) that the expr method is where both our parser and interpreter live. Again, the parser just recognizes the structure making sure that it corresponds to some specifications and the interpreter actually evaluates the expression once the parser has successfully recognized (parsed) it.
The following code snippet shows the parser code corresponding to the diagram. The rectangular box from the syntax diagram (term) becomes a term method that parses an integer and the expr method just follows the syntax diagram flow:
```
def term(self):
self.eat(INTEGER)
def expr(self):
# set current token to the first token taken from the input
self.current_token = self.get_next_token()
self.term()
while self.current_token.type in (PLUS, MINUS):
token = self.current_token
if token.type == PLUS:
self.eat(PLUS)
self.term()
elif token.type == MINUS:
self.eat(MINUS)
self.term()
```
You can see that expr first calls the term method. Then the expr method has a while loop which can execute zero or more times. And inside the loop the parser makes a choice based on the token (whether it's a plus or minus sign). Spend some time proving to yourself that the code above does indeed follow the syntax diagram flow for arithmetic expressions.
The parser itself does not interpret anything though: if it recognizes an expression it's silent and if it doesn't, it throws out a syntax error. Let's modify the expr method and add the interpreter code:
```
def term(self):
"""Return an INTEGER token value"""
token = self.current_token
self.eat(INTEGER)
return token.value
def expr(self):
"""Parser / Interpreter """
# set current token to the first token taken from the input
self.current_token = self.get_next_token()
result = self.term()
while self.current_token.type in (PLUS, MINUS):
token = self.current_token
if token.type == PLUS:
self.eat(PLUS)
result = result + self.term()
elif token.type == MINUS:
self.eat(MINUS)
result = result - self.term()
return result
```
Because the interpreter needs to evaluate an expression the term method was modified to return an integer value and the expr method was modified to perform addition and subtraction at the appropriate places and return the result of interpretation. Even though the code is pretty straightforward I recommend spending some time studying it.
Le's get moving and see the complete code of the interpreter now, okay?
Here is the source code for your new version of the calculator that can handle valid arithmetic expressions containing integers and any number of addition and subtraction operators:
```
# Token types
#
# EOF (end-of-file) token is used to indicate that
# there is no more input left for lexical analysis
INTEGER, PLUS, MINUS, EOF = 'INTEGER', 'PLUS', 'MINUS', 'EOF'
class Token(object):
def __init__(self, type, value):
# token type: INTEGER, PLUS, MINUS, or EOF
self.type = type
# token value: non-negative integer value, '+', '-', or None
self.value = value
def __str__(self):
"""String representation of the class instance.
Examples:
Token(INTEGER, 3)
Token(PLUS, '+')
"""
return 'Token({type}, {value})'.format(
type=self.type,
value=repr(self.value)
)
def __repr__(self):
return self.__str__()
class Interpreter(object):
def __init__(self, text):
# client string input, e.g. "3 + 5", "12 - 5 + 3", etc
self.text = text
# self.pos is an index into self.text
self.pos = 0
# current token instance
self.current_token = None
self.current_char = self.text[self.pos]
##########################################################
# Lexer code #
##########################################################
def error(self):
raise Exception('Invalid syntax')
def advance(self):
"""Advance the `pos` pointer and set the `current_char` variable."""
self.pos += 1
if self.pos > len(self.text) - 1:
self.current_char = None # Indicates end of input
else:
self.current_char = self.text[self.pos]
def skip_whitespace(self):
while self.current_char is not None and self.current_char.isspace():
self.advance()
def integer(self):
"""Return a (multidigit) integer consumed from the input."""
result = ''
while self.current_char is not None and self.current_char.isdigit():
result += self.current_char
self.advance()
return int(result)
def get_next_token(self):
"""Lexical analyzer (also known as scanner or tokenizer)
This method is responsible for breaking a sentence
apart into tokens. One token at a time.
"""
while self.current_char is not None:
if self.current_char.isspace():
self.skip_whitespace()
continue
if self.current_char.isdigit():
return Token(INTEGER, self.integer())
if self.current_char == '+':
self.advance()
return Token(PLUS, '+')
if self.current_char == '-':
self.advance()
return Token(MINUS, '-')
self.error()
return Token(EOF, None)
##########################################################
# Parser / Interpreter code #
##########################################################
def eat(self, token_type):
# compare the current token type with the passed token
# type and if they match then "eat" the current token
# and assign the next token to the self.current_token,
# otherwise raise an exception.
if self.current_token.type == token_type:
self.current_token = self.get_next_token()
else:
self.error()
def term(self):
"""Return an INTEGER token value."""
token = self.current_token
self.eat(INTEGER)
return token.value
def expr(self):
"""Arithmetic expression parser / interpreter."""
# set current token to the first token taken from the input
self.current_token = self.get_next_token()
result = self.term()
while self.current_token.type in (PLUS, MINUS):
token = self.current_token
if token.type == PLUS:
self.eat(PLUS)
result = result + self.term()
elif token.type == MINUS:
self.eat(MINUS)
result = result - self.term()
return result
def main():
while True:
try:
# To run under Python3 replace 'raw_input' call
# with 'input'
text = raw_input('calc> ')
except EOFError:
break
if not text:
continue
interpreter = Interpreter(text)
result = interpreter.expr()
print(result)
if __name__ == '__main__':
main()
```
Save the above code into the calc3.py file or download it directly from [GitHub][4]. Try it out. See for yourself that it can handle arithmetic expressions that you can derive from the syntax diagram I showed you earlier.
Here is a sample session that I ran on my laptop:
```
$ python calc3.py
calc> 3
3
calc> 7 - 4
3
calc> 10 + 5
15
calc> 7 - 3 + 2 - 1
5
calc> 10 + 1 + 2 - 3 + 4 + 6 - 15
5
calc> 3 +
Traceback (most recent call last):
File "calc3.py", line 147, in <module>
main()
File "calc3.py", line 142, in main
result = interpreter.expr()
File "calc3.py", line 123, in expr
result = result + self.term()
File "calc3.py", line 110, in term
self.eat(INTEGER)
File "calc3.py", line 105, in eat
self.error()
File "calc3.py", line 45, in error
raise Exception('Invalid syntax')
Exception: Invalid syntax
```
Remember those exercises I mentioned at the beginning of the article: here they are, as promised :)
![][5]
* Draw a syntax diagram for arithmetic expressions that contain only multiplication and division, for example "7 0_sync_master.sh 1_add_new_article_manual.sh 1_add_new_article_newspaper.sh 2_start_translating.sh 3_continue_the_work.sh 4_finish.sh 5_pause.sh base.sh env format.test lctt.cfg parse_url_by_manual.sh parse_url_by_newspaper.py parse_url_by_newspaper.sh README.org reformat.sh 4 / 2 0_sync_master.sh 1_add_new_article_manual.sh 1_add_new_article_newspaper.sh 2_start_translating.sh 3_continue_the_work.sh 4_finish.sh 5_pause.sh base.sh env format.test lctt.cfg parse_url_by_manual.sh parse_url_by_newspaper.py parse_url_by_newspaper.sh README.org reformat.sh 3". Seriously, just grab a pen or a pencil and try to draw one.
* Modify the source code of the calculator to interpret arithmetic expressions that contain only multiplication and division, for example "7 0_sync_master.sh 1_add_new_article_manual.sh 1_add_new_article_newspaper.sh 2_start_translating.sh 3_continue_the_work.sh 4_finish.sh 5_pause.sh base.sh env format.test lctt.cfg parse_url_by_manual.sh parse_url_by_newspaper.py parse_url_by_newspaper.sh README.org reformat.sh 4 / 2 * 3".
* Write an interpreter that handles arithmetic expressions like "7 - 3 + 2 - 1" from scratch. Use any programming language you're comfortable with and write it off the top of your head without looking at the examples. When you do that, think about components involved: a lexer that takes an input and converts it into a stream of tokens, a parser that feeds off the stream of the tokens provided by the lexer and tries to recognize a structure in that stream, and an interpreter that generates results after the parser has successfully parsed (recognized) a valid arithmetic expression. String those pieces together. Spend some time translating the knowledge you've acquired into a working interpreter for arithmetic expressions.
**Check your understanding.**
1. What is a syntax diagram?
2. What is syntax analysis?
3. What is a syntax analyzer?
Hey, look! You read all the way to the end. Thanks for hanging out here today and don't forget to do the exercises. :) I'll be back next time with a new article - stay tuned.
--------------------------------------------------------------------------------
via: https://ruslanspivak.com/lsbasi-part3/
作者:[Ruslan Spivak][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://ruslanspivak.com
[1]:http://ruslanspivak.com/lsbasi-part1/ (Part 1)
[2]:http://ruslanspivak.com/lsbasi-part2/ (Part 2)
[3]:https://ruslanspivak.com/lsbasi-part3/lsbasi_part3_syntax_diagram.png
[4]:https://github.com/rspivak/lsbasi/blob/master/part3/calc3.py
[5]:https://ruslanspivak.com/lsbasi-part3/lsbasi_part3_exercises.png

View File

@ -0,0 +1,336 @@
让我们做个简单的解释器3
======
早上醒来的时候,我就在想:“为什么我们学习一个新技能这么难?”
我不认为那是因为它很难。我认为原因可能在于我们花了太多的时间,而这件难事需要有丰富的阅历和足够的知识,然而我们要把这样的知识转换成技能所用的练习时间又不够。
拿游泳来说,你可以花上几天时间来阅读很多有关游泳的书籍,花几个小时和资深的游泳者和教练交流,观看所有可以获得的训练视频,但你第一次跳进水池的时候,仍然会像一个石头那样沉入水中,
要点在于:你认为自己有多了解那件事都无关紧要 —— 你得通过练习把知识变成技能。为了帮你练习,我把训练放在了这个系列的 [第一部分][1] 和 [第二部分][2] 了。当然,你会在今后的文章中看到更多练习,我保证 )
好,让我们开始今天的学习。
到现在为止,你已经知道了怎样解释像 “7 + 3” 或者 “12 - 9” 这样的两个整数相加减的算术表达式。今天我要说的是怎么解析(识别)、解释有多个数字相加减的算术表达式,比如 “7 - 3 + 2 - 1”。
文中的这个算术表达式可以用下面的这个语法图表示:
![][3]
什么是语法图? **语法图** 是对一门编程语言中的语法规则进行图像化的表示。基本上,一个语法图就能告诉你哪些语句可以在程序中出现,哪些不能出现。
语法图很容易读懂:按照箭头指向的路径。某些路径表示的是判断,有些表示的是循环。
你可以按照以下的方式读上面的语法图:一个 term 后面可以是加号或者减号,接着可以是另一个 term这个 term 后面又可以是一个加号或者减号,后面又是一个 term如此循环。从字面上你就能读懂这个图片了。或许你会奇怪“term” 是什么、对于本文来说“term” 就是个整数。
语法图有两个主要的作用:
* 它们用图形的方式表示一个编程语言的特性(语法)。
* 它们可以用来帮你写出解析器 —— 你可以根据下列简单规则把图片转换成代码。
你已经知道,识别出记号流中的词组的过程就叫做 **解析**。解释器或者编译器执行这个任务的部分叫做 **解析器**。解析也称为 **语法分析**,并且解析器这个名字很合适,你猜的对,就是 **语法分析**
根据上面的语法图,下面这些表达式都是合法的:
* 3
* 3 + 4
* 7 - 3 + 2 - 1
因为算术表达式的语法规则在不同的编程语言里面是很相近的,我们可以用 Python shell 来“测试”语法图。打开 Python shell运行下面的代码
```
>>> 3
3
>>> 3 + 4
7
>>> 7 - 3 + 2 - 1
5
```
意料之中。
表达式 “3 + ” 不是一个有效的数学表达式,根据语法图,加号后面必须要有个 term (整数),否则就是语法错误。然后,自己在 Python shell 里面运行:
```
>>> 3 +
File "<stdin>", line 1
3 +
^
SyntaxError: invalid syntax
```
能用 Python shell 来做这样的测试非常棒,让我们把上面的语法图转换成代码,用我们自己的解释器来测试,怎么样?
从之前的文章里([第一部分][1] 和 [第二部分][2])你知道 expr 方法包含了我们的解析器和解释器。再说一遍,解析器仅仅识别出结构,确保它与某些特性对应,而解释器实际上是在解析器成功识别(解析)特性之后,就立即对表达式进行评估。
以下代码片段显示了对应于图表的解析器代码。语法图里面的矩形方框term变成了 term 方法用于解析整数expr 方法和语法图的流程一致:
```
def term(self):
self.eat(INTEGER)
def expr(self):
# 把当前标记设为从输入中拿到的第一个标记
self.current_token = self.get_next_token()
self.term()
while self.current_token.type in (PLUS, MINUS):
token = self.current_token
if token.type == PLUS:
self.eat(PLUS)
self.term()
elif token.type == MINUS:
self.eat(MINUS)
self.term()
```
你能看到 expr 首先调用了 term 方法。然后 expr 方法里面的 while 循环可以执行 0 或多次。在循环里面解析器基于标记做出判断(是加号还是减号)。花一些时间,你就知道,上述代码确实是遵循着语法图的算术表达式流程。
解析器并不解释任何东西:如果它识别出了一个表达式,它就静默着,如果没有识别出来,就会抛出一个语法错误。改一下 expr 方法,加入解释器的代码:
```
def term(self):
"""Return an INTEGER token value"""
token = self.current_token
self.eat(INTEGER)
return token.value
def expr(self):
"""Parser / Interpreter """
# 将输入中的第一个标记设置成当前标记
self.current_token = self.get_next_token()
result = self.term()
while self.current_token.type in (PLUS, MINUS):
token = self.current_token
if token.type == PLUS:
self.eat(PLUS)
result = result + self.term()
elif token.type == MINUS:
self.eat(MINUS)
result = result - self.term()
return result
```
因为解释器需要评估一个表达式, term 方法被改成返回一个整型值expr 方法被改成在合适的地方执行加法或减法操作,并返回解释的结果。尽管代码很直白,我建议花点时间去理解它。
进行下一步,看看完整的解释器代码,好不?
这时新版计算器的源代码,它可以处理包含有任意多个加法和减法运算的有效的数学表达式。
```
# 标记类型
#
# EOF (end-of-file 文件末尾) 标记是用来表示所有输入都解析完成
INTEGER, PLUS, MINUS, EOF = 'INTEGER', 'PLUS', 'MINUS', 'EOF'
class Token(object):
def __init__(self, type, value):
# token 类型: INTEGER, PLUS, MINUS, or EOF
self.type = type
# token 值: 非负整数值, '+', '-', 或无
self.value = value
def __str__(self):
"""String representation of the class instance.
Examples:
Token(INTEGER, 3)
Token(PLUS, '+')
"""
return 'Token({type}, {value})'.format(
type=self.type,
value=repr(self.value)
)
def __repr__(self):
return self.__str__()
class Interpreter(object):
def __init__(self, text):
# 客户端字符输入, 例如. "3 + 5", "12 - 5",
self.text = text
# self.pos is an index into self.text
self.pos = 0
# 当前标记实例
self.current_token = None
self.current_char = self.text[self.pos]
##########################################################
# Lexer code #
##########################################################
def error(self):
raise Exception('Invalid syntax')
def advance(self):
"""Advance the `pos` pointer and set the `current_char` variable."""
self.pos += 1
if self.pos > len(self.text) - 1:
self.current_char = None # Indicates end of input
else:
self.current_char = self.text[self.pos]
def skip_whitespace(self):
while self.current_char is not None and self.current_char.isspace():
self.advance()
def integer(self):
"""Return a (multidigit) integer consumed from the input."""
result = ''
while self.current_char is not None and self.current_char.isdigit():
result += self.current_char
self.advance()
return int(result)
def get_next_token(self):
"""Lexical analyzer (also known as scanner or tokenizer)
This method is responsible for breaking a sentence
apart into tokens. One token at a time.
"""
while self.current_char is not None:
if self.current_char.isspace():
self.skip_whitespace()
continue
if self.current_char.isdigit():
return Token(INTEGER, self.integer())
if self.current_char == '+':
self.advance()
return Token(PLUS, '+')
if self.current_char == '-':
self.advance()
return Token(MINUS, '-')
self.error()
return Token(EOF, None)
##########################################################
# Parser / Interpreter code #
##########################################################
def eat(self, token_type):
# 将当前的标记类型与传入的标记类型作比较,如果他们相匹配,就
# “eat” 掉当前的标记并将下一个标记赋给 self.current_token
# 否则抛出一个异常
if self.current_token.type == token_type:
self.current_token = self.get_next_token()
else:
self.error()
def term(self):
"""Return an INTEGER token value."""
token = self.current_token
self.eat(INTEGER)
return token.value
def expr(self):
"""Arithmetic expression parser / interpreter."""
# 将输入中的第一个标记设置成当前标记
self.current_token = self.get_next_token()
result = self.term()
while self.current_token.type in (PLUS, MINUS):
token = self.current_token
if token.type == PLUS:
self.eat(PLUS)
result = result + self.term()
elif token.type == MINUS:
self.eat(MINUS)
result = result - self.term()
return result
def main():
while True:
try:
# To run under Python3 replace 'raw_input' call
# 要在 Python3 下运行,请把 raw_input 的调用换成 input
text = raw_input('calc> ')
except EOFError:
break
if not text:
continue
interpreter = Interpreter(text)
result = interpreter.expr()
print(result)
if __name__ == '__main__':
main()
```
把上面的代码保存到 calc3.py 文件中,或者直接从 [GitHub][4] 上下载。试着运行它。看看它能不能处理我之前给你看过的语法图里面派生出的数学表达式。
这是我在自己的笔记本上运行的示例:
```
$ python calc3.py
calc> 3
3
calc> 7 - 4
3
calc> 10 + 5
15
calc> 7 - 3 + 2 - 1
5
calc> 10 + 1 + 2 - 3 + 4 + 6 - 15
5
calc> 3 +
Traceback (most recent call last):
File "calc3.py", line 147, in <module>
main()
File "calc3.py", line 142, in main
result = interpreter.expr()
File "calc3.py", line 123, in expr
result = result + self.term()
File "calc3.py", line 110, in term
self.eat(INTEGER)
File "calc3.py", line 105, in eat
self.error()
File "calc3.py", line 45, in error
raise Exception('Invalid syntax')
Exception: Invalid syntax
```
记得我在文章开始时提过的练习吗:他们在这儿,我保证过的:)
![][5]
* 画出只包含乘法和除法的数学表达式的语法图,比如 “7 * 4 / 2 * 3”。认真点拿只钢笔或铅笔试着画一个。
修改计算器的源代码,解释只包含乘法和除法的数学表达式。比如 “7 * 4 / 2 * 3”。
* 从头写一个可以处理像 “7 - 3 + 2 - 1” 这样的数学表达式的解释器。用你熟悉的编程语言,不看示例代码自己思考着写出代码。做的时候要想一想这里面包含的组件:一个 lexer读取输入并转换成标记流一个解析器从 lexer 提供的记号流中取食,并且尝试识别流中的结构,一个解释器,在解析器成功解析(识别)有效的数学表达式后产生结果。把这些要点串起来。花一点时间把你获得的知识变成一个可以运行的数学表达式的解释器。
**检验你的理解。**
1. 什么是语法图?
2. 什么是语法分析?
3. 什么是语法分析器?
嘿,看!你看完了所有内容。感谢你们坚持到今天,而且没有忘记练习。:) 下次我会带着新的文章回来,尽请期待。
--------------------------------------------------------------------------------
via: https://ruslanspivak.com/lsbasi-part3/
作者:[Ruslan Spivak][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://ruslanspivak.com
[1]:http://ruslanspivak.com/lsbasi-part1/ (Part 1)
[2]:http://ruslanspivak.com/lsbasi-part2/ (Part 2)
[3]:https://ruslanspivak.com/lsbasi-part3/lsbasi_part3_syntax_diagram.png
[4]:https://github.com/rspivak/lsbasi/blob/master/part3/calc3.py
[5]:https://ruslanspivak.com/lsbasi-part3/lsbasi_part3_exercises.png