Merge pull request #7280 from BriFuture/master

Translation Finished
This commit is contained in:
DarkSun 2018-01-19 21:41:12 +08:00 committed by GitHub
commit 710153ba8c
2 changed files with 234 additions and 246 deletions

View File

@ -1,246 +0,0 @@
BriFuture is translating this article.
Lets Build A Simple Interpreter. Part 2.
======
In their amazing book "The 5 Elements of Effective Thinking" the authors Burger and Starbird share a story about how they observed Tony Plog, an internationally acclaimed trumpet virtuoso, conduct a master class for accomplished trumpet players. The students first played complex music phrases, which they played perfectly well. But then they were asked to play very basic, simple notes. When they played the notes, the notes sounded childish compared to the previously played complex phrases. After they finished playing, the master teacher also played the same notes, but when he played them, they did not sound childish. The difference was stunning. Tony explained that mastering the performance of simple notes allows one to play complex pieces with greater control. The lesson was clear - to build true virtuosity one must focus on mastering simple, basic ideas.
The lesson in the story clearly applies not only to music but also to software development. The story is a good reminder to all of us to not lose sight of the importance of deep work on simple, basic ideas even if it sometimes feels like a step back. While it is important to be proficient with a tool or framework you use, it is also extremely important to know the principles behind them. As Ralph Waldo Emerson said:
> "If you learn only methods, you'll be tied to your methods. But if you learn principles, you can devise your own methods."
On that note, let's dive into interpreters and compilers again.
Today I will show you a new version of the calculator from [Part 1][1] that will be able to:
1. Handle whitespace characters anywhere in the input string
2. Consume multi-digit integers from the input
3. Subtract two integers (currently it can only add integers)
Here is the source code for your new version of the calculator that can do all of the above:
```
# Token types
# EOF (end-of-file) token is used to indicate that
# there is no more input left for lexical analysis
INTEGER, PLUS, MINUS, EOF = 'INTEGER', 'PLUS', 'MINUS', 'EOF'
class Token(object):
def __init__(self, type, value):
# token type: INTEGER, PLUS, MINUS, or EOF
self.type = type
# token value: non-negative integer value, '+', '-', or None
self.value = value
def __str__(self):
"""String representation of the class instance.
Examples:
Token(INTEGER, 3)
Token(PLUS '+')
"""
return 'Token({type}, {value})'.format(
type=self.type,
value=repr(self.value)
)
def __repr__(self):
return self.__str__()
class Interpreter(object):
def __init__(self, text):
# client string input, e.g. "3 + 5", "12 - 5", etc
self.text = text
# self.pos is an index into self.text
self.pos = 0
# current token instance
self.current_token = None
self.current_char = self.text[self.pos]
def error(self):
raise Exception('Error parsing input')
def advance(self):
"""Advance the 'pos' pointer and set the 'current_char' variable."""
self.pos += 1
if self.pos > len(self.text) - 1:
self.current_char = None # Indicates end of input
else:
self.current_char = self.text[self.pos]
def skip_whitespace(self):
while self.current_char is not None and self.current_char.isspace():
self.advance()
def integer(self):
"""Return a (multidigit) integer consumed from the input."""
result = ''
while self.current_char is not None and self.current_char.isdigit():
result += self.current_char
self.advance()
return int(result)
def get_next_token(self):
"""Lexical analyzer (also known as scanner or tokenizer)
This method is responsible for breaking a sentence
apart into tokens.
"""
while self.current_char is not None:
if self.current_char.isspace():
self.skip_whitespace()
continue
if self.current_char.isdigit():
return Token(INTEGER, self.integer())
if self.current_char == '+':
self.advance()
return Token(PLUS, '+')
if self.current_char == '-':
self.advance()
return Token(MINUS, '-')
self.error()
return Token(EOF, None)
def eat(self, token_type):
# compare the current token type with the passed token
# type and if they match then "eat" the current token
# and assign the next token to the self.current_token,
# otherwise raise an exception.
if self.current_token.type == token_type:
self.current_token = self.get_next_token()
else:
self.error()
def expr(self):
"""Parser / Interpreter
expr -> INTEGER PLUS INTEGER
expr -> INTEGER MINUS INTEGER
"""
# set current token to the first token taken from the input
self.current_token = self.get_next_token()
# we expect the current token to be an integer
left = self.current_token
self.eat(INTEGER)
# we expect the current token to be either a '+' or '-'
op = self.current_token
if op.type == PLUS:
self.eat(PLUS)
else:
self.eat(MINUS)
# we expect the current token to be an integer
right = self.current_token
self.eat(INTEGER)
# after the above call the self.current_token is set to
# EOF token
# at this point either the INTEGER PLUS INTEGER or
# the INTEGER MINUS INTEGER sequence of tokens
# has been successfully found and the method can just
# return the result of adding or subtracting two integers,
# thus effectively interpreting client input
if op.type == PLUS:
result = left.value + right.value
else:
result = left.value - right.value
return result
def main():
while True:
try:
# To run under Python3 replace 'raw_input' call
# with 'input'
text = raw_input('calc> ')
except EOFError:
break
if not text:
continue
interpreter = Interpreter(text)
result = interpreter.expr()
print(result)
if __name__ == '__main__':
main()
```
Save the above code into the calc2.py file or download it directly from [GitHub][2]. Try it out. See for yourself that it works as expected: it can handle whitespace characters anywhere in the input; it can accept multi-digit integers, and it can also subtract two integers as well as add two integers.
Here is a sample session that I ran on my laptop:
```
$ python calc2.py
calc> 27 + 3
30
calc> 27 - 7
20
calc>
```
The major code changes compared with the version from [Part 1][1] are:
1. The get_next_token method was refactored a bit. The logic to increment the pos pointer was factored into a separate method advance.
2. Two more methods were added: skip_whitespace to ignore whitespace characters and integer to handle multi-digit integers in the input.
3. The expr method was modified to recognize INTEGER -> MINUS -> INTEGER phrase in addition to INTEGER -> PLUS -> INTEGER phrase. The method now also interprets both addition and subtraction after having successfully recognized the corresponding phrase.
In [Part 1][1] you learned two important concepts, namely that of a **token** and a **lexical analyzer**. Today I would like to talk a little bit about **lexemes** , **parsing** , and **parsers**.
You already know about tokens. But in order for me to round out the discussion of tokens I need to mention lexemes. What is a lexeme? A **lexeme** is a sequence of characters that form a token. In the following picture you can see some examples of tokens and sample lexemes and hopefully it will make the relationship between them clear:
![][3]
Now, remember our friend, the expr method? I said before that that's where the interpretation of an arithmetic expression actually happens. But before you can interpret an expression you first need to recognize what kind of phrase it is, whether it is addition or subtraction, for example. That's what the expr method essentially does: it finds the structure in the stream of tokens it gets from the get_next_token method and then it interprets the phrase that is has recognized, generating the result of the arithmetic expression.
The process of finding the structure in the stream of tokens, or put differently, the process of recognizing a phrase in the stream of tokens is called **parsing**. The part of an interpreter or compiler that performs that job is called a **parser**.
So now you know that the expr method is the part of your interpreter where both **parsing** and **interpreting** happens - the expr method first tries to recognize ( **parse** ) the INTEGER -> PLUS -> INTEGER or the INTEGER -> MINUS -> INTEGER phrase in the stream of tokens and after it has successfully recognized ( **parsed** ) one of those phrases, the method interprets it and returns the result of either addition or subtraction of two integers to the caller.
And now it's time for exercises again.
![][4]
1. Extend the calculator to handle multiplication of two integers
2. Extend the calculator to handle division of two integers
3. Modify the code to interpret expressions containing an arbitrary number of additions and subtractions, for example "9 - 5 + 3 + 11"
**Check your understanding.**
1. What is a lexeme?
2. What is the name of the process that finds the structure in the stream of tokens, or put differently, what is the name of the process that recognizes a certain phrase in that stream of tokens?
3. What is the name of the part of the interpreter (compiler) that does parsing?
I hope you liked today's material. In the next article of the series you will extend your calculator to handle more complex arithmetic expressions. Stay tuned.
--------------------------------------------------------------------------------
via: https://ruslanspivak.com/lsbasi-part2/
作者:[Ruslan Spivak][a]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://ruslanspivak.com
[1]:http://ruslanspivak.com/lsbasi-part1/ (Part 1)
[2]:https://github.com/rspivak/lsbasi/blob/master/part2/calc2.py
[3]:https://ruslanspivak.com/lsbasi-part2/lsbasi_part2_lexemes.png
[4]:https://ruslanspivak.com/lsbasi-part2/lsbasi_part2_exercises.png

View File

@ -0,0 +1,234 @@
让我们做个简单的解释器2
======
在一本叫做 《高效思考的 5 要素》 的书中,作者 Burger 和 Starbird 讲述了一个关于他们如何研究 Tony Plog 的故事一个举世闻名的交响曲名家为一些有才华的演奏者开创了一个大师班。这些学生一开始演奏复杂的乐曲他们演奏的非常好。然后他们被要求演奏非常基础简单的乐曲。当他们演奏这些乐曲时与之前所演奏的相比听起来非常幼稚。在他们结束演奏后老师也演奏了同样的乐曲但是听上去非常娴熟。差别令人震惊。Tony 解释道,精通简单符号可以让人更好的掌握复杂的部分。这个例子很清晰 - 要成为真正的名家,必须要掌握简单基础的思想。
故事中的例子明显不仅仅适用于音乐,而且适用于软件开发。这个故事告诉我们不要忽视繁琐工作中简单基础的概念的重要性,哪怕有时候这让人感觉是一种倒退。尽管熟练掌握一门工具或者框架非常重要,了解他们背后的原理也是极其重要的。正如 Palph Waldo Emerson 所说:
> “如果你只学习方法,你就会被方法束缚。但如果你知道原理,就可以发明自己的方法。”
有鉴于此,让我们再次深入了解解释器和编译器。
今天我会向你们展示一个全新的计算器,与 [第一部分][1] 相比,它可以做到:
1. 处理输入字符串任意位置的空白符
2. 识别输入字符串中的多位整数
3. 做两个整数之间的减法(目前它仅能加减整数)
新版本计算器的源代码在这里,它可以做到上述的所有事情:
```
# 标记类型
# EOF (end-of-file 文件末尾) 标记是用来表示所有输入都解析完成
INTEGER, PLUS, MINUS, EOF = 'INTEGER', 'PLUS', 'MINUS', 'EOF'
class Token(object):
def __init__(self, type, value):
# token 类型: INTEGER, PLUS, MINUS, or EOF
self.type = type
# token 值: 非负整数值, '+', '-', 或无
self.value = value
def __str__(self):
"""String representation of the class instance.
Examples:
Token(INTEGER, 3)
Token(PLUS '+')
"""
return 'Token({type}, {value})'.format(
type=self.type,
value=repr(self.value)
)
def __repr__(self):
return self.__str__()
class Interpreter(object):
def __init__(self, text):
# 客户端字符输入, 例如. "3 + 5", "12 - 5",
self.text = text
# self.pos 是 self.text 的索引
self.pos = 0
# 当前标记实例
self.current_token = None
self.current_char = self.text[self.pos]
def error(self):
raise Exception('Error parsing input')
def advance(self):
"""Advance the 'pos' pointer and set the 'current_char' variable."""
self.pos += 1
if self.pos > len(self.text) - 1:
self.current_char = None # Indicates end of input
else:
self.current_char = self.text[self.pos]
def skip_whitespace(self):
while self.current_char is not None and self.current_char.isspace():
self.advance()
def integer(self):
"""Return a (multidigit) integer consumed from the input."""
result = ''
while self.current_char is not None and self.current_char.isdigit():
result += self.current_char
self.advance()
return int(result)
def get_next_token(self):
"""Lexical analyzer (also known as scanner or tokenizer)
This method is responsible for breaking a sentence
apart into tokens.
"""
while self.current_char is not None:
if self.current_char.isspace():
self.skip_whitespace()
continue
if self.current_char.isdigit():
return Token(INTEGER, self.integer())
if self.current_char == '+':
self.advance()
return Token(PLUS, '+')
if self.current_char == '-':
self.advance()
return Token(MINUS, '-')
self.error()
return Token(EOF, None)
def eat(self, token_type):
# 将当前的标记类型与传入的标记类型作比较,如果他们相匹配,就
# “eat” 掉当前的标记并将下一个标记赋给 self.current_token
# 否则抛出一个异常
if self.current_token.type == token_type:
self.current_token = self.get_next_token()
else:
self.error()
def expr(self):
"""Parser / Interpreter
expr -> INTEGER PLUS INTEGER
expr -> INTEGER MINUS INTEGER
"""
# 将输入中的第一个标记设置成当前标记
self.current_token = self.get_next_token()
# 当前标记应该是一个整数
left = self.current_token
self.eat(INTEGER)
# 当前标记应该是 +-
op = self.current_token
if op.type == PLUS:
self.eat(PLUS)
else:
self.eat(MINUS)
# 当前标记应该是一个整数
right = self.current_token
self.eat(INTEGER)
# 在上述函数调用后self.current_token 就被设为 EOF 标记
# 这时要么是成功地找到 INTEGER PLUS INTEGER要么是 INTEGER MINUS INTEGER
# 序列的标记,并且这个方法可以仅仅返回两个整数的加或减的结果,就能高效解释客户端的输入
if op.type == PLUS:
result = left.value + right.value
else:
result = left.value - right.value
return result
def main():
while True:
try:
# To run under Python3 replace 'raw_input' call
# with 'input'
text = raw_input('calc> ')
except EOFError:
break
if not text:
continue
interpreter = Interpreter(text)
result = interpreter.expr()
print(result)
if __name__ == '__main__':
main()
```
把上面的代码保存到 calc2.py 文件中,或者直接从 [GitHub][2] 上下载。试着运行它。看看它是不是正常工作:它应该能够处理输入中任意位置的空白符;能够接受多位的整数,并且能够对两个整数做减法和加法。
这是我在自己的笔记本上运行的示例:
```
$ python calc2.py
calc> 27 + 3
30
calc> 27 - 7
20
calc>
```
与 [第一部分][1] 的版本相比,主要的代码改动有:
1. get_next_token 方法重写了很多。增加指针位置的逻辑之前是放在一个单独的方法中。
2. 增加了一些方法skip_whitespace 用于忽略空白字符integer 用于处理输入字符的多位整数。
3. expr 方法修改成了可以识别 “整数 -> 减号 -> 整数” 词组和 “整数 -> 加号 -> 整数” 词组。在成功识别相应的词组后,这个方法现在可以解释加法和减法。
[第一部分][1] 中你学到了两个重要的概念,叫做 **标记****词法分析**。现在我想谈一谈 **词法** **解析**,和**解析器**。
你已经知道标记。但是为了让我详细的讨论标记,我需要谈一谈词法。词法是什么?**词法** 是一个标记中的字符序列。在下图中你可以看到一些关于标记的例子,还好这可以让它们之间的关系变得清晰:
![][3]
现在还记得我们的朋友expr 方法吗我之前说过这是数学表达式实际被解释的地方。但是你要先识别这个表达式有哪些词组才能解释它比如它是加法还是减法。expr 方法最重要的工作是:它从 get_next_token 方法中得到流,并找出标记流的结构然后解释已经识别出的词组,产生数学表达式的结果。
在标记流中找出结构的过程,或者换种说法,识别标记流中的词组的过程就叫 **解析**。解释器或者编译器中执行这个任务的部分就叫做 **解析器**
现在你知道 expr 方法就是你的解释器的部分,**解析** 和 **解释** 都在这里发生 - expr 方法首先尝试识别(**解析**)标记流里的 “整数 -> 加法 -> 整数” 或者 “整数 -> 减法 -> 整数” 词组,成功识别后 **解析** 其中一个词组,这个方法就开始解释它,返回两个整数的和或差。
又到了练习的时间。
![][4]
1. 扩展这个计算器,让它能够计算两个整数的乘法
2. 扩展这个计算器,让它能够计算两个整数的除法
3. 修改代码,让它能够解释包含了任意数量的加法和减法的表达式,比如 “9 - 5 + 3 + 11”
**检验你的理解:**
1. 词法是什么?
2. 找出标记流结构的过程叫什么,或者换种说法,识别标记流中一个词组的过程叫什么?
3. 解释器(编译器)执行解析的部分叫什么?
希望你喜欢今天的内容。在该系列的下一篇文章里你就能扩展计算器从而处理更多复杂的算术表达式。敬请期待。
--------------------------------------------------------------------------------
via: https://ruslanspivak.com/lsbasi-part2/
作者:[Ruslan Spivak][a]
译者:[BriFuture](https://github.com/BriFuture)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://ruslanspivak.com
[1]:http://ruslanspivak.com/lsbasi-part1/ (Part 1)
[2]:https://github.com/rspivak/lsbasi/blob/master/part2/calc2.py
[3]:https://ruslanspivak.com/lsbasi-part2/lsbasi_part2_lexemes.png
[4]:https://ruslanspivak.com/lsbasi-part2/lsbasi_part2_exercises.png