mirror of
https://github.com/LCTT/TranslateProject.git
synced 2024-12-26 21:30:55 +08:00
commit
be86eda0a7
@ -1,343 +0,0 @@
|
||||
BriFuture is translating this article
|
||||
|
||||
Let’s Build A Simple Interpreter. Part 1.
|
||||
======
|
||||
|
||||
|
||||
> **" If you don't know how compilers work, then you don't know how computers work. If you're not 100% sure whether you know how compilers work, then you don't know how they work."** -- Steve Yegge
|
||||
|
||||
There you have it. Think about it. It doesn't really matter whether you're a newbie or a seasoned software developer: if you don't know how compilers and interpreters work, then you don't know how computers work. It's that simple.
|
||||
|
||||
So, do you know how compilers and interpreters work? And I mean, are you 100% sure that you know how they work? If you don't. ![][1]
|
||||
|
||||
Or if you don't and you're really agitated about it. ![][2]
|
||||
|
||||
Do not worry. If you stick around and work through the series and build an interpreter and a compiler with me you will know how they work in the end. And you will become a confident happy camper too. At least I hope so. ![][3]
|
||||
|
||||
Why would you study interpreters and compilers? I will give you three reasons.
|
||||
|
||||
1. To write an interpreter or a compiler you have to have a lot of technical skills that you need to use together. Writing an interpreter or a compiler will help you improve those skills and become a better software developer. As well, the skills you will learn are useful in writing any software, not just interpreters or compilers.
|
||||
2. You really want to know how computers work. Often interpreters and compilers look like magic. And you shouldn't be comfortable with that magic. You want to demystify the process of building an interpreter and a compiler, understand how they work, and get in control of things.
|
||||
3. You want to create your own programming language or domain specific language. If you create one, you will also need to create either an interpreter or a compiler for it. Recently, there has been a resurgence of interest in new programming languages. And you can see a new programming language pop up almost every day: Elixir, Go, Rust just to name a few.
|
||||
|
||||
|
||||
|
||||
|
||||
Okay, but what are interpreters and compilers?
|
||||
|
||||
The goal of an **interpreter** or a **compiler** is to translate a source program in some high-level language into some other form. Pretty vague, isn 't it? Just bear with me, later in the series you will learn exactly what the source program is translated into.
|
||||
|
||||
At this point you may also wonder what the difference is between an interpreter and a compiler. For the purpose of this series, let's agree that if a translator translates a source program into machine language, it is a **compiler**. If a translator processes and executes the source program without translating it into machine language first, it is an **interpreter**. Visually it looks something like this:
|
||||
|
||||
![][4]
|
||||
|
||||
I hope that by now you're convinced that you really want to study and build an interpreter and a compiler. What can you expect from this series on interpreters?
|
||||
|
||||
Here is the deal. You and I are going to create a simple interpreter for a large subset of [Pascal][5] language. At the end of this series you will have a working Pascal interpreter and a source-level debugger like Python's [pdb][6].
|
||||
|
||||
You might ask, why Pascal? For one thing, it's not a made-up language that I came up with just for this series: it's a real programming language that has many important language constructs. And some old, but useful, CS books use Pascal programming language in their examples (I understand that that's not a particularly compelling reason to choose a language to build an interpreter for, but I thought it would be nice for a change to learn a non-mainstream language :)
|
||||
|
||||
Here is an example of a factorial function in Pascal that you will be able to interpret with your own interpreter and debug with the interactive source-level debugger that you will create along the way:
|
||||
```
|
||||
program factorial;
|
||||
|
||||
function factorial(n: integer): longint;
|
||||
begin
|
||||
if n = 0 then
|
||||
factorial := 1
|
||||
else
|
||||
factorial := n * factorial(n - 1);
|
||||
end;
|
||||
|
||||
var
|
||||
n: integer;
|
||||
|
||||
begin
|
||||
for n := 0 to 16 do
|
||||
writeln(n, '! = ', factorial(n));
|
||||
end.
|
||||
```
|
||||
|
||||
The implementation language of the Pascal interpreter will be Python, but you can use any language you want because the ideas presented don't depend on any particular implementation language. Okay, let's get down to business. Ready, set, go!
|
||||
|
||||
You will start your first foray into interpreters and compilers by writing a simple interpreter of arithmetic expressions, also known as a calculator. Today the goal is pretty minimalistic: to make your calculator handle the addition of two single digit integers like **3+5**. Here is the source code for your calculator, sorry, interpreter:
|
||||
|
||||
```
|
||||
# Token types
|
||||
#
|
||||
# EOF (end-of-file) token is used to indicate that
|
||||
# there is no more input left for lexical analysis
|
||||
INTEGER, PLUS, EOF = 'INTEGER', 'PLUS', 'EOF'
|
||||
|
||||
|
||||
class Token(object):
|
||||
def __init__(self, type, value):
|
||||
# token type: INTEGER, PLUS, or EOF
|
||||
self.type = type
|
||||
# token value: 0, 1, 2. 3, 4, 5, 6, 7, 8, 9, '+', or None
|
||||
self.value = value
|
||||
|
||||
def __str__(self):
|
||||
"""String representation of the class instance.
|
||||
|
||||
Examples:
|
||||
Token(INTEGER, 3)
|
||||
Token(PLUS '+')
|
||||
"""
|
||||
return 'Token({type}, {value})'.format(
|
||||
type=self.type,
|
||||
value=repr(self.value)
|
||||
)
|
||||
|
||||
def __repr__(self):
|
||||
return self.__str__()
|
||||
|
||||
|
||||
class Interpreter(object):
|
||||
def __init__(self, text):
|
||||
# client string input, e.g. "3+5"
|
||||
self.text = text
|
||||
# self.pos is an index into self.text
|
||||
self.pos = 0
|
||||
# current token instance
|
||||
self.current_token = None
|
||||
|
||||
def error(self):
|
||||
raise Exception('Error parsing input')
|
||||
|
||||
def get_next_token(self):
|
||||
"""Lexical analyzer (also known as scanner or tokenizer)
|
||||
|
||||
This method is responsible for breaking a sentence
|
||||
apart into tokens. One token at a time.
|
||||
"""
|
||||
text = self.text
|
||||
|
||||
# is self.pos index past the end of the self.text ?
|
||||
# if so, then return EOF token because there is no more
|
||||
# input left to convert into tokens
|
||||
if self.pos > len(text) - 1:
|
||||
return Token(EOF, None)
|
||||
|
||||
# get a character at the position self.pos and decide
|
||||
# what token to create based on the single character
|
||||
current_char = text[self.pos]
|
||||
|
||||
# if the character is a digit then convert it to
|
||||
# integer, create an INTEGER token, increment self.pos
|
||||
# index to point to the next character after the digit,
|
||||
# and return the INTEGER token
|
||||
if current_char.isdigit():
|
||||
token = Token(INTEGER, int(current_char))
|
||||
self.pos += 1
|
||||
return token
|
||||
|
||||
if current_char == '+':
|
||||
token = Token(PLUS, current_char)
|
||||
self.pos += 1
|
||||
return token
|
||||
|
||||
self.error()
|
||||
|
||||
def eat(self, token_type):
|
||||
# compare the current token type with the passed token
|
||||
# type and if they match then "eat" the current token
|
||||
# and assign the next token to the self.current_token,
|
||||
# otherwise raise an exception.
|
||||
if self.current_token.type == token_type:
|
||||
self.current_token = self.get_next_token()
|
||||
else:
|
||||
self.error()
|
||||
|
||||
def expr(self):
|
||||
"""expr -> INTEGER PLUS INTEGER"""
|
||||
# set current token to the first token taken from the input
|
||||
self.current_token = self.get_next_token()
|
||||
|
||||
# we expect the current token to be a single-digit integer
|
||||
left = self.current_token
|
||||
self.eat(INTEGER)
|
||||
|
||||
# we expect the current token to be a '+' token
|
||||
op = self.current_token
|
||||
self.eat(PLUS)
|
||||
|
||||
# we expect the current token to be a single-digit integer
|
||||
right = self.current_token
|
||||
self.eat(INTEGER)
|
||||
# after the above call the self.current_token is set to
|
||||
# EOF token
|
||||
|
||||
# at this point INTEGER PLUS INTEGER sequence of tokens
|
||||
# has been successfully found and the method can just
|
||||
# return the result of adding two integers, thus
|
||||
# effectively interpreting client input
|
||||
result = left.value + right.value
|
||||
return result
|
||||
|
||||
|
||||
def main():
|
||||
while True:
|
||||
try:
|
||||
# To run under Python3 replace 'raw_input' call
|
||||
# with 'input'
|
||||
text = raw_input('calc> ')
|
||||
except EOFError:
|
||||
break
|
||||
if not text:
|
||||
continue
|
||||
interpreter = Interpreter(text)
|
||||
result = interpreter.expr()
|
||||
print(result)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
```
|
||||
|
||||
|
||||
Save the above code into calc1.py file or download it directly from [GitHub][7]. Before you start digging deeper into the code, run the calculator on the command line and see it in action. Play with it! Here is a sample session on my laptop (if you want to run the calculator under Python3 you will need to replace raw_input with input):
|
||||
```
|
||||
$ python calc1.py
|
||||
calc> 3+4
|
||||
7
|
||||
calc> 3+5
|
||||
8
|
||||
calc> 3+9
|
||||
12
|
||||
calc>
|
||||
```
|
||||
|
||||
For your simple calculator to work properly without throwing an exception, your input needs to follow certain rules:
|
||||
|
||||
* Only single digit integers are allowed in the input
|
||||
* The only arithmetic operation supported at the moment is addition
|
||||
* No whitespace characters are allowed anywhere in the input
|
||||
|
||||
|
||||
|
||||
Those restrictions are necessary to make the calculator simple. Don't worry, you'll make it pretty complex pretty soon.
|
||||
|
||||
Okay, now let's dive in and see how your interpreter works and how it evaluates arithmetic expressions.
|
||||
|
||||
When you enter an expression 3+5 on the command line your interpreter gets a string "3+5". In order for the interpreter to actually understand what to do with that string it first needs to break the input "3+5" into components called **tokens**. A **token** is an object that has a type and a value. For example, for the string "3" the type of the token will be INTEGER and the corresponding value will be integer 3.
|
||||
|
||||
The process of breaking the input string into tokens is called **lexical analysis**. So, the first step your interpreter needs to do is read the input of characters and convert it into a stream of tokens. The part of the interpreter that does it is called a **lexical analyzer** , or **lexer** for short. You might also encounter other names for the same component, like **scanner** or **tokenizer**. They all mean the same: the part of your interpreter or compiler that turns the input of characters into a stream of tokens.
|
||||
|
||||
The method get_next_token of the Interpreter class is your lexical analyzer. Every time you call it, you get the next token created from the input of characters passed to the interpreter. Let's take a closer look at the method itself and see how it actually does its job of converting characters into tokens. The input is stored in the variable text that holds the input string and pos is an index into that string (think of the string as an array of characters). pos is initially set to 0 and points to the character '3'. The method first checks whether the character is a digit and if so, it increments pos and returns a token instance with the type INTEGER and the value set to the integer value of the string '3', which is an integer 3:
|
||||
|
||||
![][8]
|
||||
|
||||
The pos now points to the '+' character in the text. The next time you call the method, it tests if a character at the position pos is a digit and then it tests if the character is a plus sign, which it is. As a result the method increments pos and returns a newly created token with the type PLUS and value '+':
|
||||
|
||||
![][9]
|
||||
|
||||
The pos now points to character '5'. When you call the get_next_token method again the method checks if it's a digit, which it is, so it increments pos and returns a new INTEGER token with the value of the token set to integer 5: ![][10]
|
||||
|
||||
Because the pos index is now past the end of the string "3+5" the get_next_token method returns the EOF token every time you call it:
|
||||
|
||||
![][11]
|
||||
|
||||
Try it out and see for yourself how the lexer component of your calculator works:
|
||||
```
|
||||
>>> from calc1 import Interpreter
|
||||
>>>
|
||||
>>> interpreter = Interpreter('3+5')
|
||||
>>> interpreter.get_next_token()
|
||||
Token(INTEGER, 3)
|
||||
>>>
|
||||
>>> interpreter.get_next_token()
|
||||
Token(PLUS, '+')
|
||||
>>>
|
||||
>>> interpreter.get_next_token()
|
||||
Token(INTEGER, 5)
|
||||
>>>
|
||||
>>> interpreter.get_next_token()
|
||||
Token(EOF, None)
|
||||
>>>
|
||||
```
|
||||
|
||||
So now that your interpreter has access to the stream of tokens made from the input characters, the interpreter needs to do something with it: it needs to find the structure in the flat stream of tokens it gets from the lexer get_next_token. Your interpreter expects to find the following structure in that stream: INTEGER -> PLUS -> INTEGER. That is, it tries to find a sequence of tokens: integer followed by a plus sign followed by an integer.
|
||||
|
||||
The method responsible for finding and interpreting that structure is expr. This method verifies that the sequence of tokens does indeed correspond to the expected sequence of tokens, i.e INTEGER -> PLUS -> INTEGER. After it's successfully confirmed the structure, it generates the result by adding the value of the token on the left side of the PLUS and the right side of the PLUS, thus successfully interpreting the arithmetic expression you passed to the interpreter.
|
||||
|
||||
The expr method itself uses the helper method eat to verify that the token type passed to the eat method matches the current token type. After matching the passed token type the eat method gets the next token and assigns it to the current_token variable, thus effectively "eating" the currently matched token and advancing the imaginary pointer in the stream of tokens. If the structure in the stream of tokens doesn't correspond to the expected INTEGER PLUS INTEGER sequence of tokens the eat method throws an exception.
|
||||
|
||||
Let's recap what your interpreter does to evaluate an arithmetic expression:
|
||||
|
||||
* The interpreter accepts an input string, let's say "3+5"
|
||||
* The interpreter calls the expr method to find a structure in the stream of tokens returned by the lexical analyzer get_next_token. The structure it tries to find is of the form INTEGER PLUS INTEGER. After it's confirmed the structure, it interprets the input by adding the values of two INTEGER tokens because it's clear to the interpreter at that point that what it needs to do is add two integers, 3 and 5.
|
||||
|
||||
Congratulate yourself. You've just learned how to build your very first interpreter!
|
||||
|
||||
Now it's time for exercises.
|
||||
|
||||
![][12]
|
||||
|
||||
You didn't think you would just read this article and that would be enough, did you? Okay, get your hands dirty and do the following exercises:
|
||||
|
||||
1. Modify the code to allow multiple-digit integers in the input, for example "12+3"
|
||||
2. Add a method that skips whitespace characters so that your calculator can handle inputs with whitespace characters like " 12 + 3"
|
||||
3. Modify the code and instead of '+' handle '-' to evaluate subtractions like "7-5"
|
||||
|
||||
|
||||
|
||||
**Check your understanding**
|
||||
|
||||
1. What is an interpreter?
|
||||
2. What is a compiler?
|
||||
3. What's the difference between an interpreter and a compiler?
|
||||
4. What is a token?
|
||||
5. What is the name of the process that breaks input apart into tokens?
|
||||
6. What is the part of the interpreter that does lexical analysis called?
|
||||
7. What are the other common names for that part of an interpreter or a compiler?
|
||||
|
||||
|
||||
|
||||
Before I finish this article, I really want you to commit to studying interpreters and compilers. And I want you to do it right now. Don't put it on the back burner. Don't wait. If you've skimmed the article, start over. If you've read it carefully but haven't done exercises - do them now. If you've done only some of them, finish the rest. You get the idea. And you know what? Sign the commitment pledge to start learning about interpreters and compilers today!
|
||||
|
||||
|
||||
|
||||
_I, ________, of being sound mind and body, do hereby pledge to commit to studying interpreters and compilers starting today and get to a point where I know 100% how they work!_
|
||||
|
||||
Signature:
|
||||
|
||||
Date:
|
||||
|
||||
![][13]
|
||||
|
||||
Sign it, date it, and put it somewhere where you can see it every day to make sure that you stick to your commitment. And keep in mind the definition of commitment:
|
||||
|
||||
> "Commitment is doing the thing you said you were going to do long after the mood you said it in has left you." -- Darren Hardy
|
||||
|
||||
Okay, that's it for today. In the next article of the mini series you will extend your calculator to handle more arithmetic expressions. Stay tuned.
|
||||
|
||||
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
via: https://ruslanspivak.com/lsbasi-part1/
|
||||
|
||||
作者:[Ruslan Spivak][a]
|
||||
译者:[译者ID](https://github.com/译者ID)
|
||||
校对:[校对者ID](https://github.com/校对者ID)
|
||||
|
||||
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
|
||||
|
||||
[a]:https://ruslanspivak.com
|
||||
[1]:https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_i_dont_know.png
|
||||
[2]:https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_omg.png
|
||||
[3]:https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_i_know.png
|
||||
[4]:https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_compiler_interpreter.png
|
||||
[5]:https://en.wikipedia.org/wiki/Pascal_%28programming_language%29
|
||||
[6]:https://docs.python.org/2/library/pdb.html
|
||||
[7]:https://github.com/rspivak/lsbasi/blob/master/part1/calc1.py
|
||||
[8]:https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_lexer1.png
|
||||
[9]:https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_lexer2.png
|
||||
[10]:https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_lexer3.png
|
||||
[11]:https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_lexer4.png
|
||||
[12]:https://ruslanspivak.com/lsbasi-part1/lsbasi_exercises2.png
|
||||
[13]:https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_commitment_pledge.png
|
||||
[14]:http://ruslanspivak.com/lsbaws-part1/ (Part 1)
|
||||
[15]:http://ruslanspivak.com/lsbaws-part2/ (Part 2)
|
||||
[16]:http://ruslanspivak.com/lsbaws-part3/ (Part 3)
|
@ -0,0 +1,334 @@
|
||||
让我们做个简单的解释器(1)
|
||||
======
|
||||
|
||||
|
||||
> **" If you don't know how compilers work, then you don't know how computers work. If you're not 100% sure whether you know how compilers work, then you don't know how they work."** -- Steve Yegge
|
||||
> **“如果你不知道编译器是怎么工作的,那你就不知道电脑是怎么工作的。如果你不能百分百确定,那就是不知道他们是如何工作的。”** --Steve Yegge
|
||||
|
||||
就是这样。想一想。你是萌新还是一个资深的软件开发者实际上都无关紧要:如果你不知道编译器和解释器是怎么工作的,那么你就不知道电脑是怎么工作的。就这么简单。
|
||||
|
||||
所以,你知道编译器和解释器是怎么工作的吗?我是说,你百分百确定自己知道他们怎么工作吗?如果不知道。![][1]
|
||||
|
||||
或者如果你不知道但你非常想要了解它。 ![][2]
|
||||
|
||||
不用担心。如果你能坚持跟着这个系列做下去,和我一起构建一个解释器和编译器,最后你将会知道他们是怎么工作的。并且你会变成一个自信满满的快乐的人。至少我希望如此。![][3]。
|
||||
|
||||
为什么要学习编译器和解释器?有三点理由。
|
||||
|
||||
1. 要写出一个解释器或编译器,你需要有很多的专业知识,并能融会贯通。写一个解释器或编译器能帮你加强这些能力,成为一个更厉害的软件开发者。而且,你要学的技能对写软件非常有用,而不是仅仅局限于解释器或编译器。
|
||||
2. 你确实想要了解电脑是怎么工作的。一般解释器和编译器看上去很魔幻。你或许不习惯这种魔力。你会想去揭开构建解释器和编译器那层神秘的面纱,了解他们的原理,把事情做好。
|
||||
3. 你想要创建自己的编程语言或者特定领域的语言。如果你创建了一个,你还要为它创建一个解释器或者编译器。最近,兴起了对新的编程语言的兴趣。你能看到几乎每天都有一门新的编程语言横空出世:Elixir,Go,Rust,还有很多。
|
||||
|
||||
|
||||
|
||||
|
||||
好,但什么是解释器和编译器?
|
||||
|
||||
**解释器** 和 **编译器** 的任务是把用高级语言写的源程序翻译成其他的格式。很奇怪,是不是?忍一忍,稍后你会在这个系列学到到底把源程序翻译成什么东西。
|
||||
|
||||
这时你可能会奇怪解释器和编译器之间有什么区别。为了实现这个系列的目的,我们规定一下,如果有个翻译器把源程序翻译成机器语言,那它就是 **编译器**。如果一个翻译器可以处理并执行源程序,却不用把它翻译器机器语言,那它就是 **解释器**。直观上它看起来像这样:
|
||||
|
||||
![][4]
|
||||
|
||||
我希望你现在确信你很想学习构建一个编译器和解释器。你期望在这个教程里学习解释器的哪些知识呢?
|
||||
|
||||
你看这样如何。你和我一起做一个简单的解释器当作 [Pascal][5] 语言的子集。在这个系列结束的时候你能做出一个可以运行的 Pascal 解释器和一个像 Python 的 [pdb][6] 那样的源代码级别的调试器。
|
||||
|
||||
你或许会问,为什么是 Pascal?有一点,它不是我为了这个系列而提出的一个虚构的语言:它是真实存在的一门编程语言,有很多重要的语言结构。有些陈旧但有用的计算机书籍使用 Pascal 编程语言作为示例(我知道对于选择一门语言来构建解释器,这个理由并不令人信服,但我认为学一门非主流的语言也不错:)。
|
||||
|
||||
这有个 Pascal 中的阶乘函数示例,你能用自己的解释器解释代码,还能够用可交互的源码级调试器进行调试,你可以这样创造:
|
||||
```
|
||||
program factorial;
|
||||
|
||||
function factorial(n: integer): longint;
|
||||
begin
|
||||
if n = 0 then
|
||||
factorial := 1
|
||||
else
|
||||
factorial := n * factorial(n - 1);
|
||||
end;
|
||||
|
||||
var
|
||||
n: integer;
|
||||
|
||||
begin
|
||||
for n := 0 to 16 do
|
||||
writeln(n, '! = ', factorial(n));
|
||||
end.
|
||||
```
|
||||
|
||||
这个 Pascal 解释器的实现语言会用 Python,但你也可以用其他任何语言,因为这里展示的思想不依赖任何特殊的实现语言。好,让我们开始干活。准备好了,出发!
|
||||
|
||||
你会从编写一个简单的算术表达式解析器,也就是常说的计算器,开始学习解释器和编译器。今天的目标非常简单:让你的计算器能处理两个个位数相加,比如 **3+5**。这是你的计算器的源代码,不好意思,是解释器:
|
||||
|
||||
|
||||
```
|
||||
# 标记类型
|
||||
#
|
||||
# EOF (end-of-file 文件末尾) 标记是用来表示所有输入都解析完成
|
||||
INTEGER, PLUS, EOF = 'INTEGER', 'PLUS', 'EOF'
|
||||
|
||||
|
||||
class Token(object):
|
||||
def __init__(self, type, value):
|
||||
# token 类型: INTEGER, PLUS, MINUS, or EOF
|
||||
self.type = type
|
||||
# token 值: 0, 1, 2. 3, 4, 5, 6, 7, 8, 9, '+', 或 None
|
||||
self.value = value
|
||||
|
||||
def __str__(self):
|
||||
"""String representation of the class instance.
|
||||
|
||||
Examples:
|
||||
Token(INTEGER, 3)
|
||||
Token(PLUS '+')
|
||||
"""
|
||||
return 'Token({type}, {value})'.format(
|
||||
type=self.type,
|
||||
value=repr(self.value)
|
||||
)
|
||||
|
||||
def __repr__(self):
|
||||
return self.__str__()
|
||||
|
||||
|
||||
class Interpreter(object):
|
||||
def __init__(self, text):
|
||||
# 用户输入字符串, 例如 "3+5"
|
||||
self.text = text
|
||||
# self.pos 是 self.text 的索引
|
||||
self.pos = 0
|
||||
# 当前标记实例
|
||||
self.current_token = None
|
||||
|
||||
def error(self):
|
||||
raise Exception('Error parsing input')
|
||||
|
||||
def get_next_token(self):
|
||||
"""词法分析器(也说成扫描器或者标记器)
|
||||
|
||||
该方法负责把一个句子分成若干个标记。每次处理一个标记
|
||||
"""
|
||||
text = self.text
|
||||
|
||||
# self.pos 索引到达了 self.text 的末尾吗?
|
||||
# 如果到了,就返回 EOF 标记,因为没有更多的
|
||||
# 能转换成标记的输入了
|
||||
if self.pos > len(text) - 1:
|
||||
return Token(EOF, None)
|
||||
|
||||
# 从 self.pos 位置获取当前的字符,
|
||||
# 基于单个字符判断要生成哪种标记
|
||||
current_char = text[self.pos]
|
||||
# 如果字符是一个数字,就把他转换成一个整数,生成一个 INTEGER # 标记,累加 self.pos 索引,指向数字后面的下一个字符,
|
||||
# 并返回 INTEGER 标记
|
||||
if current_char.isdigit():
|
||||
token = Token(INTEGER, int(current_char))
|
||||
self.pos += 1
|
||||
return token
|
||||
|
||||
if current_char == '+':
|
||||
token = Token(PLUS, current_char)
|
||||
self.pos += 1
|
||||
return token
|
||||
|
||||
self.error()
|
||||
|
||||
def eat(self, token_type):
|
||||
# 将当前的标记类型与传入的标记类型作比较,如果他们相匹配,就
|
||||
# “eat” 掉当前的标记并将下一个标记赋给 self.current_token,
|
||||
# 否则抛出一个异常
|
||||
if self.current_token.type == token_type:
|
||||
self.current_token = self.get_next_token()
|
||||
else:
|
||||
self.error()
|
||||
|
||||
def expr(self):
|
||||
"""expr -> INTEGER PLUS INTEGER"""
|
||||
# 将输入中的第一个标记设置成当前标记
|
||||
self.current_token = self.get_next_token()
|
||||
|
||||
# 我们期望当前标记是个位数。
|
||||
left = self.current_token
|
||||
self.eat(INTEGER)
|
||||
|
||||
# 期望当前标记是 ‘+’ 号
|
||||
op = self.current_token
|
||||
self.eat(PLUS)
|
||||
|
||||
# 我们期望当前标记是个位数。
|
||||
right = self.current_token
|
||||
self.eat(INTEGER)
|
||||
|
||||
# 上述操作完成后,self.current_token 被设成 EOF 标记
|
||||
# 这时成功找到 INTEGER PLUS INTEGER 标记序列
|
||||
# 这个方法就可以返回两个整数相加的结果了,
|
||||
# 即高效的解释了用户输入
|
||||
result = left.value + right.value
|
||||
return result
|
||||
|
||||
|
||||
def main():
|
||||
while True:
|
||||
try:
|
||||
# 要在 Python3 下运行,请把 ‘raw_input’ 换成 ‘input’
|
||||
text = raw_input('calc> ')
|
||||
except EOFError:
|
||||
break
|
||||
if not text:
|
||||
continue
|
||||
interpreter = Interpreter(text)
|
||||
result = interpreter.expr()
|
||||
print(result)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
```
|
||||
|
||||
|
||||
把上面的代码保存到 calc1.py 文件,或者直接从 [GitHub][7] 上下载。在你深入研究代码前,在命令行里面运行它看看效果。试一试!这是我笔记本上的示例会话(如果你想在 Python3 下运行,你要把 raw_input 换成 input):
|
||||
```
|
||||
$ python calc1.py
|
||||
calc> 3+4
|
||||
7
|
||||
calc> 3+5
|
||||
8
|
||||
calc> 3+9
|
||||
12
|
||||
calc>
|
||||
```
|
||||
|
||||
要让你的简易计算器正常工作,不抛出异常,你的输入要遵守以下几个规则:
|
||||
|
||||
* 只允许输入个位数
|
||||
* 此时支持的唯一一个运算符是加法
|
||||
* 输入中不允许有任何的空格符号
|
||||
|
||||
|
||||
|
||||
要让计算器变得简单,这些限制非常必要。不用担心,你很快就会让它变得很复杂。
|
||||
|
||||
好,现在让我们深入它,看看解释器是怎么工作,它是怎么评估出算术表达式的。
|
||||
|
||||
当你在命令行中输入一个表达式 3+5,解释器就获得了字符串 “3+5”。为了让解释器能够真正理解要用这个字符串做什么,它首先要把输入 “3+5” 分到叫做 **token(标记)** 的容器里。**标记** 是一个拥有类型和值的对象。比如说,对字符 “3” 而言,标记的类型是 INTEGER 整数,对应的值是 3。
|
||||
|
||||
把输入字符串分成标记的过程叫 **词法分析**。因此解释器的需要做的第一步是读取输入字符,并将其转换成标记流。解释器中的这一部分叫做 **词法分析器**,或者简短点叫 **lexer**。你也可以给它起别的名字,诸如 **扫描器** 或者 **标记器**。他们指的都是同一个东西:解释器或编译器中将输入字符转换成标记流的那部分。
|
||||
|
||||
Interpreter 类中的 get_next_token 方法就是词法分析器。每次调用它的时候,你都能从传入解释器的输入字符中获得创建的下一个标记。仔细看看这个方法,看看它是如何完成把字符转换成标记的任务的。输入被存在可变文本中,它保存了输入的字符串和关于该字符串的索引(把字符串想象成字符数组)。pos 开始时设为 0,指向 ‘3’.这个方法一开始检查字符是不是数字,如果是,就将 pos 加 1,并返回一个 INTEGER 类型的标记实例,并把字符 ‘3’ 的值设为整数,也就是整数 3:
|
||||
|
||||
![][8]
|
||||
|
||||
现在 pos 指向文本中的 ‘+’ 号。下次调用这个方法的时候,它会测试 pos 位置的字符是不是个数字,然后检测下一个字符是不是个加号,就是这样。结果这个方法把 pos 加一,返回一个新创建的标记,类型是 PLUS,值为 ‘+’。
|
||||
|
||||
![][9]
|
||||
|
||||
pos 现在指向字符 ‘5’。当你再调用 get_next_token 方法时,该方法会检查这是不是个数字,就是这样,然后它把 pos 加一,返回一个新的 INTEGER 标记,该标记的值被设为 5:
|
||||
![][10]
|
||||
|
||||
因为 pos 索引现在到了字符串 “3+5” 的末尾,你每次调用 get_next_token 方法时,它将会返回 EOF 标记:
|
||||
![][11]
|
||||
|
||||
自己试一试,看看计算器里的词法分析器的运行:
|
||||
```
|
||||
>>> from calc1 import Interpreter
|
||||
>>>
|
||||
>>> interpreter = Interpreter('3+5')
|
||||
>>> interpreter.get_next_token()
|
||||
Token(INTEGER, 3)
|
||||
>>>
|
||||
>>> interpreter.get_next_token()
|
||||
Token(PLUS, '+')
|
||||
>>>
|
||||
>>> interpreter.get_next_token()
|
||||
Token(INTEGER, 5)
|
||||
>>>
|
||||
>>> interpreter.get_next_token()
|
||||
Token(EOF, None)
|
||||
>>>
|
||||
```
|
||||
|
||||
既然你的解释器能够从输入字符中获取标记流,解释器需要做点什么:它需要在词法分析器 get_next_token 中获取的标记流中找出相应的结构。你的解释器应该能够找到流中的结构:INTEGER -> PLUS -> INTEGER。就是这样,它尝试找出标记的序列:整数后面要跟着加号,加号后面要跟着整数。
|
||||
|
||||
负责找出并解释结构的方法就是 expr。该方法检验标记序列确实与期望的标记序列是对应的,比如 INTEGER -> PLUS -> INTEGER。成功确认了这个结构后,就会生成加号左右两边的标记的值相加的结果,这样就成功解释你输入到解释器中的算术表达式了。
|
||||
|
||||
expr 方法用了一个助手方法 eat 来检验传入的标记类型是否与当前的标记类型相匹配。在匹配到传入的标记类型后,eat 方法获取下一个标记,并将其赋给 current_token 变量,然后高效地 “吃掉” 当前匹配的标记,并将标记流的虚拟指针向后移动。如果标记流的结构与期望的 INTEGER PLUS INTEGER 标记序列不对应,eat 方法就抛出一个异常。
|
||||
|
||||
让我们回顾下解释器做了什么来对算术表达式进行评估的:
|
||||
|
||||
* 解释器接受输入字符串,就把它当作 “3+5”
|
||||
* 解释器调用 expr 方法,在词法分析器 get_next_token 返回的标记流中找出结构。这个结构就是 INTEGER PLUS INTEGER 这样的格式。在确认了格式后,它就通过把两个整型标记相加解释输入,因为此时对于解释器来说很清楚,他要做的就是把两个整数 3 和 5 进行相加。
|
||||
|
||||
|
||||
恭喜。你刚刚学习了怎么构建自己的第一个解释器!
|
||||
|
||||
现在是时候做练习了。
|
||||
|
||||
![][12]
|
||||
|
||||
看了这篇文章,你肯定觉得不够,是吗?好,准备好做这些练习:
|
||||
|
||||
1. 修改代码,允许输入多位数,比如 “12+3”
|
||||
2. 添加一个方法忽略空格符,让你的计算器能够处理带有空白的输入,比如“12 + 3”
|
||||
3. 修改代码,用 ‘-’ 号而非 ‘+’ 号去执行减法比如 “7-5”
|
||||
|
||||
|
||||
**检验你的理解**
|
||||
|
||||
1. 什么是解释器?
|
||||
2. 什么是编译器
|
||||
3. 解释器和编译器有什么差别?
|
||||
4. 什么是标记?
|
||||
5. 将输入分隔成若干个标记的过程叫什么?
|
||||
6. 解释器中进行词法分析的部分叫什么?
|
||||
7. 解释器或编译器中进行词法分析的部分有哪些其他的常见名字?
|
||||
|
||||
|
||||
|
||||
在结束本文前,我衷心希望你能留下学习解释器和编译器的承诺。并且现在就开始做。不要把它留到以后。不要拖延。如果你已经看完了本文,就开始吧。如果已经仔细看完了但是还没做什么练习 —— 现在就开始做吧。如果已经开始做练习了,那就把剩下的做完。你懂得。而且你知道吗?签下承诺书,今天就开始学习解释器和编译器!
|
||||
|
||||
|
||||
_本人, ______,身体健全,思想正常,在此承诺从今天开始学习解释器和编译器,直到我百分百了解它们是怎么工作的!_
|
||||
|
||||
签字人:
|
||||
|
||||
日期:
|
||||
|
||||
![][13]
|
||||
|
||||
签字,写上日期,把它放在你每天都能看到的地方,确保你能坚守承诺。谨记你的承诺:
|
||||
|
||||
> "Commitment is doing the thing you said you were going to do long after the mood you said it in has left you." -- Darren Hardy
|
||||
> “承诺就是,你说自己会去做的事,在你说完就一直陪着你的东西。” —— Darren Hardy
|
||||
|
||||
好,今天的就结束了。这个系列的下一篇文章里,你将会扩展自己的计算器,让它能够处理更复杂的算术表达式。敬请期待。
|
||||
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
via: https://ruslanspivak.com/lsbasi-part1/
|
||||
|
||||
|
||||
作者:[Ruslan Spivak][a]
|
||||
译者:[BriFuture](https://github.com/BriFuture)
|
||||
校对:[校对者ID](https://github.com/校对者ID)
|
||||
|
||||
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
|
||||
|
||||
[a]:https://ruslanspivak.com
|
||||
[1]:https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_i_dont_know.png
|
||||
[2]:https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_omg.png
|
||||
[3]:https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_i_know.png
|
||||
[4]:https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_compiler_interpreter.png
|
||||
[5]:https://en.wikipedia.org/wiki/Pascal_%28programming_language%29
|
||||
[6]:https://docs.python.org/2/library/pdb.html
|
||||
[7]:https://github.com/rspivak/lsbasi/blob/master/part1/calc1.py
|
||||
[8]:https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_lexer1.png
|
||||
[9]:https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_lexer2.png
|
||||
[10]:https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_lexer3.png
|
||||
[11]:https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_lexer4.png
|
||||
[12]:https://ruslanspivak.com/lsbasi-part1/lsbasi_exercises2.png
|
||||
[13]:https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_commitment_pledge.png
|
||||
[14]:http://ruslanspivak.com/lsbaws-part1/ (Part 1)
|
||||
[15]:http://ruslanspivak.com/lsbaws-part2/ (Part 2)
|
||||
[16]:http://ruslanspivak.com/lsbaws-part3/ (Part 3)
|
Loading…
Reference in New Issue
Block a user