Merge pull request #8938 from qhwdw/tr515

Translated by qhwdw
This commit is contained in:
Xingyu.Wang 2018-05-27 16:16:31 +08:00 committed by GitHub
commit 18f491892c
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 155 additions and 156 deletions

View File

@ -1,156 +0,0 @@
Translating by qhwdw
An introduction to Python bytecode
======
![](https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/code_computer_development_programming.png?itok=4OM29-82)
If you've ever written, or even just used, Python, you're probably used to seeing Python source code files; they have names ending in `.py`. And you may also have seen another type of file, with a name ending in `.pyc`, and you may have heard that they're Python "bytecode" files. (These are a bit harder to see on Python 3—instead of ending up in the same directory as your `.py` files, they go into a subdirectory called `__pycache__`.) And maybe you've heard that this is some kind of time-saver that prevents Python from having to re-parse your source code every time it runs.
But beyond "oh, that's Python bytecode," do you really know what's in those files and how Python uses them?
If not, today's your lucky day! I'll take you through what Python bytecode is, how Python uses it to execute your code, and how knowing about it can help you.
### How Python works
Python is often described as an interpreted language—one in which your source code is translated into native CPU instructions as the program runs—but this is only partially correct. Python, like many interpreted languages, actually compiles source code to a set of instructions for a virtual machine, and the Python interpreter is an implementation of that virtual machine. This intermediate format is called "bytecode."
So those `.pyc` files Python leaves lying around aren't just some "faster" or "optimized" version of your source code; they're the bytecode instructions that will be executed by Python's virtual machine as your program runs.
Let's look at an example. Here's a classic "Hello, World!" written in Python:
```
def hello()
    print("Hello, World!")
```
And here's the bytecode it turns into (translated into a human-readable form):
```
2           0 LOAD_GLOBAL              0 (print)
            2 LOAD_CONST               1 ('Hello, World!')
            4 CALL_FUNCTION            1
```
If you type up that `hello()` function and use the [CPython][1] interpreter to run it, the above listing is what Python will execute. It might look a little weird, though, so let's take a deeper look at what's going on.
### Inside the Python virtual machine
CPython uses a stack-based virtual machine. That is, it's oriented entirely around stack data structures (where you can "push" an item onto the "top" of the structure, or "pop" an item off the "top").
CPython uses three types of stacks:
1. The **call stack**. This is the main structure of a running Python program. It has one item—a "frame"—for each currently active function call, with the bottom of the stack being the entry point of the program. Every function call pushes a new frame onto the call stack, and every time a function call returns, its frame is popped off.
2. In each frame, there's an **evaluation stack** (also called the **data stack** ). This stack is where execution of a Python function occurs, and executing Python code consists mostly of pushing things onto this stack, manipulating them, and popping them back off.
3. Also in each frame, there's a **block stack**. This is used by Python to keep track of certain types of control structures: loops, `try`/`except` blocks, and `with` blocks all cause entries to be pushed onto the block stack, and the block stack gets popped whenever you exit one of those structures. This helps Python know which blocks are active at any given moment so that, for example, a `continue` or `break` statement can affect the correct block.
Most of Python's bytecode instructions manipulate the evaluation stack of the current call-stack frame, although there are some instructions that do other things (like jump to specific instructions or manipulate the block stack).
To get a feel for this, suppose we have some code that calls a function, like this: `my_function(my_variable, 2)`. Python will translate this into a sequence of four bytecode instructions:
1. A `LOAD_NAME` instruction that looks up the function object `my_function` and pushes it onto the top of the evaluation stack
2. Another `LOAD_NAME` instruction to look up the variable `my_variable` and push it on top of the evaluation stack
3. A `LOAD_CONST` instruction to push the literal integer value `2` on top of the evaluation stack
4. A `CALL_FUNCTION` instruction
The `CALL_FUNCTION` instruction will have an argument of 2, which indicates that Python needs to pop two positional arguments off the top of the stack; then the function to call will be on top, and it can be popped as well (for functions involving keyword arguments, a different instruction—`CALL_FUNCTION_KW`—is used, but with a similar principle of operation, and a third instruction, `CALL_FUNCTION_EX`, is used for function calls that involve argument unpacking with the `*` or `**` operators). Once Python has all that, it will allocate a new frame on the call stack, populate the local variables for the function call, and execute the bytecode of `my_function` inside that frame. Once that's done, the frame will be popped off the call stack, and in the original frame the return value of `my_function` will be pushed on top of the evaluation stack.
### Accessing and understanding Python bytecode
If you want to play around with this, the `dis` module in the Python standard library is a huge help; the `dis` module provides a "disassembler" for Python bytecode, making it easy to get a human-readable version and look up the various bytecode instructions. [The documentation for the `dis` module][2] goes over its contents and provides a full list of bytecode instructions along with what they do and what arguments they take.
For example, to get the bytecode listing for the `hello()` function above, I typed it into a Python interpreter, then ran:
```
import dis
dis.dis(hello)
```
The function `dis.dis()` will disassemble a function, method, class, module, compiled Python code object, or string literal containing source code and print a human-readable version. Another handy function in the `dis` module is `distb()`. You can pass it a Python traceback object or call it after an exception has been raised, and it will disassemble the topmost function on the call stack at the time of the exception, print its bytecode, and insert a pointer to the instruction that raised the exception.
It's also useful to look at the compiled code objects Python builds for every function since executing a function makes use of attributes of those code objects. Here's an example looking at the `hello()` function:
```
>>> hello.__code__
<code object hello at 0x104e46930, file "<stdin>", line 1>
>>> hello.__code__.co_consts
(None, 'Hello, World!')
>>> hello.__code__.co_varnames
()
>>> hello.__code__.co_names
('print',)
```
The code object is accessible as the attribute `__code__` on the function and carries a few important attributes:
* `co_consts` is a tuple of any literals that occur in the function body
* `co_varnames` is a tuple containing the names of any local variables used in the function body
* `co_names` is a tuple of any non-local names referenced in the function body
Many bytecode instructions—particularly those that load values to be pushed onto the stack or store values in variables and attributes—use indices in these tuples as their arguments.
So now we can understand the bytecode listing of the `hello()` function:
1. `LOAD_GLOBAL 0`: tells Python to look up the global object referenced by the name at index 0 of `co_names` (which is the `print` function) and push it onto the evaluation stack
2. `LOAD_CONST 1`: takes the literal value at index 1 of `co_consts` and pushes it (the value at index 0 is the literal `None`, which is present in `co_consts` because Python function calls have an implicit return value of `None` if no explicit `return` statement is reached)
3. `CALL_FUNCTION 1`: tells Python to call a function; it will need to pop one positional argument off the stack, then the new top-of-stack will be the function to call.
The "raw" bytecode—as non-human-readable bytes—is also available on the code object as the attribute `co_code`. You can use the list `dis.opname` to look up the names of bytecode instructions from their decimal byte values if you'd like to try to manually disassemble a function.
### Putting bytecode to use
Now that you've read this far, you might be thinking "OK, I guess that's cool, but what's the practical value of knowing this?" Setting aside curiosity for curiosity's sake, understanding Python bytecode is useful in a few ways.
First, understanding Python's execution model helps you reason about your code. People like to joke about C being a kind of "portable assembler," where you can make good guesses about what machine instructions a particular chunk of C source code will turn into. Understanding bytecode will give you the same ability with Python—if you can anticipate what bytecode your Python source code turns into, you can make better decisions about how to write and optimize it.
Second, understanding bytecode is a useful way to answer questions about Python. For example, I often see newer Python programmers wondering why certain constructs are faster than others (like why `{}` is faster than `dict()`). Knowing how to access and read Python bytecode lets you work out the answers (try it: `dis.dis("{}")` versus `dis.dis("dict()")`).
Finally, understanding bytecode and how Python executes it gives a useful perspective on a particular kind of programming that Python programmers don't often engage in: stack-oriented programming. If you've ever used a stack-oriented language like FORTH or Factor, this may be old news, but if you're not familiar with this approach, learning about Python bytecode and understanding how its stack-oriented programming model works is a neat way to broaden your programming knowledge.
### Further reading
If you'd like to learn more about Python bytecode, the Python virtual machine, and how they work, I recommend these resources:
* [Inside the Python Virtual Machine][3] by Obi Ike-Nwosu is a free online book that does a deep dive into the Python interpreter, explaining in detail how Python actually works.
* [A Python Interpreter Written in Python][4] by Allison Kaptur is a tutorial for building a Python bytecode interpreter in—what else—Python itself, and it implements all the machinery to run Python bytecode.
* Finally, the CPython interpreter is open source and you can [read through it on GitHub][1]. The implementation of the bytecode interpreter is in the file `Python/ceval.c`. [Here's that file for the Python 3.6.4 release][5]; the bytecode instructions are handled by the `switch` statement beginning on line 1266.
To learn more, attend James Bennett's talk, [A Bit about Bytes: Understanding Python Bytecode][6], at [PyCon Cleveland 2018][7].
--------------------------------------------------------------------------------
via: https://opensource.com/article/18/4/introduction-python-bytecode
作者:[James Bennett][a]
选题:[lujun9972](https://github.com/lujun9972)
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://opensource.com/users/ubernostrum
[1]:https://github.com/python/cpython
[2]:https://docs.python.org/3/library/dis.html
[3]:https://leanpub.com/insidethepythonvirtualmachine
[4]:http://www.aosabook.org/en/500L/a-python-interpreter-written-in-python.html
[5]:https://github.com/python/cpython/blob/d48ecebad5ac78a1783e09b0d32c211d9754edf4/Python/ceval.c
[6]:https://us.pycon.org/2018/schedule/presentation/127/
[7]:https://us.pycon.org/2018/

View File

@ -0,0 +1,155 @@
Python 字节码介绍
======
![](https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/code_computer_development_programming.png?itok=4OM29-82)
如果你从没有写过 Python或者甚至只是使用过 Python你或许已经习惯于看 Python 源代码文件;它们的名字以 `.py` 结尾。你可能还看到过其它类型的文件,比如使用 `.pyc` 结尾的文件,或许你可能听说过,它们就是 Python 的 "字节码" 文件。(在 Python 3 上这些可能不容易看到 — 因为它们与你的 `.py` 文件不在同一个目录下,它们在一个叫 `__pycache__` 的子目录中)或者你也听说过,这是节省时间的一种方法,它可以避免每次运行 Python 时去重新解析源代码。
但是,除了 “噢,原来这就是 Python 字节码” 之外,你还知道这些文件能做什么吗?以及 Python 是如何使用它们的?
如果你不知道,那你走运了!今天我将带你了解 Python 的字节码是什么Python 如何使用它去运行你的代码,以及知道它是如何帮助你的。
### Python 如何工作
Python 经常被介绍为它是一个解释型语言 — 其中一个原因是程序运行时,你的源代码被转换成 CPU 的原生指令 — 但这样认为只是部分正确。Python 与大多数解释型语言一样,确实是将源代码编译为一组虚拟机指令,并且 Python 解释器是针对相应的虚拟机实现的。这种中间格式被称为 “字节码”。
因此,这些 `.pyc` 文件是 Python 悄悄留下的,是为了让它们运行的 “更快”,或者是针对你的源代码的 “优化” 版本;它们是你的程序在 Python 虚拟机上运行的字节码指令。
我们来看一个示例。这里是用 Python 写的经典程序 "Hello, World!"
```
def hello()
    print("Hello, World!")
```
下面是转换后的字节码(转换为人类可读的格式):
```
2           0 LOAD_GLOBAL              0 (print)
            2 LOAD_CONST               1 ('Hello, World!')
            4 CALL_FUNCTION            1
```
如果你输入那个 `hello()` 函数,然后使用 [CPython][1] 解释器去运行它,上面的 Python 程序将会运行。它看起来可能有点奇怪,因此,我们来深入了解一下它都做了些什么。
### Python 虚拟机内幕
CPython 使用一个基于栈的虚拟机。也就是说,它完全面向栈数据结构的(你可以 “推入” 一个东西到栈 “顶”,或者,从栈 “顶” 上 “弹出” 一个东西来)。
CPython 使用三种类型的栈:
1. **调用栈**。这是运行 Python 程序的主要结构。它为每个当前活动的函数调用使用了一个东西 — "帧“,栈底是程序的入口点。每个函数调用推送一个新帧到调用栈,每当函数调用返回后,这个帧被销毁。
2. 在每个帧中,有一个 **计算栈** (也称为 **数据栈**)。这个栈就是 Python 函数运行的地方,运行的 Python 代码大多数是由推入到这个栈中的东西组成的,操作它们,然后在返回后销毁它们。
3. 在每个帧中,还有一个 **块栈**。它被 Python 用于去跟踪某些类型的控制结构loops、`try`/`except` 块、以及 `with` 块,全部推入到块栈中,当你退出这些控制结构时,块栈被销毁。这将帮助 Python 了解任意给定时刻哪个块是活动的,比如,一个 `continue` 或者 `break` 语句可能影响正确的块。
大多数 Python 字节码指令操作的是当前调用栈帧的计算栈,虽然,还有一些指令可以做其它的事情(比如跳转到指定指令,或者操作块栈)。
为了更好地理解,假设我们有一些调用函数的代码,比如这个:`my_function(my_variable, 2)`。Python 将转换为一系列字节码指令:
1. 一个 `LOAD_NAME` 指令去查找函数对象 `my_function`,然后将它推入到计算栈的顶部
2. 另一个 `LOAD_NAME` 指令去查找变量 `my_variable`,然后将它推入到计算栈的顶部
3. 一个 `LOAD_CONST` 指令去推入一个实整数值 `2` 到计算栈的顶部
4. 一个 `CALL_FUNCTION` 指令
这个 `CALL_FUNCTION` 指令将有 2 个参数,它表示那个 Python 需要从栈顶弹出两个位置参数;然后函数将在它上面进行调用,并且它也同时被弹出(对于函数涉及的关键字参数,它使用另一个不同的指令 — `CALL_FUNCTION_KW`,但使用的操作原则类似,以及第三个指令 — `CALL_FUNCTION_EX`,它适用于函数调用涉及到使用 `*``**` 操作符的情况)。一旦 Python 拥有了这些之后,它将在调用栈上分配一个新帧,填充到函数调用的本地变量上,然后,运行那个帧内的 `my_function` 字节码。运行完成后,这个帧将被调用栈销毁,最初的帧内返回的 `my_function` 将被推入到计算栈的顶部。
### 访问和理解 Python 字节码
如果你想玩转字节码那么Python 标准库中的 `dis` 模块将对你有非常大的帮助;`dis` 模块为 Python 字节码提供了一个 "反汇编",它可以让你更容易地得到一个人类可读的版本,以及查找各种字节码指令。[`dis` 模块的文档][2] 可以让你遍历它的内容,并且提供一个字节码指令能够做什么和有什么样的参数的完整清单。
例如,获取上面的 `hello()` 函数的列表,可以在一个 Python 解析器中输入如下内容,然后运行它:
```
import dis
dis.dis(hello)
```
函数 `dis.dis()` 将反汇编一个函数、方法、类、模块、编译过的 Python 代码对象、或者字符串包含的源代码,以及显示出一个人类可读的版本。`dis` 模块中另一个方便的功能是 `distb()`。你可以给它传递一个 Python 追溯对象,或者发生预期外情况时调用它,然后它将反汇编发生预期外情况时在调用栈上最顶端的函数,并显示它的字节码,以及插入一个指向到引发意外情况的指令的指针。
它也可以用于查看 Python 为每个函数构建的编译后的代码对象,因为运行一个函数将会用到这些代码对象的属性。这里有一个查看 `hello()` 函数的示例:
```
>>> hello.__code__
<code object hello at 0x104e46930, file "<stdin>", line 1>
>>> hello.__code__.co_consts
(None, 'Hello, World!')
>>> hello.__code__.co_varnames
()
>>> hello.__code__.co_names
('print',)
```
代码对象在函数中可以作为属性 `__code__` 来访问,并且携带了一些重要的属性:
* `co_consts` 是存在于函数体内的任意实数的元组
* `co_varnames` 是函数体内使用的包含任意本地变量名字的元组
* `co_names` 是在函数体内引用的任意非本地名字的元组
许多字节码指令 — 尤其是那些推入到栈中的加载值,或者在变量和属性中的存储值 — 在这些用作它们参数的元组中使用索引。
因此,现在我们能够理解 `hello()` 函数中所列出的字节码:
1. `LOAD_GLOBAL 0`:告诉 Python 通过 `co_names` (它是 `print` 函数)的索引 0 上的名字去查找它指向的全局对象,然后将它推入到计算栈
2. `LOAD_CONST 1`:带入 `co_consts` 在索引 1 上的实数值,并将它推入(索引 0 上的实数值是 `None`,它表示在 `co_consts` 中,因为 Python 函数调用有一个隐式的返回值 `None`,如果没有显式的返回表达式,就返回这个隐式的值 )。
3. `CALL_FUNCTION 1`:告诉 Python 去调用一个函数;它需要从栈中弹出一个位置参数,然后,新的栈顶将被函数调用。
"原始的" 字节码 — 是非人类可读格式的字节 — 也可以在代码对象上作为 `co_code` 属性可用。如果你有兴趣尝试手工反汇编一个函数时,你可以从它们的十进制字节值中,使用列出 `dis.opname` 的方式去查看字节码指令的名字。
### 字节码的用处
现在,你已经了解的足够多了,你可能会想 ” OK我认为它很酷但是知道这些有什么实际价值呢“由于对它很好奇我们去了解它但是除了好奇之外Python 字节码在几个方面还是非常有用的。
首先,理解 Python 的运行模型可以帮你更好地理解你的代码。人们都开玩笑说C 将成为一个 ”便携式汇编器“,在那里你可以很好地猜测出一段 C 代码转换成什么样的机器指令。理解 Python 字节码之后,你在使用 Python 时也具备同样的能力 — 如果你能预料到你的 Python 源代码将被转换成什么样的字节码,那么你可以知道如何更好地写和优化 Python 源代码。
第二,理解字节码可以帮你更好地回答有关 Python 的问题。比如,我经常看到一些 Python 新手困惑为什么某些结构比其它结构运行的更快(比如,为什么 `{}``dict()` 快)。知道如何去访问和阅读 Python 字节码将让你很容易回答这样的问题(尝试对比一下: `dis.dis("{}")``dis.dis("dict()")` 就会明白)。
最后,理解字节码和 Python 如何运行它,为 Python 程序员不经常使用的一种特定的编程方式提供了有用的视角:面向栈的编程。如果你以前从来没有使用过像 FORTH 或 Fator 这样的面向栈的编程语言,它们可能有些古老,但是,如果你不熟悉这种方法,学习有关 Python 字节码的知识,以及理解面向栈的编程模型是如何工作的,将有助你开拓你的编程视野。
### 延伸阅读
如果你想进一步了解有关 Python 字节码、Python 虚拟机、以及它们是如何工作的更多知识,我推荐如下的这些资源:
* [Python 虚拟机内幕][3],它是 Obi Ike-Nwosu 写的一本免费在线电子书,它深入 Python 解析器,解释了 Python 如何工作的细节。
* [一个用 Python 编写的 Python 解析器][4],它是由 Allison Kaptur 写的一个教程,它是用 Python 构建的 Python 字节码解析器,并且它实现了运行 Python 字节码的全部构件。
* 最后CPython 解析器是一个开源软件,你可以在 [GitHub][1] 上阅读它。它在文件 `Python/ceval.c` 中实现了字节码解析器。[这是 Python 3.6.4 发行版中那个文件的链接][5];字节码指令是由第 1266 行开始的 `switch` 语句来处理的。
学习更多内容,参与到 James Bennett 的演讲,[有关字节的知识:理解 Python 字节码][6],将在 [PyCon Cleveland 2018][7] 召开。
--------------------------------------------------------------------------------
via: https://opensource.com/article/18/4/introduction-python-bytecode
作者:[James Bennett][a]
选题:[lujun9972](https://github.com/lujun9972)
译者:[qhwdw](https://github.com/qhwdw)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://opensource.com/users/ubernostrum
[1]:https://github.com/python/cpython
[2]:https://docs.python.org/3/library/dis.html
[3]:https://leanpub.com/insidethepythonvirtualmachine
[4]:http://www.aosabook.org/en/500L/a-python-interpreter-written-in-python.html
[5]:https://github.com/python/cpython/blob/d48ecebad5ac78a1783e09b0d32c211d9754edf4/Python/ceval.c
[6]:https://us.pycon.org/2018/schedule/presentation/127/
[7]:https://us.pycon.org/2018/