Update and rename sources/tech/20190813 Building a non-breaking breakpoint for Python debugging.md to translated/tech/20190813 Building a non-breaking breakpoint for Python debugging.md

caiichenr 2020-03-07 01:18:03 +08:00 committed by GitHub
parent d1d29be14d
commit 3a7c5ddebd
2 changed files with 237 additions and 238 deletions


@@ -1,238 +0,0 @@
[#]: collector: (lujun9972)
[#]: translator: (caiichenr)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Building a non-breaking breakpoint for Python debugging)
[#]: via: (https://opensource.com/article/19/8/debug-python)
[#]: author: (Liran Haimovitch https://opensource.com/users/liranhaimovitch)
Building a non-breaking breakpoint for Python debugging
======
Have you ever wondered how to speed up a debugger? Here are some lessons
learned while building one for Python.
![Real python in the graphic jungle][1]
This is the story of how our team at [Rookout][2] built non-breaking breakpoints for Python and some of the lessons we learned along the way. I'll be presenting all about the nuts and bolts of debugging in Python at [PyBay 2019][3] in San Francisco this month. Let's dig in.
### The heart of Python debugging: sys.settrace
There are many Python debuggers out there. Some of the more popular include:
* **pdb**, part of the Python standard library
* **PyDev**, the debugger behind the Eclipse and PyCharm IDEs
* **ipdb**, the IPython debugger
Despite the range of choices, almost every Python debugger is based on just one function: **sys.settrace**. And let me tell you, **[sys.settrace][4]** might just be the most complex function in the Python standard library.
![set_trace Python 2 docs page][5]
In simpler terms, **settrace** registers a trace function for the interpreter, which may be called in any of the following cases:
* Function call
* Line execution
* Function return
* Exception raised
A simple trace function might look like this:
```
def simple_tracer(frame, event, arg):
  co = frame.f_code
  func_name = co.co_name
  line_no = frame.f_lineno
  print("{e} {f} {l}".format(
      e=event, f=func_name, l=line_no))
  return simple_tracer
```
When looking at this function, the first things that come to mind are its arguments and return values. The trace function arguments are:
* **frame** object, which is the full state of the interpreter at the point of the function's execution
* **event** string, which can be **call**, **line**, **return**, or **exception**
* **arg** object, which is optional and depends on the event type
The trace function returns itself because the interpreter keeps track of two kinds of trace functions:
* **Global trace function (per thread):** This trace function is set for the current thread by **sys.settrace** and is invoked whenever a new **frame** is created by the interpreter (essentially on every function call). While there's no documented way to set the trace function for a different thread, you can call **threading.settrace** to set the trace function for all newly created **threading** module threads.
* **Local trace function (per frame):** This trace function is set by the interpreter to the value returned by the global trace function upon frame creation. There's no documented way to set the local trace function once the frame has been created.
This mechanism is designed to allow the debugger to have more granular control over which frames are traced to reduce performance impact.
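To make the global/local split concrete, here is a minimal sketch (not the article's code) in which the global trace function opts into local tracing for one function only. The names `interesting`, `boring`, `traced_lines`, and `run_demo` are hypothetical and exist only for this illustration.

```python
import sys

traced_lines = []  # (function name, line offset) pairs seen by the local tracer


def local_tracer(frame, event, arg):
    # Receives "line", "return" and "exception" events for a traced frame.
    if event == "line":
        offset = frame.f_lineno - frame.f_code.co_firstlineno
        traced_lines.append((frame.f_code.co_name, offset))
    return local_tracer


def global_tracer(frame, event, arg):
    # Receives one "call" event per new frame and decides whether
    # that frame gets a local trace function at all.
    if frame.f_code.co_name == "interesting":
        return local_tracer
    return None  # no local tracing for every other frame


def interesting():
    x = 1
    return x + 1


def boring():
    y = 2
    return y


def run_demo():
    traced_lines.clear()
    previous = sys.gettrace()
    sys.settrace(global_tracer)
    try:
        interesting()
        boring()
    finally:
        sys.settrace(previous)  # always restore the earlier trace function
    return list(traced_lines)
```

Calling `run_demo()` records line events for `interesting` only; `boring` costs a single "call" event because the global tracer returns `None` for it.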
### Building our debugger in three easy steps (or so we thought)
With all that background, writing your own debugger using a custom trace function looks like a daunting task. Luckily, **pdb**, the standard Python debugger, is built on top of **Bdb**, a base class for building debuggers.
A naive breakpoints debugger based on **Bdb** might look like this:
```
import bdb
import inspect


class Debugger(bdb.Bdb):
    def __init__(self):
        bdb.Bdb.__init__(self)
        # Maps (canonical filename, lineno) to the callbacks registered there
        self.breakpoints = dict()
        self.set_trace()

    def set_breakpoint(self, filename, lineno, method):
        self.set_break(filename, lineno)
        # Bdb stores breakpoints under canonical filenames, so key ours the same way
        key = (self.canonic(filename), lineno)
        try:
            self.breakpoints[key].append(method)
        except KeyError:
            self.breakpoints[key] = [method]

    def user_line(self, frame):
        if not self.break_here(frame):
            return
        # Get filename and lineno from frame
        (filename, lineno, _, _, _) = inspect.getframeinfo(frame)
        methods = self.breakpoints[(self.canonic(filename), lineno)]
        for method in methods:
            method(frame)
```
All this does is:
1. Inherits from **Bdb** and writes a simple constructor that initializes the base class and starts tracing.
2. Adds a **set_breakpoint** method that uses **Bdb** to set the breakpoint and keeps track of our breakpoints.
3. Overrides the **user_line** method that is called by **Bdb** on certain user lines. The function makes sure it is being called for a breakpoint, gets the source location, and invokes the callbacks registered for that breakpoint.
### How well did the simple Bdb debugger work?
Rookout is about bringing a debugger-like user experience to production-grade performance and use cases. So, how well did our naive breakpoint debugger perform?
To test it and measure the global performance overhead, we wrote two simple test methods and executed each of them 16 million times under multiple scenarios. Keep in mind that no breakpoint was executed in any of the cases.
```
def empty_method():
   pass
def simple_method():
   a = 1
   b = 2
   c = 3
   d = 4
   e = 5
   f = 6
   g = 7
   h = 8
   i = 9
   j = 10
```
Using the debugger takes a shocking amount of time to complete. The bad results make it clear that our naive **Bdb** debugger is not yet production-ready.
![First Bdb debugger results][6]
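The measurement itself is easy to approximate. The sketch below is an assumption about how such a benchmark could be set up, not the harness the team used: `measure`, `call_only_tracer`, and `line_tracer` are hypothetical names, and the iteration count is far below the article's 16 million so it finishes quickly.

```python
import sys
import time


def empty_method():
    pass


def simple_method():
    a = 1
    b = 2
    c = 3
    return a + b + c


def call_only_tracer(frame, event, arg):
    # Global tracing only: one "call" event per frame, no local tracing.
    return None


def line_tracer(frame, event, arg):
    # Global plus local tracing: every line of every frame is reported.
    return line_tracer


def measure(tracer, func, number=100_000):
    """Return the seconds needed to call func `number` times under `tracer`.

    Passing tracer=None measures the untraced baseline.
    """
    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        start = time.perf_counter()
        for _ in range(number):
            func()
        return time.perf_counter() - start
    finally:
        sys.settrace(previous)
```

Comparing `measure(None, simple_method)`, `measure(call_only_tracer, simple_method)`, and `measure(line_tracer, simple_method)` gives a rough feel for how much each level of tracing costs on a given machine; absolute numbers will vary by interpreter version.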
### Optimizing the debugger
There are three main ways to reduce debugger overhead:
1. **Limit local tracing as much as possible:** Local tracing is very costly compared to global tracing due to the much larger number of events per line of code.
2. **Optimize "call" events and return control to the interpreter faster:** The main work in **call** events is deciding whether or not to trace.
3. **Optimize "line" events and return control to the interpreter faster:** The main work in **line** events is deciding whether or not we hit a breakpoint.
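As a rough illustration of these three ideas (and not the forked **Bdb** itself), the sketch below keeps both decisions down to dictionary lookups. `FastTracer`, `add_breakpoint`, `target`, and `run_demo` are hypothetical names introduced for this example.

```python
import sys


class FastTracer:
    def __init__(self):
        # code object -> {line number: callback}
        self.lines_by_code = {}

    def add_breakpoint(self, func, line_offset, callback):
        code = func.__code__
        lineno = code.co_firstlineno + line_offset
        self.lines_by_code.setdefault(code, {})[lineno] = callback

    def global_trace(self, frame, event, arg):
        # "call" event: a single lookup decides whether to trace this frame.
        lines = self.lines_by_code.get(frame.f_code)
        if lines is None:
            return None  # limit local tracing: skip frames without breakpoints

        def local_trace(frame, event, arg):
            # "line" event: a single lookup decides whether a breakpoint was hit.
            if event == "line":
                callback = lines.get(frame.f_lineno)
                if callback is not None:
                    callback(frame)
            return local_trace

        return local_trace


def target(n):
    total = n * 2
    return total


def run_demo():
    seen = []
    tracer = FastTracer()
    # Break on the `return total` line, two lines below the `def`.
    tracer.add_breakpoint(target, 2, lambda frame: seen.append(dict(frame.f_locals)))
    previous = sys.gettrace()
    sys.settrace(tracer.global_trace)
    try:
        target(21)
    finally:
        sys.settrace(previous)
    return seen
```

Here `run_demo()` collects a snapshot of the locals of `target` without ever pausing it, which is the "non-breaking" behavior the article is after. Even so, every call in the program still pays for the global "call" event, which is the overhead that could not be removed.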
So we forked **Bdb**, reduced the feature set, simplified the code, optimized for hot code paths, and got impressive results. However, we were still not satisfied. So, we took another stab at it, migrated and optimized our code to **.pyx**, and compiled it using [Cython][7]. The final results (as you can see below) were still not good enough. So, we ended up diving into CPython's source code and realizing we could not make tracing fast enough for production use.
![Second Bdb debugger results][8]
### Rejecting Bdb in favor of bytecode manipulation
After our initial disappointment from the trial-and-error cycles of standard debugging methods, we decided to look into a less obvious option: bytecode manipulation.
The Python interpreter works in two main stages:
1. **Compiling Python source code into Python bytecode:** This unreadable (for humans) format is optimized for efficient execution and is often cached in those **.pyc** files we have all come to love.
2. **Iterating through the bytecode in the _interpreter loop_:** This executes one instruction at a time.
This is the pattern we chose: use **bytecode manipulation** to set **non-breaking breakpoints** with no global overhead. This is done by finding the bytecode in memory that represents the source line we are interested in and inserting a function call just before the relevant instruction. This way, the interpreter does not have to do any extra work to support our breakpoints.
This approach is not magic. Here's a quick example.
We start with a very simple function:
```
def multiply(a, b):
   result = a * b
   return result
```
In documentation hidden in the **[inspect][9]** module (which has several useful utilities), we learn we can get the function's bytecode by accessing **multiply.func_code.co_code** (in Python 3, the attribute is **multiply.\_\_code\_\_.co_code**):
```
'|\x00\x00|\x01\x00\x14}\x02\x00|\x02\x00S'
```
This unreadable string can be made readable with the **[dis][10]** module in the Python standard library. By calling **dis.dis(multiply)**, which also resolves source line numbers and variable names, we get:
```
  4          0 LOAD_FAST               0 (a)
             3 LOAD_FAST               1 (b)
             6 BINARY_MULTIPLY    
             7 STORE_FAST              2 (result)
  5         10 LOAD_FAST               2 (result)
            13 RETURN_VALUE      
```
This gets us closer to understanding what happens behind the scenes of debugging but not to a straightforward solution. Unfortunately, Python does not offer a method for changing a function's bytecode from within the interpreter. You can overwrite the function object, but that's not good enough for the majority of real-world debugging scenarios. You have to go about it in a roundabout way using a native extension.
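The listings above come from Python 2. As a hedged sketch of the same inspection on Python 3, where the code object lives in `__code__` and the exact opcodes differ between versions, the helpers below locate the bytecode offset at which a given source line starts. `first_offset_for_line` and `opnames` are hypothetical helper names; the `dis` calls are standard library functions.

```python
import dis


def multiply(a, b):
    result = a * b
    return result


def first_offset_for_line(func, lineno):
    """Return the bytecode offset where source line `lineno` starts, or None."""
    for offset, line in dis.findlinestarts(func.__code__):
        if line == lineno:
            return offset
    return None


def opnames(func):
    """List the opcode names of a function, in execution order."""
    return [instruction.opname for instruction in dis.get_instructions(func)]


code = multiply.__code__
raw_bytecode = code.co_code  # bytes on Python 3, a str on Python 2
return_line = code.co_firstlineno + 2  # the `return result` line
return_offset = first_offset_for_line(multiply, return_line)
```

Finding the offset is only the first half of the job: the hard part the article alludes to is changing the bytecode at that offset for code that is already loaded and possibly running.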
### Conclusion
When building a new tool, you invariably end up learning a lot about how things work. It also makes you think outside the box and keep your mind open to unexpected solutions.
Working on non-breaking breakpoints for Rookout has taught me a lot about compilers, debuggers, server frameworks, concurrency models, and much much more. If you are interested in learning more about bytecode manipulation, Google's open source **[cloud-debug-python][11]** has tools for editing bytecode.
* * *
_Liran Haimovitch will present "[Understanding Pythons Debugging Internals][12]" at [PyBay][3], which will be held August 17-18 in San Francisco. Use code [OpenSource35][13] for a discount when you purchase your ticket to let them know you found out about the event from our community._
--------------------------------------------------------------------------------
via: https://opensource.com/article/19/8/debug-python
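Author: [Liran Haimovitch][a]
Topic selection: [lujun9972][b]
Translator: [Translator ID](https://github.com/译者ID)
Proofreader: [Proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)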
[a]: https://opensource.com/users/liranhaimovitch
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/python_jungle_lead.jpeg?itok=pFKKEvT- (Real python in the graphic jungle)
[2]: https://rookout.com/
[3]: https://pybay.com/
[4]: https://docs.python.org/3/library/sys.html#sys.settrace
[5]: https://opensource.com/sites/default/files/uploads/python2docs.png (set_trace Python 2 docs page)
[6]: https://opensource.com/sites/default/files/uploads/debuggerresults1.png (First Bdb debugger results)
[7]: https://cython.org/
[8]: https://opensource.com/sites/default/files/uploads/debuggerresults2.png (Second Bdb debugger results)
[9]: https://docs.python.org/2/library/inspect.html
[10]: https://docs.python.org/2/library/dis.html
[11]: https://github.com/GoogleCloudPlatform/cloud-debug-python
[12]: https://pybay.com/speaker/liran-haimovitch/
[13]: https://ti.to/sf-python/pybay2019/discount/OpenSource35


@@ -0,0 +1,237 @@
[#]: collector: (lujun9972)
[#]: translator: (caiichenr)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Building a non-breaking breakpoint for Python debugging)
[#]: via: (https://opensource.com/article/19/8/debug-python)
[#]: author: (Liran Haimovitch https://opensource.com/users/liranhaimovitch)
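Building a non-breaking breakpoint for Python debugging
======
Have you ever wondered how to make a debugger faster? This article shares some lessons we learned while building one for Python.
![Real python in the graphic jungle][1]
This is the story of how our team at [Rookout][2] built non-breaking breakpoints for a Python debugger, and of the lessons we learned along the way. I'll cover more of the details of debugging in Python at [PyBay 2019][3] in San Francisco this month, but for now let's get straight into the story.
### The heart of Python debugging: sys.settrace
Of the many Python debuggers available, three of the most widely used are:
* **pdb**, part of the Python standard library
* **PyDev**, the debugger built into IDEs such as Eclipse and PyCharm
* **ipdb**, the IPython debugger
Despite the range of choices, almost every Python debugger is based on the same function: **sys.settrace**. It is worth mentioning that **[sys.settrace][4]** might also be the most complex function in the Python standard library.
![set_trace Python 2 docs page][5]
Put simply, **settrace** registers a trace function with the interpreter, which is called in any of the following four cases:
* Function call
* Line execution
* Function return
* Exception raised
A simple trace function might look like this: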
```
def simple_tracer(frame, event, arg):
  co = frame.f_code
  func_name = co.co_name
  line_no = frame.f_lineno
  print("{e} {f} {l}".format(
      e=event, f=func_name, l=line_no))
  return simple_tracer
```
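When analyzing a function, the first things to look at are its arguments and return value. The trace function's arguments are:
* **frame**, the current stack frame, an object holding the full state of the interpreter at the point of the function's execution
* **event**, a string whose value can be **"call"**, **"line"**, **"return"**, or **"exception"**
* **arg**, which is optional and depends on the event type
The trace function returns itself because the interpreter keeps track of two kinds of trace functions:
* **Global trace function (per thread):** This trace function is set for the current thread by calling **sys.settrace** and is invoked whenever the interpreter creates a new **frame** (that is, on every function call in the code). While there is no documented way to set the trace function for a different thread, you can call **threading.settrace** to set the trace function for all newly created **threading** module threads.
* **Local trace function (per frame):** The interpreter sets this trace function to the value returned by the global trace function when the frame is created. There is no documented way to set the local trace function once the frame has been created.
This mechanism is designed to give the debugger more granular control over which frames are traced, in order to reduce the performance impact.
### Building our debugger in three easy steps (or so we thought)
With only the background above, building a real debugger from a custom trace function looks like a daunting task. Luckily, **pdb**, the standard Python debugger, is built on top of **Bdb**, a base class in the standard library made specifically for building debuggers.
A naive breakpoint debugger based on **Bdb** might look like this: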
```
import bdb
import inspect


class Debugger(bdb.Bdb):
    def __init__(self):
        bdb.Bdb.__init__(self)
        # Maps (canonical filename, lineno) to the callbacks registered there
        self.breakpoints = dict()
        self.set_trace()

    def set_breakpoint(self, filename, lineno, method):
        self.set_break(filename, lineno)
        # Bdb stores breakpoints under canonical filenames, so key ours the same way
        key = (self.canonic(filename), lineno)
        try:
            self.breakpoints[key].append(method)
        except KeyError:
            self.breakpoints[key] = [method]

    def user_line(self, frame):
        if not self.break_here(frame):
            return
        # Get filename and lineno from frame
        (filename, lineno, _, _, _) = inspect.getframeinfo(frame)
        methods = self.breakpoints[(self.canonic(filename), lineno)]
        for method in methods:
            method(frame)
```
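All this debugger class does is:
1. Inherits from **Bdb** and defines a simple constructor that initializes the base class and starts tracing.
2. Adds a **set_breakpoint** method that uses **Bdb** to set the breakpoint and keeps track of our breakpoints.
3. Overrides the **user_line** method that **Bdb** calls on certain user lines. The method makes sure it is being called for a breakpoint, gets the source location, and invokes the callbacks registered for that breakpoint.
### How well did the simple Bdb debugger work?
Rookout's goal is to offer a debugger-like user experience at production-grade performance and in production use cases. So, how well did our naive breakpoint debugger perform?
To measure the debugger's global performance overhead, we tested it with the two simple functions below, executing each of them 16 million times under multiple scenarios. Keep in mind that no breakpoint was executed in any of the cases.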
```
def empty_method():
   pass
def simple_method():
   a = 1
   b = 2
   c = 3
   d = 4
   e = 5
   f = 6
   g = 7
   h = 8
   i = 9
   j = 10
```
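With the debugger attached, the tests take a shocking amount of time to complete. The poor results make it clear that this naive **Bdb** debugger is nowhere near ready for production use.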
![First Bdb debugger results][6]
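### Optimizing the debugger
There are three main ways to reduce debugger overhead:
1. **Limit local tracing as much as possible:** Local tracing is far more costly than global tracing because of the much larger number of events per line of code.
2. **Optimize "call" events and return control to the interpreter faster:** The main work in **call** events is deciding whether or not to trace.
3. **Optimize "line" events and return control to the interpreter faster:** The main work in **line** events is deciding whether or not we hit a breakpoint.
So we forked **Bdb**, reduced the feature set, simplified the code, and optimized the hot code paths. This helped, but it still did not meet our needs. So we took another stab at it, migrated and optimized our code to **.pyx**, and compiled it using [Cython][7]. Unfortunately, the results (shown below) were still not good enough. In the end, after diving into CPython's source code, we realized that we could not make tracing fast enough for production use.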
![Second Bdb debugger results][8]
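### Rejecting Bdb in favor of bytecode manipulation
After the disappointment of our trial-and-error cycles with standard debugging methods, we turned to a less obvious option: bytecode manipulation.
The Python interpreter works in two main stages:
1. **Compiling Python source code into Python bytecode:** This format, unreadable for humans, is optimized for efficient execution and is often cached in the familiar **.pyc** files.
2. **Iterating through the bytecode in the _interpreter loop_:** In this stage the interpreter executes one instruction at a time.
This is the pattern we chose: use **bytecode manipulation** to set **non-breaking breakpoints** with no global overhead. This is done by finding the bytecode in memory that represents the source line we are interested in and inserting a function call just before the relevant instruction. This way, the interpreter does not have to do any extra work to support our breakpoints.
This approach is not magic. Here's a quick example.
We start with a very simple function: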
```
def multiply(a, b):
   result = a * b
   return result
```
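From the documentation of the **[inspect][9]** module (which has several useful utilities), we learn that we can get the function's bytecode by accessing **multiply.func_code.co_code** (in Python 3, the attribute is **multiply.\_\_code\_\_.co_code**):
```
'|\x00\x00|\x01\x00\x14}\x02\x00|\x02\x00S'
```
This unreadable string can be made readable with the **[dis][10]** module in the Python standard library. By calling **dis.dis(multiply)**, which also resolves source line numbers and variable names, we get: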
```
  4          0 LOAD_FAST               0 (a)
             3 LOAD_FAST               1 (b)
             6 BINARY_MULTIPLY    
             7 STORE_FAST              2 (result)
  5         10 LOAD_FAST               2 (result)
            13 RETURN_VALUE      
```
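This gets us closer to understanding what happens behind the scenes of debugging, but not to a straightforward solution. Unfortunately, Python does not offer a method for changing a function's bytecode from within the interpreter. You can overwrite the function object, but that is not good enough for the majority of real-world debugging scenarios. In the end, we had to go about it in a roundabout way using a native extension.
### Conclusion
When building a new tool, you invariably end up learning a lot about how things work. Digging this deep also pushes you to think outside the box and arrive at unexpected solutions.
Working on non-breaking breakpoints at Rookout has taught me a lot about compilers, debuggers, server frameworks, concurrency models, and much more. If you are interested in learning more about bytecode manipulation, Google's open source project **[cloud-debug-python][11]** has tools for editing bytecode.
* * *
_Liran Haimovitch will present "[Understanding Pythons Debugging Internals][12]" at [PyBay][3], which will be held August 17-18, 2019 in San Francisco. Use code [OpenSource35][13] for a discount when you purchase your ticket and to let them know you found out about the event from our community._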
--------------------------------------------------------------------------------
via: https://opensource.com/article/19/8/debug-python
Author: [Liran Haimovitch][a]
Topic selection: [lujun9972][b]
Translator: [caiichenr](https://github.com/caiichenr)
Proofreader: [Proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)
[a]: https://opensource.com/users/liranhaimovitch
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/python_jungle_lead.jpeg?itok=pFKKEvT- (Real python in the graphic jungle)
[2]: https://rookout.com/
[3]: https://pybay.com/
[4]: https://docs.python.org/3/library/sys.html#sys.settrace
[5]: https://opensource.com/sites/default/files/uploads/python2docs.png (set_trace Python 2 docs page)
[6]: https://opensource.com/sites/default/files/uploads/debuggerresults1.png (First Bdb debugger results)
[7]: https://cython.org/
[8]: https://opensource.com/sites/default/files/uploads/debuggerresults2.png (Second Bdb debugger results)
[9]: https://docs.python.org/2/library/inspect.html
[10]: https://docs.python.org/2/library/dis.html
[11]: https://github.com/GoogleCloudPlatform/cloud-debug-python
[12]: https://pybay.com/speaker/liran-haimovitch/
[13]: https://ti.to/sf-python/pybay2019/discount/OpenSource35