Merge pull request #21431 from wxy/20210312-Visualize-multi-threaded-Python-programs-with-an-open-source-tool

TSL&PRF:translated/tech/20210312 Visualize multi-threaded Python programs with an open source tool.md
Xingyu.Wang 2021-03-28 21:21:37 +08:00 committed by GitHub
commit c29bb1ebd2
2 changed files with 256 additions and 288 deletions

@ -1,288 +0,0 @@
[#]: subject: (Visualize multi-threaded Python programs with an open source tool)
[#]: via: (https://opensource.com/article/21/3/python-viztracer)
[#]: author: (Tian Gao https://opensource.com/users/gaogaotiantian)
[#]: collector: (lujun9972)
[#]: translator: (wxy)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
Visualize multi-threaded Python programs with an open source tool
======
VizTracer traces concurrent Python programs to help with logging,
debugging, and profiling.
![Colorful sound wave graph][1]
Concurrency is an essential part of modern programming, as we have multiple cores and many tasks that need to cooperate. However, it's harder to understand concurrent programs when they are not running sequentially. It's not as easy for engineers to identify bugs and performance issues in these programs as it is in a single-thread, single-task program.
With Python, you have multiple options for concurrency. The most common ones are probably multi-threading with the `threading` module, multiprocessing with the `subprocess` and `multiprocessing` modules, and the more recent `async` syntax with the `asyncio` module. Before [VizTracer][2], there was a lack of tools to analyze programs using these techniques.
VizTracer is a tool for tracing and visualizing Python programs, which is helpful for logging, debugging, and profiling. Even though it works well for single-thread, single-task programs, its utility in concurrent programs is what makes it unique.
### Try a simple task
Start with a simple practice task: Figure out whether the integers in an array are prime numbers and return a Boolean array. Here is a simple solution:
```
def is_prime(n):
    for i in range(2, n):
        if n % i == 0:
            return False
    return True
def get_prime_arr(arr):
    return [is_prime(elem) for elem in arr]
```
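As a quick sanity check of the solution above, the two functions can be run on a handful of known values. This is only a sketch: the `sample` values are made up for illustration, and the trial-division loop assumes inputs of 2 or greater, which holds for the 100 to 10000 range used in this article.

```python
def is_prime(n):
    # Trial division as in the article; assumes n >= 2.
    for i in range(2, n):
        if n % i == 0:
            return False
    return True


def get_prime_arr(arr):
    # One Boolean per input element, in the same order.
    return [is_prime(elem) for elem in arr]


# Hypothetical sample input: 101 and 103 are prime, 100 and 102 are not.
sample = [100, 101, 102, 103]
flags = get_prime_arr(sample)
print(flags)  # [False, True, False, True]
```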
Try to run it normally, in a single thread, with VizTracer:
```
import random

if __name__ == "__main__":
    num_arr = [random.randint(100, 10000) for _ in range(6000)]
    get_prime_arr(num_arr)
```
```
viztracer my_program.py
```
![Running code in a single thread][3]
(Tian Gao, [CC BY-SA 4.0][4])
The call-stack report indicates it took about 140ms, with most of the time spent in `get_prime_arr`.
![call-stack report][5]
(Tian Gao, [CC BY-SA 4.0][4])
It's just doing the `is_prime` function over and over again on the elements in the array.
This is what you would expect, and it's not that interesting (if you know VizTracer).
### Try a multi-thread program
Try doing it with a multi-thread program:
```
import random
from threading import Thread

if __name__ == "__main__":
    num_arr = [random.randint(100, 10000) for i in range(2000)]
    thread1 = Thread(target=get_prime_arr, args=(num_arr,))
    thread2 = Thread(target=get_prime_arr, args=(num_arr,))
    thread3 = Thread(target=get_prime_arr, args=(num_arr,))
    thread1.start()
    thread2.start()
    thread3.start()
    thread1.join()
    thread2.join()
    thread3.join()
```
To match the single-thread program's workload, this uses a 2,000-element array for three threads, simulating a situation where three threads are sharing the task.
![Multi-thread program][6]
(Tian Gao, [CC BY-SA 4.0][4])
As you would expect if you are familiar with Python's Global Interpreter Lock (GIL), it won't get any faster. It took a little bit more than 140ms due to the overhead. However, you can observe the concurrency of multiple threads:
![Concurrency of multiple threads][7]
(Tian Gao, [CC BY-SA 4.0][4])
When one thread was working (executing multiple `is_prime` functions), the other one was frozen (stuck in one `is_prime` function); later, they switched. This is due to the GIL, and it is the reason Python does not have true multi-threading. It can achieve concurrency but not parallelism.
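The same effect can be observed without a tracer by timing the work directly. The following is a minimal sketch, assuming a standard CPython build with the GIL enabled; the helper names `timed`, `run_sequential`, and `run_threaded` are invented for this illustration, and the workload is smaller than the article's so it finishes quickly.

```python
import random
import time
from threading import Thread


def is_prime(n):
    for i in range(2, n):
        if n % i == 0:
            return False
    return True


def get_prime_arr(arr):
    return [is_prime(elem) for elem in arr]


def timed(fn):
    # Return the wall-clock seconds taken by calling fn().
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def run_sequential(chunks):
    for chunk in chunks:
        get_prime_arr(chunk)


def run_threaded(chunks):
    # One thread per chunk, mirroring the three-thread program above.
    threads = [Thread(target=get_prime_arr, args=(chunk,)) for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


random.seed(0)
chunks = [[random.randint(100, 10000) for _ in range(300)] for _ in range(3)]

sequential_s = timed(lambda: run_sequential(chunks))
threaded_s = timed(lambda: run_threaded(chunks))
print(f"sequential: {sequential_s * 1000:.1f}ms, threaded: {threaded_s * 1000:.1f}ms")
```

With the GIL in place, the two numbers should be close to each other, because only one thread executes Python bytecode at any moment. The exact figures depend on the machine and interpreter version.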
### Try it with multiprocessing
To achieve parallelism, the way to go is the multiprocessing library. Here is another version with multiprocessing:
```
import random
from multiprocessing import Process

if __name__ == "__main__":
    num_arr = [random.randint(100, 10000) for _ in range(2000)]

    p1 = Process(target=get_prime_arr, args=(num_arr,))
    p2 = Process(target=get_prime_arr, args=(num_arr,))
    p3 = Process(target=get_prime_arr, args=(num_arr,))
    p1.start()
    p2.start()
    p3.start()
    p1.join()
    p2.join()
    p3.join()
```
To run it with VizTracer, you need an extra argument:
```
viztracer --log_multiprocess my_program.py
```
![Running with extra argument][8]
(Tian Gao, [CC BY-SA 4.0][4])
The whole program finished in a little more than 50ms, with the actual task finishing before the 50ms mark. The program's speed roughly tripled.
To compare it with the multi-thread version, here is the multiprocess version:
![Multi-process version][9]
(Tian Gao, [CC BY-SA 4.0][4])
Without a shared GIL, multiple processes can achieve parallelism, which means multiple `is_prime` functions can execute in parallel.
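The `Process` example above discards the results of `get_prime_arr`. If the Boolean array is needed back in the parent process, one option is `multiprocessing.Pool`, which is swapped in here in place of the article's bare `Process` objects. This is a sketch under that assumption; `split_evenly` and `parallel_prime_arr` are hypothetical helper names, not part of the original program.

```python
import random
from multiprocessing import Pool


def is_prime(n):
    for i in range(2, n):
        if n % i == 0:
            return False
    return True


def get_prime_arr(arr):
    return [is_prime(elem) for elem in arr]


def split_evenly(items, parts):
    # Cut items into `parts` contiguous chunks whose sizes differ by at most one.
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_prime_arr(arr, workers=3):
    # Each worker process handles one chunk; map() keeps the chunk order.
    with Pool(processes=workers) as pool:
        results = pool.map(get_prime_arr, split_evenly(arr, workers))
    return [flag for chunk in results for flag in chunk]


if __name__ == "__main__":
    nums = [random.randint(100, 10000) for _ in range(600)]
    # The parallel result should match the single-process result exactly.
    assert parallel_prime_arr(nums) == get_prime_arr(nums)
    print("parallel and sequential results match")
```

The `if __name__ == "__main__":` guard matters here: on platforms where child processes are started by re-importing the main module, unguarded pool creation would run again in every child.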
However, Python's multi-threading is not useless. It does little for computation-intensive programs, but it helps I/O-intensive ones. To see this, you can fake an I/O-bound task with sleep:
```
import time

def io_task():
    time.sleep(0.01)
```
Try it in a single-thread, single-task program:
```
if __name__ == "__main__":
    for _ in range(3):
        io_task()
```
![I/O-bound single-thread, single-task program][10]
(Tian Gao, [CC BY-SA 4.0][4])
The full program took about 30ms; nothing special.
Now use multi-thread:
```
if __name__ == "__main__":
    thread1 = Thread(target=io_task)
    thread2 = Thread(target=io_task)
    thread3 = Thread(target=io_task)
    thread1.start()
    thread2.start()
    thread3.start()
    thread1.join()
    thread2.join()
    thread3.join()
```
![I/O-bound multi-thread program][11]
(Tian Gao, [CC BY-SA 4.0][4])
The program took 10ms, and it's clear how the three threads worked concurrently and improved the overall performance.
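The difference between the two I/O-bound runs can also be measured in plain Python. The sketch below uses `concurrent.futures.ThreadPoolExecutor` rather than the explicit `Thread` objects from the article, purely to keep it short; `run_sequential` and `run_threaded` are names made up for this example, and `io_task` returns a value only so the results can be checked.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def io_task():
    # Stand-in for real I/O: sleeping releases the GIL while waiting.
    time.sleep(0.01)
    return "done"


def run_sequential(n):
    start = time.perf_counter()
    results = [io_task() for _ in range(n)]
    return results, time.perf_counter() - start


def run_threaded(n):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(io_task) for _ in range(n)]
        results = [f.result() for f in futures]
    return results, time.perf_counter() - start


seq_results, seq_elapsed = run_sequential(3)
thr_results, thr_elapsed = run_threaded(3)
print(f"sequential: {seq_elapsed * 1000:.1f}ms, threaded: {thr_elapsed * 1000:.1f}ms")
```

The sequential run cannot finish in less than roughly 30ms because the three sleeps happen one after another. The threaded run usually lands near 10ms plus thread start-up overhead, since the waits overlap.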
### Try it with asyncio
Python is trying to introduce another interesting feature called async programming. You can make an async version of this task:
```
import asyncio
async def io_task():
    await asyncio.sleep(0.01)
async def main():
    t1 = asyncio.create_task(io_task())
    t2 = asyncio.create_task(io_task())
    t3 = asyncio.create_task(io_task())
    await t1
    await t2
    await t3
if __name__ == "__main__":
    asyncio.run(main())
```
As asyncio is literally a single-thread scheduler with tasks, you can use VizTracer directly on it:
![VizTracer with asyncio][12]
(Tian Gao, [CC BY-SA 4.0][4])
It still took 10ms, but most of the functions displayed are the underlying structure, which is probably not what users are interested in. To solve this, you can use `--log_async` to separate the real task:
```
viztracer --log_async my_program.py
```
![Using --log_async to separate tasks][13]
(Tian Gao, [CC BY-SA 4.0][4])
Now the user tasks are much clearer. For most of the time, no tasks are running (because the only thing it does is sleep). Here's the interesting part:
![Graph of task creation and execution][14]
(Tian Gao, [CC BY-SA 4.0][4])
This shows when the tasks were created and executed. Task-1 was the `main()` coroutine, which created the other tasks. Tasks 2, 3, and 4 executed `io_task` and `sleep`, then waited for the wake-up. As the graph shows, there is no overlap between tasks because it's a single-thread program, and VizTracer visualized it this way to make it more understandable.
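The single-thread behavior of the tasks can be confirmed with a small logging sketch. It is not how VizTracer collects its data; it only records, for each task, when it starts and ends and which thread it ran on. The `events` list and the `name` parameter are additions made for this illustration.

```python
import asyncio
import threading

# (task name, phase, thread id) tuples in the order they happen.
events = []


async def io_task(name):
    events.append((name, "start", threading.get_ident()))
    await asyncio.sleep(0.01)  # yields control back to the event loop
    events.append((name, "end", threading.get_ident()))


async def main():
    tasks = [asyncio.create_task(io_task(f"task-{i}")) for i in range(3)]
    await asyncio.gather(*tasks)


asyncio.run(main())

thread_ids = {tid for _, _, tid in events}
order = [(name, phase) for name, phase, _ in events]
print(f"threads used: {len(thread_ids)}")
print(order)
```

All three tasks start before any of them ends, yet only one thread id appears: the tasks take turns on the event loop rather than running at the same time.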
To make it more interesting, add a `time.sleep` call in the task to block the async loop:
```
async def io_task():
    time.sleep(0.01)
    await asyncio.sleep(0.01)
```
![time.sleep call][15]
(Tian Gao, [CC BY-SA 4.0][4])
The program took much longer (40ms), and the tasks filled the blanks in the async scheduler.
This feature is very helpful for diagnosing behavior and performance issues in async programs.
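The cost of the blocking call can be reproduced with timing alone. In this sketch, `polite_task`, `blocking_task`, and `run_three` are hypothetical names; the first task only awaits, while the second adds the same `time.sleep(0.01)` call as the example above.

```python
import asyncio
import time


async def polite_task():
    # Non-blocking: the event loop can run other tasks during the wait.
    await asyncio.sleep(0.01)


async def blocking_task():
    # time.sleep() holds the only thread, so no other task can run meanwhile.
    time.sleep(0.01)
    await asyncio.sleep(0.01)


async def run_three(factory):
    # Run three copies of the task concurrently and return the elapsed seconds.
    start = time.perf_counter()
    await asyncio.gather(*(factory() for _ in range(3)))
    return time.perf_counter() - start


polite_s = asyncio.run(run_three(polite_task))
blocking_s = asyncio.run(run_three(blocking_task))
print(f"awaiting only: {polite_s * 1000:.1f}ms, with time.sleep: {blocking_s * 1000:.1f}ms")
```

The three blocking sleeps run back to back, so the second figure cannot drop below about 30ms, and the final `asyncio.sleep` typically brings it to around 40ms, in line with the trace above.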
### See what's happening with VizTracer
With VizTracer, you can see what's going on with your program on a timeline, rather than imagining it from complicated logs. This helps you understand your concurrent programs better.
VizTracer is open source, released under the Apache 2.0 license, and supports all common operating systems (Linux, macOS, and Windows). You can learn more about its features and access its source code in [VizTracer's GitHub repository][16].
--------------------------------------------------------------------------------
via: https://opensource.com/article/21/3/python-viztracer
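Author: [Tian Gao][a]
Topic selection: [lujun9972][b]
Translator: [译者ID](https://github.com/译者ID)
Proofreader: [校对者ID](https://github.com/校对者ID)
This article was originally compiled and translated by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)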
[a]: https://opensource.com/users/gaogaotiantian
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/colorful_sound_wave.png?itok=jlUJG0bM (Colorful sound wave graph)
[2]: https://readthedocs.org/projects/viztracer/
[3]: https://opensource.com/sites/default/files/uploads/viztracer_singlethreadtask.png (Running code in a single thread)
[4]: https://creativecommons.org/licenses/by-sa/4.0/
[5]: https://opensource.com/sites/default/files/uploads/viztracer_callstackreport.png (call-stack report)
[6]: https://opensource.com/sites/default/files/uploads/viztracer_multithread.png (Multi-thread program)
[7]: https://opensource.com/sites/default/files/uploads/viztracer_concurrency.png (Concurrency of multiple threads)
[8]: https://opensource.com/sites/default/files/uploads/viztracer_multithreadrun.png (Running with extra argument)
[9]: https://opensource.com/sites/default/files/uploads/viztracer_comparewithmultiprocess.png (Multi-process version)
[10]: https://opensource.com/sites/default/files/uploads/io-bound_singlethread.png (I/O-bound single-thread, single-task program)
[11]: https://opensource.com/sites/default/files/uploads/io-bound_multithread.png (I/O-bound multi-thread program)
[12]: https://opensource.com/sites/default/files/uploads/viztracer_asyncio.png (VizTracer with asyncio)
[13]: https://opensource.com/sites/default/files/uploads/log_async.png (Using --log_async to separate tasks)
[14]: https://opensource.com/sites/default/files/uploads/taskcreation.png (Graph of task creation and execution)
[15]: https://opensource.com/sites/default/files/uploads/time.sleep_call.png (time.sleep call)
[16]: https://github.com/gaogaotiantian/viztracer

@ -0,0 +1,256 @@
[#]: subject: (Visualize multi-threaded Python programs with an open source tool)
[#]: via: (https://opensource.com/article/21/3/python-viztracer)
[#]: author: (Tian Gao https://opensource.com/users/gaogaotiantian)
[#]: collector: (lujun9972)
[#]: translator: (wxy)
[#]: reviewer: (wxy)
[#]: publisher: ( )
[#]: url: ( )
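Visualize multi-threaded Python programs with an open source tool
======
> VizTracer traces concurrent Python programs to help with logging, debugging, and profiling.
![Colorful sound wave graph][1]
Concurrency is an essential part of modern programming, as we have multiple cores and many tasks that need to cooperate. However, concurrent programs are harder to understand because they do not run sequentially. It's not as easy for engineers to identify bugs and performance issues in these programs as it is in a single-thread, single-task program.
With Python, you have multiple options for concurrency. The most common ones are probably multi-threading with the `threading` module, multiprocessing with the `subprocess` and `multiprocessing` modules, and the more recent `async` syntax provided by the `asyncio` module. Before [VizTracer][2], there was a lack of tools to analyze programs using these techniques.
VizTracer is a tool for tracing and visualizing Python programs, which is helpful for logging, debugging, and profiling. Even though it works well for single-thread, single-task programs, its utility in concurrent programs is what makes it unique.
### Try a simple task
Start with a simple practice task: figure out whether the integers in an array are prime numbers and return a Boolean array. Here is a simple solution: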
```
def is_prime(n):
    for i in range(2, n):
        if n % i == 0:
            return False
    return True
def get_prime_arr(arr):
    return [is_prime(elem) for elem in arr]
```
Try running it normally, in a single thread, with VizTracer:
```
import random

if __name__ == "__main__":
    num_arr = [random.randint(100, 10000) for _ in range(6000)]
    get_prime_arr(num_arr)
```
```
viztracer my_program.py
```
![Running code in a single thread][3]
The call-stack report shows that it took about 140ms, with most of the time spent in `get_prime_arr`.
![call-stack report][5]
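It's just running the `is_prime` function over and over again on the elements in the array.
This is what you would expect, and it's not that interesting (if you know VizTracer).
### Try a multi-thread program
Try doing it with a multi-thread program: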
```
import random
from threading import Thread

if __name__ == "__main__":
    num_arr = [random.randint(100, 10000) for i in range(2000)]
    thread1 = Thread(target=get_prime_arr, args=(num_arr,))
    thread2 = Thread(target=get_prime_arr, args=(num_arr,))
    thread3 = Thread(target=get_prime_arr, args=(num_arr,))
    thread1.start()
    thread2.start()
    thread3.start()
    thread1.join()
    thread2.join()
    thread3.join()
```
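To match the single-thread program's workload, this uses a 2,000-element array for three threads, simulating a situation where three threads are sharing the task.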
![Multi-thread program][6]
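If you are familiar with Python's Global Interpreter Lock (GIL), you would expect that it won't get any faster. It took a little more than 140ms because of the overhead. However, you can observe the concurrency of multiple threads: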
![Concurrency of multiple threads][7]
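When one thread was working (executing multiple `is_prime` functions), the other one was frozen (stuck in one `is_prime` function); later, they switched. This is due to the GIL, and it is the reason Python does not have true multi-threading. It can achieve concurrency but not parallelism.
### Try it with multiprocessing
To achieve parallelism, the way to go is the `multiprocessing` library. Here is another version that uses `multiprocessing`: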
```
import random
from multiprocessing import Process

if __name__ == "__main__":
    num_arr = [random.randint(100, 10000) for _ in range(2000)]

    p1 = Process(target=get_prime_arr, args=(num_arr,))
    p2 = Process(target=get_prime_arr, args=(num_arr,))
    p3 = Process(target=get_prime_arr, args=(num_arr,))
    p1.start()
    p2.start()
    p3.start()
    p1.join()
    p2.join()
    p3.join()
```
To run it with VizTracer, you need an extra argument:
```
viztracer --log_multiprocess my_program.py
```
![Running with extra argument][8]
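The whole program finished in a little more than 50ms, with the actual task finishing before the 50ms mark. The program's speed roughly tripled.
To compare it with the multi-thread version, here is the multiprocess version: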
![Multi-process version][9]
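Without a shared GIL, multiple processes can achieve parallelism, which means multiple `is_prime` functions can execute in parallel.
However, Python's multi-threading is not useless. It does little for computation-intensive programs, but it helps I/O-intensive ones. To see this, you can fake an I/O-bound task with sleep: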
```
import time

def io_task():
    time.sleep(0.01)
```
Try it in a single-thread, single-task program:
```
if __name__ == "__main__":
    for _ in range(3):
        io_task()
```
![I/O-bound single-thread, single-task program][10]
The full program took about 30ms; nothing special.
Now use multiple threads:
```
if __name__ == "__main__":
    thread1 = Thread(target=io_task)
    thread2 = Thread(target=io_task)
    thread3 = Thread(target=io_task)
    thread1.start()
    thread2.start()
    thread3.start()
    thread1.join()
    thread2.join()
    thread3.join()
```
![I/O-bound multi-thread program][11]
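The program took 10ms, and it's clear that the three threads worked concurrently, which improved the overall performance.
### Try it with asyncio
Python is trying to introduce another interesting feature called async programming. You can make an async version of this task: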
```
import asyncio
async def io_task():
    await asyncio.sleep(0.01)
async def main():
    t1 = asyncio.create_task(io_task())
    t2 = asyncio.create_task(io_task())
    t3 = asyncio.create_task(io_task())
    await t1
    await t2
    await t3
if __name__ == "__main__":
    asyncio.run(main())
```
As `asyncio` is literally a single-thread scheduler with tasks, you can use VizTracer directly on it:
![VizTracer with asyncio][12]
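It still took 10ms, but most of the functions displayed are the underlying structure, which is probably not what users are interested in. To solve this, you can use `--log_async` to separate the real tasks: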
```
viztracer --log_async my_program.py
```
![Using --log_async to separate tasks][13]
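Now the user tasks are much clearer. For most of the time, no tasks are running (because the only thing the program does is sleep). Here's the interesting part: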
![Graph of task creation and execution][14]
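This shows when the tasks were created and executed. Task-1 was the `main()` coroutine, which created the other tasks. Task-2, Task-3, and Task-4 executed `io_task` and `sleep`, then waited for the wake-up. As the graph shows, there is no overlap between tasks because it's a single-thread program, and VizTracer visualized it this way to make it more understandable.
To make it more interesting, add a `time.sleep` call in the task to block the async loop: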
```
async def io_task():
    time.sleep(0.01)
    await asyncio.sleep(0.01)
```
![time.sleep call][15]
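The program took much longer (40ms), and the tasks filled the blanks in the async scheduler.
This feature is very helpful for diagnosing behavior and performance issues in async programs.
### See what's happening with VizTracer
With VizTracer, you can see what's going on with your program on a timeline, rather than imagining it from complicated logs. This helps you understand your concurrent programs better.
VizTracer is open source, released under the Apache 2.0 license, and supports all common operating systems (Linux, macOS, and Windows). You can learn more about its features and access its source code in [VizTracer's GitHub repository][16].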
--------------------------------------------------------------------------------
via: https://opensource.com/article/21/3/python-viztracer
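Author: [Tian Gao][a]
Topic selection: [lujun9972][b]
Translator: [wxy](https://github.com/wxy)
Proofreader: [wxy](https://github.com/wxy)
This article was originally compiled and translated by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)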
[a]: https://opensource.com/users/gaogaotiantian
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/colorful_sound_wave.png?itok=jlUJG0bM (Colorful sound wave graph)
[2]: https://readthedocs.org/projects/viztracer/
[3]: https://opensource.com/sites/default/files/uploads/viztracer_singlethreadtask.png (Running code in a single thread)
[4]: https://creativecommons.org/licenses/by-sa/4.0/
[5]: https://opensource.com/sites/default/files/uploads/viztracer_callstackreport.png (call-stack report)
[6]: https://opensource.com/sites/default/files/uploads/viztracer_multithread.png (Multi-thread program)
[7]: https://opensource.com/sites/default/files/uploads/viztracer_concurrency.png (Concurrency of multiple threads)
[8]: https://opensource.com/sites/default/files/uploads/viztracer_multithreadrun.png (Running with extra argument)
[9]: https://opensource.com/sites/default/files/uploads/viztracer_comparewithmultiprocess.png (Multi-process version)
[10]: https://opensource.com/sites/default/files/uploads/io-bound_singlethread.png (I/O-bound single-thread, single-task program)
[11]: https://opensource.com/sites/default/files/uploads/io-bound_multithread.png (I/O-bound multi-thread program)
[12]: https://opensource.com/sites/default/files/uploads/viztracer_asyncio.png (VizTracer with asyncio)
[13]: https://opensource.com/sites/default/files/uploads/log_async.png (Using --log_async to separate tasks)
[14]: https://opensource.com/sites/default/files/uploads/taskcreation.png (Graph of task creation and execution)
[15]: https://opensource.com/sites/default/files/uploads/time.sleep_call.png (time.sleep call)
[16]: https://github.com/gaogaotiantian/viztracer