sources/tech/20210312 Visualize multi-threaded Python programs with an open source tool.md
10 KiB
Visualize multi-threaded Python programs with an open source tool
VizTracer traces concurrent Python programs to help with logging, debugging, and profiling.
Concurrency is an essential part of modern programming, as we have multiple cores and many tasks that need to cooperate. However, it's harder to understand concurrent programs when they are not running sequentially. It's not as easy for engineers to identify bugs and performance issues in these programs as it is in a single-thread, single-task program.
With Python, you have multiple options for concurrency. The most common ones are probably multi-threaded with the threading module, multiprocess with the subprocess and multiprocessing modules, and the more recent async syntax with the asyncio module. Before VizTracer, there was a lack of tools to analyze programs using these techniques.
VizTracer is a tool for tracing and visualizing Python programs, which is helpful for logging, debugging, and profiling. Even though it works well for single-thread, single-task programs, its utility in concurrent programs is what makes it unique.
Try a simple task
Start with a simple practice task: Figure out whether the integers in an array are prime numbers and return a Boolean array. Here is a simple solution:
def is_prime(n):
for i in range(2, n):
if n % i == 0:
return False
return True
def get_prime_arr(arr):
return [is_prime(elem) for elem in arr]
Try to run it normally, in a single thread, with VizTracer:
if __name__ == "__main__":
num_arr = [random.randint(100, 10000) for _ in range(6000)]
get_prime_arr(num_arr)
[/code] [code]`viztracer my_program.py`
(Tian Gao, CC BY-SA 4.0)
The call-stack report indicates it took about 140ms, with most of the time spent in get_prime_arr
.
(Tian Gao, CC BY-SA 4.0)
It's just doing the is_prime
function over and over again on the elements in the array.
This is what you would expect, and it's not that interesting (if you know VizTracer).
Try a multi-thread program
Try doing it with a multi-thread program:
if __name__ == "__main__":
num_arr = [random.randint(100, 10000) for i in range(2000)]
thread1 = Thread(target=get_prime_arr, args=(num_arr,))
thread2 = Thread(target=get_prime_arr, args=(num_arr,))
thread3 = Thread(target=get_prime_arr, args=(num_arr,))
thread1.start()
thread2.start()
thread3.start()
thread1.join()
thread2.join()
thread3.join()
To match the single-thread program's workload, this uses a 2,000-element array for three threads, simulating a situation where three threads are sharing the task.
(Tian Gao, CC BY-SA 4.0)
As you would expect if you are familiar with Python's Global Interpreter Lock (GIL), it won't get any faster. It took a little bit more than 140ms due to the overhead. However, you can observe the concurrency of multiple threads:
(Tian Gao, CC BY-SA 4.0)
When one thread was working (executing multiple is_prime
functions), the other one was frozen (one is_prime
function); later, they switched. This is due to GIL, and it is the reason Python does not have true multi-threading. It can achieve concurrency but not parallelism.
Try it with multiprocessing
To achieve parallelism, the way to go is the multiprocessing library. Here is another version with multiprocessing:
if __name__ == "__main__":
num_arr = [random.randint(100, 10000) for _ in range(2000)]
p1 = Process(target=get_prime_arr, args=(num_arr,))
p2 = Process(target=get_prime_arr, args=(num_arr,))
p3 = Process(target=get_prime_arr, args=(num_arr,))
p1.start()
p2.start()
p3.start()
p1.join()
p2.join()
p3.join()
To run it with VizTracer, you need an extra argument:
`viztracer --log_multiprocess my_program.py`
(Tian Gao, CC BY-SA 4.0)
The whole program finished in a little more than 50ms, with the actual task finishing before the 50ms mark. The program's speed roughly tripled.
To compare it with the multi-thread version, here is the multiprocess version:
(Tian Gao, CC BY-SA 4.0)
Without GIL, multiple processes can achieve parallelism, which means multiple is_prime
functions can execute in parallel.
However, Python's multi-thread is not useless. For example, for computation-intensive and I/O-intensive programs, you can fake an I/O-bound task with sleep:
def io_task():
time.sleep(0.01)
Try it in a single-thread, single-task program:
if __name__ == "__main__":
for _ in range(3):
io_task()
(Tian Gao, CC BY-SA 4.0)
The full program took about 30ms; nothing special.
Now use multi-thread:
if __name__ == "__main__":
thread1 = Thread(target=io_task)
thread2 = Thread(target=io_task)
thread3 = Thread(target=io_task)
thread1.start()
thread2.start()
thread3.start()
thread1.join()
thread2.join()
thread3.join()
(Tian Gao, CC BY-SA 4.0)
The program took 10ms, and it's clear how the three threads worked concurrently and improved the overall performance.
Try it with asyncio
Python is trying to introduce another interesting feature called async programming. You can make an async version of this task:
import asyncio
async def io_task():
await asyncio.sleep(0.01)
async def main():
t1 = asyncio.create_task(io_task())
t2 = asyncio.create_task(io_task())
t3 = asyncio.create_task(io_task())
await t1
await t2
await t3
if __name__ == "__main__":
asyncio.run(main())
As asyncio is literally a single-thread scheduler with tasks, you can use VizTracer directly on it:
(Tian Gao, CC BY-SA 4.0)
It still took 10ms, but most of the functions displayed are the underlying structure, which is probably not what users are interested in. To solve this, you can use --log_async
to separate the real task:
`viztracer --log_async my_program.py`
(Tian Gao, CC BY-SA 4.0)
Now the user tasks are much clearer. For most of the time, no tasks are running (because the only thing it does is sleep). Here's the interesting part:
(Tian Gao, CC BY-SA 4.0)
This shows when the tasks were created and executed. Task-1 was the main()
co-routine and created other tasks. Tasks 2, 3, and 4 executed io_task
and sleep
then waited for the wake-up. As the graph shows, there is no overlap between tasks because it's a single-thread program, and VizTracer visualized it this way to make it more understandable.
To make it more interesting, add a time.sleep
call in the task to block the async loop:
async def io_task():
time.sleep(0.01)
await asyncio.sleep(0.01)
(Tian Gao, CC BY-SA 4.0)
The program took much longer (40ms), and the tasks filled the blanks in the async scheduler.
This feature is very helpful for diagnosing behavior and performance issues in async programs.
See what's happening with VizTracer
With VizTracer, you can see what's going on with your program on a timeline, rather than imaging it from complicated logs. This helps you understand your concurrent programs better.
VizTracer is open source, released under the Apache 2.0 license, and supports all common operating systems (Linux, macOS, and Windows). You can learn more about its features and access its source code in VizTracer's GitHub repository.
via: https://opensource.com/article/21/3/python-viztracer