mirror of
https://github.com/LCTT/TranslateProject.git
synced 2025-01-25 23:11:02 +08:00
translated
This commit is contained in:
parent
75edd88d7e
commit
6555f61e8a
@ -1,351 +0,0 @@
Cgo and Python
============================================================

![](https://datadog-prod.imgix.net/img/blog/engineering/cgo-and-python/cgo_python_hero.png?auto=format&w=1900&dpr=1)

If you look at the [new Datadog Agent][8], you might notice most of the codebase is written in Go, although the checks we use to gather metrics are still written in Python. This is possible because the Datadog Agent, a regular Go binary, [embeds][9] a CPython interpreter that can be called whenever it needs to execute Python code. This process can be made transparent using an abstraction layer so that you can still write idiomatic Go code even when there’s Python running under the hood.

[video](https://youtu.be/yrEi5ezq2-c)

There are a number of reasons why you might want to embed Python in a Go application:

* It is useful during a port: you can gradually move portions of an existing Python project to the new language without losing any functionality during the process.
* You can reuse existing Python software or libraries without re-implementing them in the new language.
* You can dynamically extend your software by loading and executing regular Python scripts, even at runtime.

The list could go on, but for the Datadog Agent the last point is crucial: we want you to be able to execute custom checks or change existing ones without forcing you to recompile the Agent, or in general, to compile anything.

Embedding CPython is quite easy and well documented. The interpreter itself is written in C and a C API is provided to programmatically perform operations at a very low level, like creating objects, importing modules, and calling functions.

In this article we’ll show some code examples, and we’ll focus on keeping the Go code idiomatic while interacting with Python at the same time. Before we proceed, though, we need to address a small gap: the embedding API is C but our main application is Go. How can this possibly work?

![](https://datadog-prod.imgix.net/img/blog/engineering/cgo-and-python/cgo_python_divider_1.png?auto=format&fit=max&w=847)

### Introducing cgo

There are [a number of good reasons][10] why you might not want to introduce cgo in your stack, but embedding CPython is one of those cases where you must. [Cgo][11] is not a language nor a compiler. It’s a [Foreign Function Interface][12] (FFI), a mechanism we can use in Go to invoke functions and services written in a different language, specifically C.

When we say “cgo” we’re actually referring to a set of tools, libraries, functions, and types that are used by the go toolchain under the hood so we can keep doing `go build` to get our Go binaries. An absolutely minimal example of a program using cgo looks like this:

```
package main

// #include <float.h>
import "C"
import "fmt"

func main() {
	fmt.Println("Max float value of float is", C.FLT_MAX)
}
```

The comment block right above the `import "C"` instruction is called a “preamble” and can contain actual C code, in this case a header inclusion. Once imported, the “C” pseudo-package lets us “jump” to the foreign code, accessing the `FLT_MAX` constant. You can build the example by invoking `go build`, the same as if it was plain Go.

If you want to have a look at all the work cgo does under the hood, run `go build -x`. You’ll see the “cgo” tool will be invoked to generate some C and Go modules, then the C and Go compilers will be invoked to build the object modules and finally the linker will put everything together.

You can read more about cgo on the [Go blog][13]. The article contains more examples and a few useful links that go into further detail.

Now that we have an idea of what cgo can do for us, let’s see how we can run some Python code using this mechanism.

![](https://datadog-prod.imgix.net/img/blog/engineering/cgo-and-python/cgo_python_divider_2.png?auto=format&fit=max&w=847)

### Embedding CPython: a primer

A Go program that, technically speaking, embeds CPython is not as complicated as you might expect. In fact, at the bare minimum, all we have to do is initialize the interpreter before running any Python code and finalize it when we’re done. Please note that we’re going to use Python 2.x throughout all the examples but everything we’ll see can be applied to Python 3.x as well with very little adaptation. Let’s look at an example:

```
package main

// #cgo pkg-config: python-2.7
// #include <Python.h>
import "C"
import "fmt"

func main() {
	C.Py_Initialize()
	fmt.Println(C.GoString(C.Py_GetVersion()))
	C.Py_Finalize()
}
```

The example above does exactly what the following Python code would do:

```
import sys
print(sys.version)
```

You can see we put a `#cgo` directive in the preamble; those directives are passed to the toolchain to let you change the build workflow. In this case, we tell cgo to invoke “pkg-config” to gather the flags needed to build and link against a library called “python-2.7” and pass those flags to the C compiler. If you have the CPython development libraries installed on your system along with pkg-config, this would let you keep using a plain `go build` to compile the example above.

Back to the code, we use `Py_Initialize()` and `Py_Finalize()` to set up and shut down the interpreter and the `Py_GetVersion` C function to retrieve the string containing the version information for the embedded interpreter.

If you’re wondering, all the cgo bits we need to put together to invoke the C Python API are boilerplate code. This is why the Datadog Agent relies on [go-python][14] for all the embedding operations; the library provides a Go-friendly thin wrapper around the C API and hides the cgo details. This is another basic embedding example, this time using go-python:

```
package main

import (
	python "github.com/sbinet/go-python"
)

func main() {
	python.Initialize()
	python.PyRun_SimpleString("print 'hello, world!'")
	python.Finalize()
}
```

This looks closer to regular Go code: no more cgo exposed, and we can use Go strings back and forth while accessing the Python API. Embedding looks powerful and developer friendly. Time to put the interpreter to good use: let’s try to load a Python module from disk.

We don’t need anything complex on the Python side; the ubiquitous “hello world” will serve the purpose:

```
# foo.py
def hello():
    """
    Print hello world for fun and profit.
    """
    print "hello, world!"
```

The Go code is slightly more complex but still readable:

```
// main.go
package main

import "github.com/sbinet/go-python"

func main() {
	python.Initialize()
	defer python.Finalize()

	fooModule := python.PyImport_ImportModule("foo")
	if fooModule == nil {
		panic("Error importing module")
	}

	helloFunc := fooModule.GetAttrString("hello")
	if helloFunc == nil {
		panic("Error importing function")
	}

	// The Python function takes no params but when using the C api
	// we're required to send (empty) *args and **kwargs anyways.
	helloFunc.Call(python.PyTuple_New(0), python.PyDict_New())
}
```

Once built, we need to set the `PYTHONPATH` environment variable to the current working dir so that the import statement will be able to find the `foo.py` module. From a shell, the command would look like this:

```
$ go build main.go && PYTHONPATH=. ./main
hello, world!
```

![](https://datadog-prod.imgix.net/img/blog/engineering/cgo-and-python/cgo_python_divider_3.png?auto=format&fit=max&w=847)

### The dreadful Global Interpreter Lock

Having to bring in cgo in order to embed Python is a tradeoff: builds will be slower, the Garbage Collector won’t help us manage memory used by the foreign system, and cross compilation will be non-trivial. Whether or not these are concerns for a specific project can be debated, but there’s something I deem not negotiable: the Go concurrency model. If we couldn’t run Python from a goroutine, using Go altogether would make very little sense.

Before playing with concurrency, Python, and cgo, there’s something we need to know: it’s the Global Interpreter Lock, also known as the GIL. The GIL is a mechanism widely adopted in language interpreters (CPython is one of those) preventing more than one thread from running at the same time. This means that no Python program executed by CPython will ever be able to run in parallel within the same process. Concurrency is still possible and in the end, the lock is a good tradeoff between speed, security, and implementation simplicity. So why should this pose a problem when it comes to embedding?

When a regular, non-embedded Python program starts, there’s no GIL involved to avoid useless overhead in locking operations; the GIL starts the first time some Python code requests to spawn a thread. For each thread, the interpreter creates a data structure to store information about the current state and locks the GIL. When the thread has finished, the state is restored and the GIL unlocked, ready to be used by other threads.

When we run Python from a Go program, none of the above happens automatically. Without the GIL, multiple Python threads could be created by our Go program. This could cause a race condition leading to fatal runtime errors, and most likely a segmentation fault bringing down the whole Go application.

The solution to this problem is to explicitly acquire the GIL whenever we run multithreaded code from Go; the code is not complex because the C API provides all the tools we need. To better expose the problem, we need to do something CPU-bound from Python. Let’s add these functions to our foo.py module from the previous example:

```
# foo.py
import sys

def print_odds(limit=10):
    """
    Print odd numbers < limit
    """
    for i in range(limit):
        if i%2:
            sys.stderr.write("{}\n".format(i))

def print_even(limit=10):
    """
    Print even numbers < limit
    """
    for i in range(limit):
        if i%2 == 0:
            sys.stderr.write("{}\n".format(i))
```

We’ll try to print odd and even numbers concurrently from Go, using two different goroutines (thus involving threads):

```
package main

import (
	"sync"

	"github.com/sbinet/go-python"
)

func main() {
	// The following will also create the GIL explicitly
	// by calling PyEval_InitThreads(), without waiting
	// for the interpreter to do that
	python.Initialize()

	var wg sync.WaitGroup
	wg.Add(2)

	fooModule := python.PyImport_ImportModule("foo")
	odds := fooModule.GetAttrString("print_odds")
	even := fooModule.GetAttrString("print_even")

	// Initialize() has locked the GIL but at this point we don't need it
	// anymore. We save the current state and release the lock
	// so that goroutines can acquire it
	state := python.PyEval_SaveThread()

	go func() {
		_gstate := python.PyGILState_Ensure()
		odds.Call(python.PyTuple_New(0), python.PyDict_New())
		python.PyGILState_Release(_gstate)

		wg.Done()
	}()

	go func() {
		_gstate := python.PyGILState_Ensure()
		even.Call(python.PyTuple_New(0), python.PyDict_New())
		python.PyGILState_Release(_gstate)

		wg.Done()
	}()

	wg.Wait()

	// At this point we know we won't need Python anymore in this
	// program, we can restore the state and lock the GIL to perform
	// the final operations before exiting.
	python.PyEval_RestoreThread(state)
	python.Finalize()
}
```

While reading the example you might note a pattern, the pattern that will become our mantra to run embedded Python code:

1. Save the state and lock the GIL.
2. Do Python.
3. Restore the state and unlock the GIL.

The code should be straightforward but there’s a subtle detail we want to point out: although both cases follow the GIL mantra, in one we operate the GIL by calling `PyEval_SaveThread()` and `PyEval_RestoreThread()`, while in the other (look inside the goroutines) we do the same with `PyGILState_Ensure()` and `PyGILState_Release()`.

We said when multithreading is operated from Python, the interpreter takes care of creating the data structure needed to store the current state, but when the same happens from the C API, we’re responsible for that.

When we initialize the interpreter with go-python, we’re operating in a Python context. So when `PyEval_InitThreads()` is called it initializes the data structure and locks the GIL. We can use `PyEval_SaveThread()` and `PyEval_RestoreThread()` to operate on already existing state.

Inside the goroutines, we’re operating from a Go context and we need to explicitly create the state and remove it when done, which is what `PyGILState_Ensure()` and `PyGILState_Release()` do for us.

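One way to make the mantra hard to get wrong is to wrap it in a helper that acquires the lock, runs a callback, and always releases the lock afterwards. The sketch below runs without Python because a `sync.Mutex` stands in for the GIL; `withLock` and `countTo` are hypothetical names, and in real code the two marked lines would be `python.PyGILState_Ensure()` and `python.PyGILState_Release()`.

```go
package main

import (
	"fmt"
	"sync"
)

// gil is a stand-in for the real Global Interpreter Lock (an assumption
// made so the example runs without an embedded interpreter).
var gil sync.Mutex

// withLock follows the mantra: lock, do the work, unlock.
// The deferred unlock runs even if fn panics.
func withLock(fn func()) {
	gil.Lock()         // real code: gstate := python.PyGILState_Ensure()
	defer gil.Unlock() // real code: python.PyGILState_Release(gstate)
	fn()
}

// countTo increments a shared counter from several goroutines,
// each taking the lock only around its own update.
func countTo(workers, perWorker int) int {
	var wg sync.WaitGroup
	total := 0
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				withLock(func() { total++ })
			}
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(countTo(4, 1000))
}
```

Using `defer` for the release step means an early return or a panic inside the callback cannot leave the lock held.
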
![](https://datadog-prod.imgix.net/img/blog/engineering/cgo-and-python/cgo_python_divider_4.png?auto=format&fit=max&w=847)

### Unleash the Gopher

At this point we know how to deal with multithreading Go code executing Python in an embedded interpreter but after the GIL, another challenge is right around the corner: the Go scheduler.

When a goroutine starts, it’s scheduled for execution on one of the `GOMAXPROCS` threads available ([see here][15] for more details on the topic). If a goroutine happens to perform a syscall or call C code, the current thread hands over the other goroutines waiting to run in the thread queue to another thread so they can have better chances to run; the current goroutine is paused, waiting for the syscall or the C function to return. When this happens, the thread tries to resume the paused goroutine, but if this is not possible, it asks the Go runtime to find another thread to complete the goroutine and goes to sleep. The goroutine is finally scheduled to another thread and it finishes.

With this in mind, let’s see what can happen to a goroutine running some Python code when it is moved to a new thread:

1. Our goroutine starts, performs a C call, and pauses. The GIL is locked.
2. When the C call returns, the current thread tries to resume the goroutine, but it fails.
3. The current thread tells the Go runtime to find another thread to resume our goroutine.
4. The Go scheduler finds an available thread and the goroutine is resumed.
5. The goroutine is almost done and tries to unlock the GIL before returning.
6. The thread ID stored in the current state is from the original thread and is different from the ID of the current thread.
7. Panic!

Luckily for us, we can force the Go runtime to always keep our goroutine running on the same thread by calling the `LockOSThread` function from the `runtime` package from within a goroutine:

```
go func() {
	runtime.LockOSThread()

	_gstate := python.PyGILState_Ensure()
	odds.Call(python.PyTuple_New(0), python.PyDict_New())
	python.PyGILState_Release(_gstate)
	wg.Done()
}()
```

This will interfere with the scheduler and might introduce some overhead, but it’s a price that we’re willing to pay to avoid random panics.

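Locking every goroutine that touches Python is one option. Another arrangement, sketched here as an assumption rather than as what the Agent does, is to dedicate a single goroutine that is locked to its OS thread and send it work over a channel, so that every call reaches the thread-sensitive library from the same thread. `threadWorker`, `do`, `stop`, and `sumOnWorker` are hypothetical names.

```go
package main

import (
	"fmt"
	"runtime"
)

// threadWorker runs every submitted function on one dedicated OS thread.
type threadWorker struct {
	tasks chan func()
}

// newThreadWorker starts the worker goroutine and pins it to its thread.
func newThreadWorker() *threadWorker {
	w := &threadWorker{tasks: make(chan func())}
	go func() {
		runtime.LockOSThread() // the goroutine never migrates from here on
		for task := range w.tasks {
			task()
		}
	}()
	return w
}

// do runs fn on the worker thread and waits for it to finish.
func (w *threadWorker) do(fn func()) {
	done := make(chan struct{})
	w.tasks <- func() {
		defer close(done)
		fn()
	}
	<-done
}

// stop shuts the worker down; do must not be called afterwards.
func (w *threadWorker) stop() { close(w.tasks) }

// sumOnWorker adds 1..n, performing each addition on the worker thread.
func sumOnWorker(n int) int {
	w := newThreadWorker()
	defer w.stop()

	sum := 0
	for i := 1; i <= n; i++ {
		v := i
		w.do(func() { sum += v }) // real code would call into Python here
	}
	return sum
}

func main() {
	fmt.Println(sumOnWorker(3))
}
```

This design serializes all calls through one thread, which costs little when the work is Python code that the GIL would serialize anyway.
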
### Conclusions

In order to embed Python, the Datadog Agent has to accept a few tradeoffs:

* The overhead introduced by cgo.
* The task of manually handling the GIL.
* The limitation of binding goroutines to the same thread during execution.

We’re happy to accept each of these for the convenience of running Python checks in Go. But by being conscious of the tradeoffs, we’re able to minimize their effect. Regarding other limitations introduced to support Python, we have a few countermeasures to contain potential issues:

* The build is automated and configurable so that devs still have something very similar to `go build`.
* A lightweight version of the agent can be built stripping out Python support entirely simply using Go build tags.
* Such a version only relies on core checks hardcoded in the agent itself (system and network checks mostly) but is cgo free and can be cross compiled.

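As an illustration of the build-tag approach, a stub like the following could be compiled when Python support is left out, with a sibling file guarded by the opposite constraint providing the real implementation. The tag name `python` and the function `runPythonCheck` are assumptions made for this sketch, not taken from the Agent’s source.

```go
//go:build !python

package main

import (
	"errors"
	"fmt"
)

// errNoPython is returned by the stub when the binary was built without
// the (hypothetical) "python" build tag.
var errNoPython = errors.New("python support not compiled in")

// runPythonCheck is the cgo-free stub; a file tagged "python" would
// provide a version that calls the embedded interpreter instead.
func runPythonCheck(name string) error {
	return fmt.Errorf("check %q: %w", name, errNoPython)
}

func main() {
	if err := runPythonCheck("disk"); err != nil {
		fmt.Println(err)
	}
}
```

With this layout, `go build` alone would produce the lightweight binary, while `go build -tags python` would exclude the stub and pick up the cgo-based file.
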
We’ll re-evaluate our options in the future and decide whether keeping around cgo is still worth it; we could even reconsider whether Python as a whole is still worth it, waiting for the [Go plugin package][16] to be mature enough to support our use case. But for now the embedded Python is working well and transitioning from the old Agent to the new one couldn’t be easier.

Are you a polyglot who loves mixing different programming languages? Do you love learning about the inner workings of languages to make your code more performant? [Join us at Datadog!][17]

--------------------------------------------------------------------------------

via: https://www.datadoghq.com/blog/engineering/cgo-and-python/

Author: [Massimiliano Pippi][a]
Translator: [Zioyi](https://github.com/Zioyi)
Proofreader: [校对者ID](https://github.com/校对者ID)

This article was originally translated by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux.cn (Linux中国)](https://linux.cn/)

[a]:https://github.com/masci
[1]:http://twitter.com/share?url=https://www.datadoghq.com/blog/engineering/cgo-and-python/
[2]:http://www.reddit.com/submit?url=https://www.datadoghq.com/blog/engineering/cgo-and-python/
[3]:https://www.linkedin.com/shareArticle?mini=true&url=https://www.datadoghq.com/blog/engineering/cgo-and-python/
[4]:https://www.datadoghq.com/blog/category/under-the-hood
[5]:https://www.datadoghq.com/blog/tag/agent
[6]:https://www.datadoghq.com/blog/tag/golang
[7]:https://www.datadoghq.com/blog/tag/python
[8]:https://github.com/DataDog/datadog-agent/
[9]:https://docs.python.org/2/extending/embedding.html
[10]:https://dave.cheney.net/2016/01/18/cgo-is-not-go
[11]:https://golang.org/cmd/cgo/
[12]:https://en.wikipedia.org/wiki/Foreign_function_interface
[13]:https://blog.golang.org/c-go-cgo
[14]:https://github.com/sbinet/go-python
[15]:https://morsmachine.dk/go-scheduler
[16]:https://golang.org/pkg/plugin/
[17]:https://www.datadoghq.com/careers/

349
translated/tech/20180416 Cgo and Python.md
Normal file
@ -0,0 +1,349 @@
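Cgo and Python
============================================================

![](https://datadog-prod.imgix.net/img/blog/engineering/cgo-and-python/cgo_python_hero.png?auto=format&w=1900&dpr=1)

If you look at the [new Datadog Agent][8], you may notice that most of the codebase is written in Go, while the checks we use to collect metrics are still written in Python. This is possible because the Datadog Agent, a regular Go binary, [embeds][9] a CPython interpreter that can run Python code on demand at any time. An abstraction layer makes this process transparent, so you can keep writing idiomatic Go code even though Python is running behind the scenes.

[video](https://youtu.be/yrEi5ezq2-c)

There are many reasons to embed Python in a Go application:

* It is useful during a migration: you can move parts of an existing Python project to the new language step by step without losing any functionality along the way.
* You can reuse existing Python software or libraries without re-implementing them in the new language.
* You can extend your software dynamically by loading and executing regular Python scripts, even at runtime.

The list could go on, but for the Datadog Agent the last point is crucial: we want you to be able to run custom checks or change existing ones without recompiling the Agent or, more generally, without compiling anything at all.

Embedding CPython is very simple and well documented. The interpreter itself is written in C and provides a C API for performing low-level operations programmatically, such as creating objects, importing modules, and calling functions.

In this article we will show some code examples and keep the Go code idiomatic while it interacts with Python. Before we continue, however, we need to close a gap: the embedding API is C but our main application is Go. How can this possibly work?

![](https://datadog-prod.imgix.net/img/blog/engineering/cgo-and-python/cgo_python_divider_1.png?auto=format&fit=max&w=847)
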
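### Introducing cgo

There are [many good reasons][10] not to introduce cgo into your stack, but embedding CPython is one of the cases where you have to. [Cgo][11] is neither a language nor a compiler. It is a [Foreign Function Interface][12] (FFI), a mechanism that lets Go code call functions and services written in another language, specifically C.

When we say “cgo” we actually mean the set of tools, libraries, functions, and types that the Go toolchain uses under the hood, so that we can keep running `go build` to get our Go binaries. A minimal example program using cgo looks like this:

```
package main

// #include <float.h>
import "C"
import "fmt"

func main() {
	fmt.Println("Max float value of float is", C.FLT_MAX)
}
```

The comment block right above the `import "C"` statement is called the “preamble” and can contain actual C code, in this case a header inclusion. Once imported, the “C” pseudo-package lets us “jump” to the foreign code and access the `FLT_MAX` constant. You can build the example by calling `go build`, just as you would for plain Go.

If you want to see everything cgo does behind the scenes, run `go build -x`. You will see the “cgo” tool being invoked to generate some C and Go modules, then the C and Go compilers being invoked to build the object modules, and finally the linker putting everything together.

You can read more about cgo on the [Go Blog][13]; the article contains more examples and a few useful links for digging into the details.

Now that we know what cgo can do for us, let’s see how to run some Python code with this mechanism.

![](https://datadog-prod.imgix.net/img/blog/engineering/cgo-and-python/cgo_python_divider_2.png?auto=format&fit=max&w=847)
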
### Embedding CPython: a primer

Technically speaking, a Go program that embeds CPython is not as complicated as you might think. In fact, all we need to do is initialize the interpreter before running any Python code and shut it down when we are done. Note that we use Python 2.x in all the examples, but they can be applied to Python 3.x with very few adjustments. Let’s look at an example:

```
package main

// #cgo pkg-config: python-2.7
// #include <Python.h>
import "C"
import "fmt"

func main() {
	C.Py_Initialize()
	fmt.Println(C.GoString(C.Py_GetVersion()))
	C.Py_Finalize()
}
```

The example above does exactly the same thing as the following Python code:

```
import sys
print(sys.version)
```

You can see that we added a `#cgo` directive to the preamble; these directives are passed to the toolchain and let you change the build workflow. In this case, we tell cgo to call `pkg-config` to collect the flags needed to build and link against a library named “python-2.7”, and to pass those flags to the C compiler. If the CPython development libraries and pkg-config are installed on your system, a plain `go build` is all you need to compile the example above.

Back to the code: we use `Py_Initialize()` and `Py_Finalize()` to initialize and shut down the interpreter, and the `Py_GetVersion` C function to get the string holding the version information of the embedded interpreter.

As you may have guessed, all the cgo code needed to call the C Python API is boilerplate. That is why the Datadog Agent relies on [go-python][14] for all embedding operations; the library provides a Go-friendly thin wrapper around the C API and hides the cgo details. Here is another basic embedding example, this time using go-python:

```
package main

import (
	python "github.com/sbinet/go-python"
)

func main() {
	python.Initialize()
	python.PyRun_SimpleString("print 'hello, world!'")
	python.Finalize()
}
```

This looks much closer to regular Go code: cgo is no longer exposed, and we can pass Go strings back and forth while accessing the Python API. Embedding looks powerful and developer friendly, so it is time to put the interpreter to good use: let’s try to load a Python module from disk.

We don’t need anything complex on the Python side; the ubiquitous “hello world” will do:

```
# foo.py
def hello():
    """
    Print hello world for fun and profit.
    """
    print "hello, world!"
```

The Go code is slightly more complex but still readable:

```
// main.go
package main

import "github.com/sbinet/go-python"

func main() {
	python.Initialize()
	defer python.Finalize()

	fooModule := python.PyImport_ImportModule("foo")
	if fooModule == nil {
		panic("Error importing module")
	}

	helloFunc := fooModule.GetAttrString("hello")
	if helloFunc == nil {
		panic("Error importing function")
	}

	// The Python function takes no params but when using the C api
	// we're required to send (empty) *args and **kwargs anyways.
	helloFunc.Call(python.PyTuple_New(0), python.PyDict_New())
}
```

Once the program is built, we need to set the `PYTHONPATH` environment variable to the current working directory so that the import statement can find the `foo.py` module. In a shell, the command looks like this:

```
$ go build main.go && PYTHONPATH=. ./main
hello, world!
```

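Instead of setting `PYTHONPATH` in the shell, a program could also set it itself before the interpreter is initialized. This is only a sketch with a hypothetical helper named `joinPythonPath`; it assumes that the embedded interpreter reads the variable during initialization and that, in a cgo build, environment changes made from Go are visible to the C side, so the `os.Setenv` call would have to come before `python.Initialize()`.

```go
package main

import (
	"fmt"
	"os"
)

// joinPythonPath is a hypothetical helper that puts dir in front of an
// existing PYTHONPATH value, using the platform's list separator.
func joinPythonPath(dir, current string) string {
	if current == "" {
		return dir
	}
	return dir + string(os.PathListSeparator) + current
}

func main() {
	value := joinPythonPath(".", os.Getenv("PYTHONPATH"))
	if err := os.Setenv("PYTHONPATH", value); err != nil {
		panic(err)
	}
	// In a real program, python.Initialize() would be called after this point.
	fmt.Println("PYTHONPATH =", os.Getenv("PYTHONPATH"))
}
```

Prepending rather than overwriting keeps any module directories the user has already configured.
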
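![](https://datadog-prod.imgix.net/img/blog/engineering/cgo-and-python/cgo_python_divider_3.png?auto=format&fit=max&w=847)

### The dreadful Global Interpreter Lock

Having to bring in cgo in order to embed Python is a tradeoff: builds get slower, the garbage collector will not help us manage the memory used by the foreign system, and cross compilation becomes hard. Whether these are problems for a particular project is open to discussion, but there is one thing I consider non-negotiable: the Go concurrency model. If we could not run Python from a goroutine, using Go would make little sense.

Before dealing with concurrency, Python, and cgo, there is one more thing we need to know about: the Global Interpreter Lock, also known as the GIL. The GIL is a mechanism widely adopted by language interpreters (CPython being one of them) that prevents more than one thread from running at the same time. This means that no Python program executed by CPython can run in parallel within the same process. Concurrency is still possible, and the lock is a good tradeoff between speed, safety, and implementation simplicity. So why would it cause problems when embedding?

When a regular, non-embedded Python program starts, no GIL is involved, which avoids useless locking overhead; the GIL comes into play the first time some Python code asks to spawn a thread. For each thread, the interpreter creates a data structure to store information about the current state and locks the GIL. When the thread finishes, the state is restored and the GIL is unlocked, ready to be used by other threads.

When we run Python from a Go program, none of the above happens automatically. Without the GIL, our Go program could create multiple Python threads. This could cause race conditions leading to fatal runtime errors, most likely a segmentation fault that brings down the whole Go application.

The solution is to explicitly acquire the GIL whenever we run multithreaded code from Go; the code is not complicated because the C API provides all the tools we need. To expose the problem more clearly, we need some CPU-bound Python code. Let’s add these functions to the foo.py module from the previous example:
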
```
# foo.py
import sys

def print_odds(limit=10):
    """
    Print odd numbers < limit
    """
    for i in range(limit):
        if i%2:
            sys.stderr.write("{}\n".format(i))

def print_even(limit=10):
    """
    Print even numbers < limit
    """
    for i in range(limit):
        if i%2 == 0:
            sys.stderr.write("{}\n".format(i))
```

We will try to print odd and even numbers concurrently from Go, using two different goroutines (and therefore involving threads):

```
package main

import (
	"sync"

	"github.com/sbinet/go-python"
)

func main() {
	// The following will also create the GIL explicitly
	// by calling PyEval_InitThreads(), without waiting
	// for the interpreter to do that
	python.Initialize()

	var wg sync.WaitGroup
	wg.Add(2)

	fooModule := python.PyImport_ImportModule("foo")
	odds := fooModule.GetAttrString("print_odds")
	even := fooModule.GetAttrString("print_even")

	// Initialize() has locked the GIL but at this point we don't need it
	// anymore. We save the current state and release the lock
	// so that goroutines can acquire it
	state := python.PyEval_SaveThread()

	go func() {
		_gstate := python.PyGILState_Ensure()
		odds.Call(python.PyTuple_New(0), python.PyDict_New())
		python.PyGILState_Release(_gstate)

		wg.Done()
	}()

	go func() {
		_gstate := python.PyGILState_Ensure()
		even.Call(python.PyTuple_New(0), python.PyDict_New())
		python.PyGILState_Release(_gstate)

		wg.Done()
	}()

	wg.Wait()

	// At this point we know we won't need Python anymore in this
	// program, we can restore the state and lock the GIL to perform
	// the final operations before exiting.
	python.PyEval_RestoreThread(state)
	python.Finalize()
}
```

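While reading the example you may notice a pattern, one that will become our mantra for running embedded Python code:

1. Save the state and lock the GIL.
2. Do Python.
3. Restore the state and unlock the GIL.

The code should be straightforward, but we want to point out a subtle detail: although both cases follow the GIL mantra, in one we operate the GIL by calling `PyEval_SaveThread()` and `PyEval_RestoreThread()`, while in the other (look inside the goroutines) we do the same thing with `PyGILState_Ensure()` and `PyGILState_Release()`.

We said that when multithreading is driven from Python, the interpreter takes care of creating the data structure needed to store the current state; when the same thing happens through the C API, we are responsible for it.

When we initialize the interpreter with go-python, we are operating in a Python context. So when `PyEval_InitThreads()` is called, it initializes the data structure and locks the GIL. We can use `PyEval_SaveThread()` and `PyEval_RestoreThread()` to operate on state that already exists.

Inside the goroutines we are operating from a Go context, so we need to create the state explicitly and remove it when we are done, which is what `PyGILState_Ensure()` and `PyGILState_Release()` do for us.

![](https://datadog-prod.imgix.net/img/blog/engineering/cgo-and-python/cgo_python_divider_4.png?auto=format&fit=max&w=847)
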
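### Unleash the Gopher

At this point we know how to handle multithreaded Go code that executes Python in an embedded interpreter, but after the GIL another challenge is waiting around the corner: the Go scheduler.

When a goroutine starts, it is scheduled to run on one of the available `GOMAXPROCS` threads ([see here][15] for more details on the topic). If a goroutine happens to perform a system call or call C code, the current thread hands the other goroutines waiting in its queue over to another thread so that they have a better chance to run; the current goroutine is paused, waiting for the system call or the C function to return. When the C function returns, the thread tries to resume the paused goroutine, but if that is not possible, it asks the Go runtime to find another thread to finish the goroutine and goes to sleep.

With this in mind, let’s see what can happen to a goroutine running some Python code when it is moved to a new thread:

1. Our goroutine starts, performs a C call, and pauses. The GIL is locked.
2. When the C call returns, the current thread tries to resume the goroutine, but fails.
3. The current thread tells the Go runtime to find another thread to resume our goroutine.
4. The Go scheduler finds an available thread and resumes the goroutine.
5. The goroutine is almost done and tries to unlock the GIL before returning.
6. The thread ID stored in the current state comes from the original thread and differs from the ID of the current thread.
7. Panic!

Fortunately, we can force the Go runtime to keep our goroutine on the same thread at all times by calling the `LockOSThread` function from the `runtime` package inside the goroutine:
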
```
go func() {
	runtime.LockOSThread()

	_gstate := python.PyGILState_Ensure()
	odds.Call(python.PyTuple_New(0), python.PyDict_New())
	python.PyGILState_Release(_gstate)
	wg.Done()
}()
```

This interferes with the scheduler and may introduce some overhead, but it is a price we are willing to pay.

### Conclusions

In order to embed Python, the Datadog Agent has to accept a few tradeoffs:

* The overhead introduced by cgo.
* The task of handling the GIL manually.
* The limitation of binding goroutines to the same thread during execution.

We are happy to accept each of these for the convenience of running Python checks in Go. Being aware of the tradeoffs lets us minimize their impact. As for the other limitations introduced to support Python, we have a few countermeasures to contain potential problems:

* The build is automated and configurable, so developers still have something very similar to `go build`.
* A lightweight version of the agent can be built with Python support stripped out entirely, simply by using Go build tags.
* Such a version relies only on the core checks hardcoded in the agent itself (mostly system and network checks), but it is cgo free and can be cross compiled.

We will re-evaluate our options in the future and decide whether keeping cgo is still worth it; we could even reconsider whether Python as a whole is still worth it, once the [Go plugin package][16] is mature enough to support our use case. For now, though, the embedded Python works well, and moving from the old Agent to the new one could not be easier.

Are you a polyglot who likes mixing different programming languages? Do you enjoy learning how languages work internally to make your code perform better? [Join us at Datadog!][17]

--------------------------------------------------------------------------------

via: https://www.datadoghq.com/blog/engineering/cgo-and-python/

Author: [Massimiliano Pippi][a]
Translator: [Zioyi](https://github.com/Zioyi)
Proofreader: [校对者ID](https://github.com/校对者ID)

This article was originally translated by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux.cn (Linux中国)](https://linux.cn/)

[a]:https://github.com/masci
[1]:http://twitter.com/share?url=https://www.datadoghq.com/blog/engineering/cgo-and-python/
[2]:http://www.reddit.com/submit?url=https://www.datadoghq.com/blog/engineering/cgo-and-python/
[3]:https://www.linkedin.com/shareArticle?mini=true&url=https://www.datadoghq.com/blog/engineering/cgo-and-python/
[4]:https://www.datadoghq.com/blog/category/under-the-hood
[5]:https://www.datadoghq.com/blog/tag/agent
[6]:https://www.datadoghq.com/blog/tag/golang
[7]:https://www.datadoghq.com/blog/tag/python
[8]:https://github.com/DataDog/datadog-agent/
[9]:https://docs.python.org/2/extending/embedding.html
[10]:https://dave.cheney.net/2016/01/18/cgo-is-not-go
[11]:https://golang.org/cmd/cgo/
[12]:https://en.wikipedia.org/wiki/Foreign_function_interface
[13]:https://blog.golang.org/c-go-cgo
[14]:https://github.com/sbinet/go-python
[15]:https://morsmachine.dk/go-scheduler
[16]:https://golang.org/pkg/plugin/
[17]:https://www.datadoghq.com/careers/