Merge pull request #29445 from wxy/20230328.3-️-Why-you-should-use-Python-and-Rust-together

ATRP:published/20230328.3 ️ Why you should use Python and Rust together.md
This commit is contained in:
Xingyu.Wang 2023-05-26 16:40:01 +08:00 committed by GitHub
commit fb1bdbea79
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 326 additions and 321 deletions

View File

@ -0,0 +1,326 @@
[#]: subject: "Why you should use Python and Rust together"
[#]: via: "https://opensource.com/article/23/3/python-loves-rust"
[#]: author: "Moshe Zadka https://opensource.com/users/moshez"
[#]: collector: "lkxed"
[#]: translator: "ChatGPT"
[#]: reviewer: "wxy"
[#]: publisher: "wxy"
[#]: url: "https://linux.cn/article-15846-1.html"
结合使用 Python 和 Rust
======
![][0]
> Rust 和 Python 的优势互补。可以使用 Python 进行原型设计,然后将性能瓶颈转移到 Rust 上。
Python 和 Rust 是非常不同的语言,但它们实际上非常搭配。但在讨论如何将 Python 与 Rust 结合之前,我想先介绍一下 Rust 本身。你可能已经听说了这种语言,但可能还没有了解过它的细节。
### 什么是 Rust
Rust 是一种低级语言,这意味着程序员所处理的东西接近于计算机的 “真实” 运行方式。
例如,整数类型由字节大小定义,与 CPU 支持的类型相对应。虽然我们很想简单地说 Rust 中的 `a+b` 对应于一条机器指令,但实际上并不完全是这样!
Rust 编译器链非常复杂。作为第一种近似的方法,将这样的语句视为 “有点” 真实是有用的。
Rust 旨在实现零成本抽象,这意味着许多语言级别可用的抽象在运行时环境中会被编译去掉。
例如,除非明确要求,对象会在堆栈上分配。结果是,在 Rust 中创建本地对象没有运行时成本(尽管可能需要进行初始化)。
最后Rust 是一种内存安全的语言。也有其他内存安全的语言和其他支持零成本抽象的语言。但通常这些是两类不同的语言。
内存安全并不意味着不可能在 Rust 中出现内存违规。它确实意味着只有两种方式可能导致内存违规:
- 编译器的错误。
- 显式声明为不安全(`unsafe`)的代码。
Rust 标准库代码有很多被标记为不安全的代码,虽然比许多人预期的少。这并不意味着该语句无意义。除了需要自己编写不安全代码的(罕见的)情况外,内存违规通常是由基础设施造成的。
### 为什么会有 Rust 出现?
为什么人们要创建 Rust是哪些问题没有被现有编程语言解决吗
Rust 被设计成既能高效运行,又保证内存安全。在现代的联网世界中,这是一个越来越重要的问题。
Rust 的典型应用场景是协议的低级解析。待解析的数据通常来自不受信任的来源,并且需要通过高效的方式进行解析。
如果你认为这听起来像 Web 浏览器所做的事情那不是巧合。Rust 最初起源于 Mozilla 基金会,它是为了改进 Firefox 浏览器而设计的。
如今,需要保证安全和速度的不仅仅是浏览器。即使是常见的微服务架构也必须能够快速解析不受信任的数据,同时保证安全。
### 现实示例:统计字符
为了理解 “封装 Rust” 的例子,需要解决一个问题。这个问题需要满足以下要求:
- 足够容易解决。
- 能够写高性能循环来优化。
- 有一定的现实意义。
这个玩具问题的例子是判断一个字符在一个字符串中是否出现超过了 X 次。这个问题不容易通过高效的正则表达式解决。即使是专门的 Numpy 代码也可能不够快,因为通常没有必要扫描整个字符串。
你可以想象一些 Python 库和技巧的组合来解决这个问题。然而,如果在低级别的语言中实现直接的算法,它会非常快,并且更易于阅读。
为了使问题稍微有趣一些,以演示 Rust 的一些有趣部分,这个问题增加了一些变化。该算法支持在换行符处重置计数(意即:字符是否在一行中出现了超过 X 次?)或在空格处重置计数(意即:字符是否在单词中出现了超过 X 次?)。
这是唯一与 “现实性” 相关的部分。过多的现实性将使这个示例在教育上不再有用。
#### 支持枚举
Rust 支持使用枚举(`enum`)。你可以使用枚举做很多有趣的事情。
目前,只使用了一个简单的三选一的枚举,并没有其他的变形。这个枚举编码了哪种字符重置计数。
```
#[derive(Copy)]
enum Reset {
NewlinesReset,
SpacesReset,
NoReset,
}
```
#### 支持结构
接下来的 Rust 组件更大一些:这是一个结构(`struct`。Rust 的结构与 Python 的 `dataclass` 有些相似。同样,你可以用结构做更复杂的事情。
```
#[pyclass]
struct Counter {
what: char,
min_number: u64,
reset: Reset,
}
```
#### 实现块
你可以在 Rust 中使用一个单独的块,称为实现(`impl`)块,为结构添加一个方法。但具体细节超出了本文的范围。
在这个示例中,该方法调用了一个外部函数。这主要是为了分解代码。更复杂的用例将指示 Rust 编译器内联该函数,以便在不产生任何运行时成本的情况下提高可读性。
```
#[pymethods]
impl Counter {
#[new]
fn new(what: char, min_number: u64, reset: Reset) -> Self {
Counter{what: what, min_number: min_number, reset: reset}
}
fn has_count(
&self,
data: &str,
) -> bool {
has_count(self, data.chars())
}
}
```
#### 函数
默认情况下Rust 变量是常量。由于当前的计数(`current_count`)必须更改,因此它被声明为可变变量。
```
fn has_count(cntr: &Counter, chars: std::str::Chars) -> bool {
let mut current_count : u64 = 0;
for c in chars {
if got_count(cntr, c, &mut current_count) {
return true;
}
}
false
}
```
该循环遍历字符并调用 `got_count` 函数。再次强调,这是为了将代码分解成幻灯片展示。它展示了如何向函数发送可变引用。
尽管 `current_count` 是可变的,但发送和接收站点都显式标记该引用为可变。这可以清楚地表明哪些函数可能修改一个值。
#### 计数
`got_count` 函数重置计数器将其递增然后检查它。Rust 的冒号分隔的表达式序列评估最后一个表达式的结果,即是否达到了指定的阈值。
```
fn got_count(cntr: &Counter, c: char, current_count: &mut u64) -> bool {
maybe_reset(cntr, c, current_count);
maybe_incr(cntr, c, current_count);
*current_count >= cntr.min_number
}
```
#### 重置代码
`reset` 的代码展示了 Rust 中另一个有用的功能:模式匹配。对 Rust 中匹配的完整描述需要一个学期级别的课程,不适合在一个无关的演讲中讲解。这个示例匹配了该元组的两个选项之一。
```
fn maybe_reset(cntr: &Counter, c: char, current_count: &mut u64) -> () {
match (c, cntr.reset) {
('\n', Reset::NewlinesReset) | (' ', Reset::SpacesReset)=> {
*current_count = 0;
}
_ => {}
};
}
```
#### 增量支持
增量将字符与所需字符进行比较,并在匹配时增加计数。
```
fn maybe_incr(cntr: &Counter, c: char, current_count: &mut u64) -> (){
if c == cntr.what {
*current_count += 1;
};
}
```
请注意,我在本文中优化了代码以适合幻灯片。这不一定是 Rust 代码的最佳实践示例,也不是如何设计良好的 API 的示例。
### 为 Python 封装 Rust 代码
为了将 Rust 代码封装到 Python 中,你可以使用 PyO3。PyO3 Rust “crate”即库允许内联提示将 Rust 代码包装为 Python使得修改两者更容易。
#### 包含 PyO3 crate 原语
首先,你必须包含 PyO3 crate 原语。
```
use pyo3::prelude::*;
```
#### 封装枚举
枚举需要被封装。`derive` 从句对于将枚举封装为 PyO3 是必需的,因为它们允许类被复制和克隆,使它们更容易在 Python 中使用。
```
#[pyclass]
#[derive(Clone)]
#[derive(Copy)]
enum Reset {
/* ... */
}
```
#### 封装结构
结构同样需要被封装。在 Rust 中,这些被称为 “宏”,它们会生成所需的接口位。
```
#[pyclass]
struct Counter {
/* ... */
}
```
#### 封装实现
封装实现(`impl`)更有趣。增加了另一个名为 `new` 的宏。此方法被标记为 `#[new]`,让 PyO3 知道如何为内置对象公开构造函数。
```
#[pymethods]
impl Counter {
#[new]
fn new(what: char, min_number: u64,
reset: Reset) -> Self {
Counter{what: what,
min_number: min_number, reset: reset}
}
/* ... */
}
```
#### 定义模块
最后,定义一个初始化模块的函数。此函数具有特定的签名,必须与模块同名,并用 `#[pymodule]` 修饰。
```
#[pymodule]
fn counter(_py: Python, m: &PyModule
) -> PyResult<()> {
m.add_class::<Counter>()?;
m.add_class::<Reset>()?;
Ok(())
}
```
`?` 显示此函数可能失败(例如,如果类没有正确配置)。 `PyResult` 在导入时转换为 Python 异常。
#### Maturin 开发
为了快速检查,用 `maturin develop` 构建并将库安装到当前虚拟环境中。这有助于快速迭代。
```
$ maturin develop
```
#### Maturin 构建
`maturin build` 命令构建一个 `manylinux` 轮子,它可以上传到 PyPI。轮子是特定于 CPU 架构的。
#### Python 库
从 Python 中使用库是最简单的部分。没有任何东西表明这与在 Python 中编写代码有什么区别。这其中的一个有用方面是,如果你优化了已经有单元测试的 Python 中的现有库,你可以使用 Python 单元测试来测试 Rust 库。
#### 导入
无论你是使用 `maturin develop` 还是 `pip install` 来安装它,导入库都是使用 `import` 完成的。
```
import counter
```
#### 构造函数
构造函数的定义正好使对象可以从 Python 构建。这并不总是如此。有时仅从更复杂的函数返回对象。
```
cntr = counter.Counter(
'c',
3,
counter.Reset.NewlinesReset,
)
```
#### 调用函数
最终的收益终于来了。检查这个字符串是否至少有三个 “c” 字符:
```
>>> cntr.has_count("hello-c-c-c-goodbye")
True
```
添加一个换行符会触发剩余操作,这里没有插入换行符的三个 “c” 字符:
```
>>> cntr.has_count("hello-c-c-\nc-goodbye")
False
```
### 使用 Rust 和 Python 很容易
我的目标是让你相信将 Rust 和 Python 结合起来很简单。我编写了一些“粘合剂”代码。Rust 和 Python 具有互补的优点和缺点。
Rust 非常适合高性能、安全的代码。Rust 具有陡峭的学习曲线,对于快速原型解决方案而言可能有些笨拙。
Python 很容易入手并支持非常紧密的迭代循环。Python 确实有一个“速度上限”。超过一定程度后,从 Python 中获得更好的性能就更难了。
将它们结合起来完美无缝。在 Python 中进行原型设计,并将性能瓶颈移至 Rust 中。
使用 Maturin你的开发和部署流程更容易进行。开发、构建并享受这一组合吧
--------------------------------------------------------------------------------
via: https://opensource.com/article/23/3/python-loves-rust
作者:[Moshe Zadka][a]
选题:[lkxed][b]
译者ChatGPT
校对:[wxy](https://github.com/wxy)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://opensource.com/users/moshez
[b]: https://github.com/lkxed/
[0]: https://img.linux.net.cn/data/attachment/album/202305/26/163350snf627fx05ynz71j.jpg

View File

@ -1,321 +0,0 @@
[#]: subject: "Why you should use Python and Rust together"
[#]: via: "https://opensource.com/article/23/3/python-loves-rust"
[#]: author: "Moshe Zadka https://opensource.com/users/moshez"
[#]: collector: "lkxed"
[#]: translator: " "
[#]: reviewer: " "
[#]: publisher: " "
[#]: url: " "
Why you should use Python and Rust together
======
Python and Rust are very different languages, but they actually go together rather well. But before discussing how to combine Python with Rust, I want to introduce Rust itself. You've likely heard of the language but may not have heard details about how it works.
### What is Rust?
Rust is a low-level language. This means that the things the programmers deal with are close to the way computers "really" work.
For example, integer types are defined by bit size and correspond to CPU-supported types. While it is tempting to say that this means `a+b` in Rust corresponds to one machine instruction, it does not mean quite that!
Rust's compiler's chain is non-trivial. It is useful as a first approximation to treat statements like that as "kind of" true.
Rust is designed for zero-cost abstraction, meaning many of the abstractions available at the language level are compiled away at runtime.
For example, objects are allocated on the stack unless explicitly asked for. The result is that creating a local object in Rust has no runtime cost (though initialization might).
Finally, Rust is a memory-safe language. There are other memory-safe languages and other zero-cost abstraction languages. Usually, those are different languages.
Memory safety does not mean it is impossible to have memory violations in Rust. It does mean that there are only two ways that memory violations can happen:
- A bug in the compiler.
- Code that's explicitly declared unsafe.
Rust standard library code has quite a bit of code that is marked unsafe, though less than what many assume. This does not make the statement vacuous though. With the (rare) exception of needing to write unsafe code yourself, memory violations result from the underlying infrastructure.
### Why does Rust exist?
Why did people create Rust? What problem was not addressed by existing languages?
Rust was designed as a language to achieve a combination of high-performance code that is memory safe. This concern is increasingly important in a networked world.
The quintessential use case for Rust is low-level parsing of protocols. The data to be parsed often comes from untrusted sources and may need to be parsed in a performant way.
If this sounds like what a web browser does, it is no coincidence. Rust originated from the Mozilla Foundation as a way to improve the Firefox browser.
In the modern world, browsers are no longer the only things on which there is pressure to be safe and fast. Even the common microservice architecture, combined with defense-in-depth principles, must be able to unpack untrusted data quickly.
### Counting characters
To understand a "wrapping in Rust" example, there needs to be a problem to solve. Not just any problem will do. The issue needs to be:
- Easy enough to solve
- Be helped by the ability to write high-performance loops
- Somewhat realistic
The toy problem in this case is whether a character appears more than X times in a string. This is something that's not easily amenable to performant regular expressions. Even dedicated Numpy code can be slower than necessary because often there is no need to scan the entire string.
You can imagine some combination of Python libraries and tricks that make this possible. However, the obvious algorithm is pretty fast if implemented in a low-level language and makes things more readable.
A twist is added to make the problem slightly more interesting and demonstrate some fun parts of Rust. The algorithm supports resetting the count on a newline (does the character appear more than X times in a line?) or on a space (does the character appear more than X times in a word?).
This is the only nod given to "realism." Any more realism will make the example not useful pedagogically.
#### Enum support
Rust supports enumeration (`enums`). You can do many interesting things with `enums`.
For now, a three-way `enum` without any other twirls is used. The `enum` encodes what character resets the count.
```
#[derive(Copy)]
enum Reset {
NewlinesReset,
SpacesReset,
NoReset,
}
```
#### Struct support
The next Rust component is a bit more substantial: a `struct`. A Rust `struct` is somewhat close to a Python `dataclass`. Again, you can do more sophisticated things with a `struct`.
```
#[pyclass]
struct Counter {
what: char,
min_number: u64,
reset: Reset,
}
```
#### Implementation blocks
You add a method to a `struct` in a separate block in Rust: the `impl` block. The details are outside the scope of this article.
In this example, the method calls an external function. This is mostly done to break up the code. A more sophisticated use would instruct the Rust compiler to inline the function to allow readability without any runtime cost.
```
#[pymethods]
impl Counter {
#[new]
fn new(what: char, min_number: u64, reset: Reset) -> Self {
Counter{what: what, min_number: min_number, reset: reset}
}
fn has_count(
&self,
data: &str,
) -> bool {
has_count(self, data.chars())
}
}
```
#### Function
By default, Rust variables are constant. Because the current count has to change, it is declared as a mutable variable.
```
fn has_count(cntr: &Counter, chars: std::str::Chars) -> bool {
let mut current_count : u64 = 0;
for c in chars {
if got_count(cntr, c, &mut current_count) {
return true;
}
}
false
}
```
The loop goes over the characters and calls the function `got_count`. Again, this is done to break the code into slides. It does show how to send a mutable borrow reference to a function.
Even though `current_count` is mutable, both the sending and receiving sites explicitly mark the reference as mutable. This makes it clear which functions might modify a value.
#### Counting
The `got_count` resets the counter, increments it, and then checks it. Rust's colon-separated sequence of expressions evaluates to the result of the last expression, in this case, whether the threshold was met.
```
fn got_count(cntr: &Counter, c: char, current_count: &mut u64) -> bool {
maybe_reset(cntr, c, current_count);
maybe_incr(cntr, c, current_count);
*current_count >= cntr.min_number
}
```
#### Reset code
The `reset` code shows another useful thing in Rust: matching. A complete description of the matching abilities in Rust would be a semester-level class, not two minutes in an unrelated talk, but this example matches on a tuple matching one of two options.
```
fn maybe_reset(cntr: &Counter, c: char, current_count: &mut u64) -> () {
match (c, cntr.reset) {
('\n', Reset::NewlinesReset) | (' ', Reset::SpacesReset)=> {
*current_count = 0;
}
_ => {}
};
}
```
#### Increment support
The increment compares the character to the desired one and, if matched, increments the count.
```
fn maybe_incr(cntr: &Counter, c: char, current_count: &mut u64) -> (){
if c == cntr.what {
*current_count += 1;
};
}
```
Note that I optimized the code in this article for slides. It is not necessarily a best-practice example of Rust code or how to design a good API.
### Wrap Rust code for Python
To wrap Rust code for Python, you can use PyO3. The PyO3 Rust "crate" (or library) allows inline hints for wrapping Rust code into Python, making it easier to modify both together.
#### Include PyO3 crate primitives
First, you must include the PyO3 crate primitives.
```
use pyo3::prelude::*;
```
#### Wrap enum
The `enum` needs to be wrapped. The `derive` clauses are necessary for wrapping the `enum` for PyO3, because they allow the class to be copied and cloned, making them easier to use from Python.
```
#[pyclass]
#[derive(Clone)]
#[derive(Copy)]
enum Reset {
/* ... */
}
```
#### Wrap struct
The `struct` is similarly wrapped. These call "macros" in Rust, which generate the needed interface bits.
```
#[pyclass]
struct Counter {
/* ... */
}
```
#### Wrap impl
Wrapping the `impl` is more interesting. Another macro is added called `new`. This method is marked as `#[new]`, letting PyO3 know how to expose a constructor for the built-in object.
```
#[pymethods]
impl Counter {
#[new]
fn new(what: char, min_number: u64,
reset: Reset) -> Self {
Counter{what: what,
min_number: min_number, reset: reset}
}
/* ... */
}
```
#### Define module
Finally, define a function that initializes the module. This function has a specific signature, must be named the same as the module, and decorated with `#[pymodule]`.
```
#[pymodule]
fn counter(_py: Python, m: &PyModule
) -> PyResult<()> {
m.add_class::<Counter>()?;
m.add_class::<Reset>()?;
Ok(())
}
```
The `?` shows that this function can fail (for example, if the class was not appropriately configured). The `PyResult` is translated into a Python exception at import time.
#### Maturin develop
For quick checking, `maturin develop` builds and installs the library into the current virtual environment. This helps iterate quickly.
```
$ maturin develop
```
#### Maturin build
The `maturin build` command builds a `manylinux` wheel, which can be uploaded to PyPI. The wheel is specific to the CPU architecture.
#### Python library
Using the library from Python is the nice part. Nothing indicates a difference between this and writing the code in Python. One useful aspect of this is that if you optimize an existing library in Python that already has unit tests, you can use the Python unit tests for the Rust library.
#### Import
Whether you used `maturin develop` or `pip install` to install it, importing the library is done with `import`.
```
import counter
```
#### Construct
The constructor was defined exactly so the object could be built from Python. This is not always the case. Sometimes objects are only returned from more sophisticated functions.
```
cntr = counter.Counter(
'c',
3,
counter.Reset.NewlinesReset,
)
```
#### Call
The final pay-off is here at last. Check whether this string has at least three "c" characters:
```
>>> cntr.has_count("hello-c-c-c-goodbye")
True
```
Adding a newline causes the rest to happen, and there aren't three "c" characters without an intervening newline:
```
>>> cntr.has_count("hello-c-c-\nc-goodbye")
False
```
### Using Rust and Python is easy
My goal is to convince you that combining Rust and Python is easy. I wrote little code to "glue" them. Rust and Python have complementary strengths and weaknesses.
Rust is great for high-performance, safe code. Rust has a steep learning curve and can be awkward for quickly prototyping a solution.
Python is easy to get started with and supports incredibly tight iteration loops. Python does have a "speed cap." Beyond a certain level it is harder to get better performance from Python.
Combining them is perfect. Prototype in Python and move performance bottlenecks to Rust.
With `maturin`, your development and deployment pipelines are easier to make. Develop, build, and enjoy the combo!
--------------------------------------------------------------------------------
via: https://opensource.com/article/23/3/python-loves-rust
作者:[Moshe Zadka][a]
选题:[lkxed][b]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://opensource.com/users/moshez
[b]: https://github.com/lkxed/