[#]: subject: "Why you should use Python and Rust together"
[#]: via: "https://opensource.com/article/23/3/python-loves-rust"
[#]: author: "Moshe Zadka https://opensource.com/users/moshez"
[#]: collector: "lkxed"
[#]: translator: " "
[#]: reviewer: " "
[#]: publisher: " "
[#]: url: " "

Why you should use Python and Rust together
======

Python and Rust are very different languages, but they go together rather well. Before discussing how to combine Python with Rust, I want to introduce Rust itself. You've likely heard of the language but may not know the details of how it works.

### What is Rust?

Rust is a low-level language. This means that the things the programmer deals with are close to the way computers "really" work.

For example, integer types are defined by bit size and correspond to CPU-supported types. It is tempting to say that this means `a+b` in Rust corresponds to one machine instruction, but that is not quite what it means!

Rust's compiler chain is non-trivial. Still, as a first approximation, it is useful to treat statements like that as "kind of" true.
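
The point about fixed-width integers can be sketched with the standard library alone. The function names below are made up for illustration. One reason `a+b` is not exactly one instruction: in debug builds, a plain `+` also checks for overflow and panics, so the sketch uses the explicit `checked_add` and `wrapping_add` methods instead.

```rust
use std::mem::size_of;

/// Adds two bytes, returning `None` when the sum does not fit in a `u8`.
fn add_checked(a: u8, b: u8) -> Option<u8> {
    a.checked_add(b)
}

/// Adds two bytes with wraparound modulo 256.
fn add_wrapping(a: u8, b: u8) -> u8 {
    a.wrapping_add(b)
}

/// Sizes in bytes of a few fixed-width integer types.
fn int_sizes() -> [usize; 3] {
    [size_of::<u8>(), size_of::<u32>(), size_of::<u64>()]
}
```

The type name states the width, so a `u8` is always one byte and a `u64` always eight, whatever the platform.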

Rust is designed for zero-cost abstraction, meaning many of the abstractions available at the language level are compiled away and add no runtime cost.

For example, objects are allocated on the stack unless heap allocation is explicitly requested. The result is that creating a local object in Rust has no allocation cost at runtime (though initialization might have a cost).
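
A minimal sketch of the difference, assuming a made-up `Point` type: a plain local value needs no allocator call, while `Box::new` is the explicit request for heap memory.

```rust
/// A small value type; the name `Point` is invented for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

/// Builds a `Point` as a plain value; no heap allocation is requested.
fn on_stack() -> Point {
    Point { x: 1, y: 2 }
}

/// Explicitly asks for a heap allocation by wrapping the value in `Box`.
fn on_heap() -> Box<Point> {
    Box::new(Point { x: 1, y: 2 })
}
```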

Finally, Rust is a memory-safe language. There are other memory-safe languages and other zero-cost abstraction languages. Usually, those are different languages.

Memory safety does not mean it is impossible to have memory violations in Rust. It does mean that there are only two ways that memory violations can happen:

- A bug in the compiler.
- Code that's explicitly declared unsafe.

The Rust standard library has quite a bit of code that is marked unsafe, though less than many assume. This does not make the statement vacuous, though. Unless you write unsafe code yourself (which is rarely needed), any memory violation results from a bug in the underlying infrastructure.
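
To make "explicitly declared unsafe" concrete, here is a small sketch using only standard slice methods. The helper names are hypothetical. The safe version cannot read out of bounds. The unchecked version can, so the compiler only accepts the call inside an `unsafe` block.

```rust
/// Safe lookup: an out-of-range index yields `None` instead of reading stray memory.
fn safe_get(data: &[u8], index: usize) -> Option<u8> {
    data.get(index).copied()
}

/// Unchecked lookup: `get_unchecked` skips the bounds check, and calling it
/// with an out-of-range index is undefined behavior. That is why the call
/// has to sit inside an `unsafe` block.
fn unchecked_get(data: &[u8], index: usize) -> u8 {
    // This sketch verifies the index itself before skipping the built-in check.
    assert!(index < data.len());
    unsafe { *data.get_unchecked(index) }
}
```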

### Why does Rust exist?

Why did people create Rust? What problem was not addressed by existing languages?

Rust was designed to combine high performance with memory safety. This combination is increasingly important in a networked world.

The quintessential use case for Rust is low-level parsing of protocols. The data to be parsed often comes from untrusted sources and may need to be parsed in a performant way.

If this sounds like what a web browser does, it is no coincidence. Rust grew out of work at Mozilla as a way to improve the Firefox browser.

In the modern world, browsers are no longer the only things under pressure to be both safe and fast. Even services in a common microservice architecture, combined with defense-in-depth principles, must be able to unpack untrusted data quickly.

### Counting characters

A "wrapping in Rust" example needs a problem to solve. Not just any problem will do. The problem needs to be:

- Easy enough to solve
- Helped by the ability to write high-performance loops
- Somewhat realistic

The toy problem in this case is whether a character appears at least X times in a string. This is something that's not easily amenable to performant regular expressions. Even dedicated NumPy code can be slower than necessary because often there is no need to scan the entire string.

You can imagine some combination of Python libraries and tricks that make this possible. However, the obvious algorithm is pretty fast if implemented in a low-level language and makes things more readable.

A twist is added to make the problem slightly more interesting and demonstrate some fun parts of Rust. The algorithm supports resetting the count on a newline (does the character appear at least X times in a line?) or on a space (does the character appear at least X times in a word?).

This is the only nod given to "realism." Any more realism would make the example less useful pedagogically.

#### Enum support

Rust supports enumerations (`enum`). You can do many interesting things with an `enum`.

For now, a three-way `enum` without any other twirls is used. The `enum` encodes which character resets the count.

```
// Copy requires Clone, so both are derived.
#[derive(Clone, Copy)]
enum Reset {
    NewlinesReset,
    SpacesReset,
    NoReset,
}
```
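
As one example of the "interesting things" an `enum` can do, variants can carry data. The `ResetOn` type below is a hypothetical variation on `Reset`, not part of the article's code: instead of one variant per reset character, a single variant holds the character.

```rust
/// Hypothetical variant of the article's `Reset`: one variant carries data.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ResetOn {
    /// Reset whenever this specific character is seen.
    Char(char),
    /// Never reset.
    Never,
}

/// Returns true when `c` should reset the count under `rule`.
fn should_reset(rule: ResetOn, c: char) -> bool {
    match rule {
        ResetOn::Char(trigger) => c == trigger,
        ResetOn::Never => false,
    }
}
```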

#### Struct support

The next Rust component is a bit more substantial: a `struct`. A Rust `struct` is somewhat close to a Python `dataclass`. Again, you can do more sophisticated things with a `struct`.

```
// #[pyclass] is a PyO3 attribute, covered in the wrapping section.
#[pyclass]
struct Counter {
    what: char,
    min_number: u64,
    reset: Reset,
}
```

#### Implementation blocks

You add a method to a `struct` in a separate block in Rust: the `impl` block. The details are outside the scope of this article.

In this example, the method calls an external function. This is mostly done to break up the code. A more sophisticated use would instruct the Rust compiler to inline the function, keeping the readability without any runtime cost.

```
#[pymethods]
impl Counter {
    #[new]
    fn new(what: char, min_number: u64, reset: Reset) -> Self {
        Counter { what, min_number, reset }
    }

    // Delegates to the free function `has_count`.
    fn has_count(&self, data: &str) -> bool {
        has_count(self, data.chars())
    }
}
```
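
The inlining mentioned above is requested with the standard `#[inline]` attribute. A short sketch with hypothetical helper names follows. Note that `#[inline]` is a hint to the compiler rather than a guarantee.

```rust
/// Hypothetical helper, small enough that inlining is plausible.
#[inline]
fn is_target(c: char, target: char) -> bool {
    c == target
}

/// Counts occurrences of `target`, calling the helper once per character.
fn count_target(data: &str, target: char) -> usize {
    data.chars().filter(|&c| is_target(c, target)).count()
}
```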

#### Function

By default, Rust variables are immutable. Because the current count has to change, it is declared as a mutable variable.

```
fn has_count(cntr: &Counter, chars: std::str::Chars) -> bool {
    let mut current_count: u64 = 0;
    for c in chars {
        // Stop scanning as soon as the threshold is reached.
        if got_count(cntr, c, &mut current_count) {
            return true;
        }
    }
    false
}
```

The loop goes over the characters and calls the function `got_count`. Again, this is done to break the code into slide-sized pieces. It also shows how to pass a mutable reference (a mutable borrow) to a function.

Even though `current_count` is mutable, both the sending and receiving sites explicitly mark the reference as mutable. This makes it clear which functions might modify a value.
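
The same rule in isolation, as a minimal sketch with invented function names: `&mut` appears in the callee's signature and again at the call site, while a shared `&` borrow only permits reading.

```rust
/// Adds one to the value behind a mutable reference.
fn bump(total: &mut u64) {
    *total += 1;
}

/// Reads through a shared reference; it cannot modify `total`.
fn peek(total: &u64) -> u64 {
    *total
}

/// Calls `bump` the given number of times and returns the result.
fn bump_many(times: u32) -> u64 {
    let mut total: u64 = 0; // `mut` is required because the value changes
    for _ in 0..times {
        bump(&mut total); // explicit `&mut` at the sending site
    }
    peek(&total) // shared borrow: read-only
}
```

Writing `bump(&total)` instead would be rejected at compile time, because `bump` asks for `&mut u64`.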

#### Counting

The `got_count` function resets the counter, increments it, and then checks it. A Rust block is a semicolon-separated sequence of expressions that evaluates to the result of the last expression, in this case, whether the threshold was met.

```
fn got_count(cntr: &Counter, c: char, current_count: &mut u64) -> bool {
    maybe_reset(cntr, c, current_count);
    maybe_incr(cntr, c, current_count);
    // No trailing semicolon: this comparison is the return value.
    *current_count >= cntr.min_number
}
```
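
A small sketch of the "last expression is the value" rule, using made-up functions. The rule applies to function bodies, to `if`/`else`, and to plain blocks.

```rust
/// Returns the larger value; the `if`/`else` expression is the function's value.
fn larger(a: u64, b: u64) -> u64 {
    if a > b { a } else { b } // no semicolon: this expression is returned
}

/// Uses a block as an expression on the right-hand side of `let`.
fn block_value() -> u64 {
    let answer = {
        let base = 20;
        base * 2 + 2 // last expression, no semicolon: the block's value
    };
    answer
}
```

Adding a semicolon after the last expression turns it into a statement, and the block then evaluates to `()`.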

#### Reset code

The `reset` code shows another useful thing in Rust: matching. A complete description of the matching abilities in Rust would be a semester-level class, not two minutes in an unrelated talk, but this example matches a tuple against two alternative patterns.

```
fn maybe_reset(cntr: &Counter, c: char, current_count: &mut u64) {
    match (c, cntr.reset) {
        // Either (newline, NewlinesReset) or (space, SpacesReset) resets.
        ('\n', Reset::NewlinesReset) | (' ', Reset::SpacesReset) => {
            *current_count = 0;
        }
        _ => {}
    }
}
```
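
A few more pattern forms, sketched with hypothetical functions: alternatives with `|`, inclusive ranges, guards with `if`, and tuple patterns that bind some parts and ignore others.

```rust
/// Classifies a character using literal, range, and guarded patterns.
fn classify(c: char) -> &'static str {
    match c {
        '\n' | '\r' => "line break",
        ' ' | '\t' => "blank",
        '0'..='9' => "digit",
        _ if c.is_alphabetic() => "letter",
        _ => "other",
    }
}

/// Matches on a tuple, binding the parts that matter and ignoring the rest.
fn describe(pair: (char, u64)) -> String {
    match pair {
        (_, 0) => String::from("never seen"),
        (c, 1) => format!("{} seen once", c),
        (c, n) => format!("{} seen {} times", c, n),
    }
}
```

The compiler checks that the arms cover every possible value, which is why each `match` here ends with a catch-all arm.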

#### Increment support

The increment compares the character to the desired one and, if matched, increments the count.

```
fn maybe_incr(cntr: &Counter, c: char, current_count: &mut u64) {
    if c == cntr.what {
        *current_count += 1;
    }
}
```

Note that I optimized the code in this article for slides. It is not necessarily a best-practice example of Rust code or how to design a good API.
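
For trying the algorithm without any Python tooling, the pieces above can be folded into one method. This is a sketch under stated assumptions, not the article's layout: the PyO3 attributes are left out so that it compiles with the standard library alone, the three helper functions are merged into `has_count`, and `line_has` is a hypothetical convenience wrapper.

```rust
#[derive(Clone, Copy)]
enum Reset {
    NewlinesReset,
    SpacesReset,
    NoReset,
}

struct Counter {
    what: char,
    min_number: u64,
    reset: Reset,
}

impl Counter {
    /// True as soon as `what` has been seen `min_number` times since the last reset.
    fn has_count(&self, data: &str) -> bool {
        let mut current_count: u64 = 0;
        for c in data.chars() {
            // Same order as the article: reset, then increment, then compare.
            match (c, self.reset) {
                ('\n', Reset::NewlinesReset) | (' ', Reset::SpacesReset) => current_count = 0,
                _ => {}
            }
            if c == self.what {
                current_count += 1;
            }
            if current_count >= self.min_number {
                return true;
            }
        }
        false
    }
}

/// Hypothetical wrapper, not part of the article's API: counts per line.
fn line_has(what: char, min_number: u64, data: &str) -> bool {
    let counter = Counter { what, min_number, reset: Reset::NewlinesReset };
    counter.has_count(data)
}
```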

### Wrap Rust code for Python

To wrap Rust code for Python, you can use PyO3. The PyO3 Rust "crate" (or library) lets you annotate Rust code inline with hints for wrapping it into Python, making it easier to modify both together.

#### Include PyO3 crate primitives

First, you must include the PyO3 crate primitives.

```
use pyo3::prelude::*;
```

#### Wrap enum

The `enum` needs to be wrapped. The `derive` clauses are necessary for wrapping the `enum` for PyO3, because they allow the class to be copied and cloned, making it easier to use from Python.

```
#[pyclass]
#[derive(Clone)]
#[derive(Copy)]
enum Reset {
    /* ... */
}
```

#### Wrap struct

The `struct` is similarly wrapped. These attributes invoke "macros" in Rust, which generate the needed interface bits.

```
#[pyclass]
struct Counter {
    /* ... */
}
```

#### Wrap impl

Wrapping the `impl` is more interesting. The block is marked with `#[pymethods]`, and one more attribute is added: the `new` method is marked as `#[new]`, letting PyO3 know how to expose a constructor for the built-in object.

```
#[pymethods]
impl Counter {
    #[new]
    fn new(what: char, min_number: u64, reset: Reset) -> Self {
        Counter { what, min_number, reset }
    }
    /* ... */
}
```

#### Define module

Finally, define a function that initializes the module. This function has a specific signature, must be named the same as the module, and must be decorated with `#[pymodule]`.

```
#[pymodule]
fn counter(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_class::<Counter>()?;
    m.add_class::<Reset>()?;
    Ok(())
}
```

The `?` marks each call that can fail (for example, if the class was not appropriately configured). If one does fail, the error in the `PyResult` is translated into a Python exception at import time.
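
The `?` operator is not specific to PyO3. Here is a sketch with the standard library's `Result` in place of `PyResult`; the function name is made up. Each `?` either unwraps the `Ok` value or returns the error to the caller immediately.

```rust
use std::num::ParseIntError;

/// Parses two decimal strings and adds them.
/// Each `?` returns early with the error if parsing fails.
fn add_parsed(a: &str, b: &str) -> Result<u64, ParseIntError> {
    let x: u64 = a.parse()?;
    let y: u64 = b.parse()?;
    Ok(x + y)
}
```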

#### Maturin develop

For quick checking, `maturin develop` builds and installs the library into the current virtual environment. This helps iterate quickly.

```
$ maturin develop
```

#### Maturin build

The `maturin build` command builds a `manylinux` wheel, which can be uploaded to PyPI. The wheel is specific to the CPU architecture.

#### Python library

Using the library from Python is the nice part. Nothing in the calling code indicates a difference between this and a library written in Python. One useful consequence is that if you are optimizing an existing Python library that already has unit tests, you can reuse the Python unit tests for the Rust library.

#### Import

Whether you used `maturin develop` or `pip install` to install it, importing the library is done with `import`.

```
import counter
```

#### Construct

The constructor was defined exactly so the object could be built from Python. This is not always the case. Sometimes objects are only returned from more sophisticated functions.

```
cntr = counter.Counter(
    'c',
    3,
    counter.Reset.NewlinesReset,
)
```

#### Call

The final pay-off is here at last. Check whether this string has at least three "c" characters:

```
>>> cntr.has_count("hello-c-c-c-goodbye")
True
```

Adding a newline causes the reset to happen, and there aren't three "c" characters without an intervening newline:

```
>>> cntr.has_count("hello-c-c-\nc-goodbye")
False
```

### Using Rust and Python is easy

My goal is to convince you that combining Rust and Python is easy. I wrote very little code to "glue" them. Rust and Python have complementary strengths and weaknesses.

Rust is great for high-performance, safe code. Rust has a steep learning curve and can be awkward for quickly prototyping a solution.

Python is easy to get started with and supports incredibly tight iteration loops. Python does have a "speed cap." Beyond a certain level it is harder to get better performance from Python.

Combining them is perfect. Prototype in Python and move performance bottlenecks to Rust.

With `maturin`, your development and deployment pipelines are easier to make. Develop, build, and enjoy the combo!

--------------------------------------------------------------------------------

via: https://opensource.com/article/23/3/python-loves-rust

Author: [Moshe Zadka][a]
Topic selection: [lkxed][b]
Translator: [Translator ID](https://github.com/译者ID)
Proofreader: [Proofreader ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)

[a]: https://opensource.com/users/moshez
[b]: https://github.com/lkxed/