[#]: collector: (lujun9972)
[#]: translator: (wxy)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (C vs. Rust: Which to choose for programming hardware abstractions)
[#]: via: (https://opensource.com/article/20/1/c-vs-rust-abstractions)
[#]: author: (Dan Pittman https://opensource.com/users/dan-pittman)
C vs. Rust: Which to choose for programming hardware abstractions
======
> Using type-level programming in Rust can make hardware abstractions safer.
![Tools illustration][1]
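Rust is an increasingly popular programming language positioned to be the best choice for hardware interfaces. It is often compared to C for its level of abstraction. This article explains how Rust can handle bitwise operations in a number of ways and offers a solution that provides both safety and ease of use.
Language | Origin | Official description | Overview
---|---|---|---
C | 1972 | C is a general-purpose programming language which features economy of expression, modern control flow and data structures, and a rich set of operators. (Source: [CS Fundamentals][2]) | C is (an) imperative language, designed to compile in a relatively straightforward manner which provides low-level access to the memory. (Source: [W3schools.in][3])
Rust | 2010 | A language empowering everyone to build reliable and efficient software (Source: [Rust website][4]) | Rust is a multi-paradigm system programming language focused on safety, especially safe concurrency. (Source: [Wikipedia][5])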
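### Bitwise operations over register values in C
In the world of systems programming, where you may find yourself writing hardware drivers or interacting directly with memory-mapped devices, interaction is almost always done through memory-mapped registers provided by the hardware. You typically interact with these registers through bitwise operations on some fixed-width numeric type.
For instance, imagine an 8-bit register with three fields: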
```
+----------+------+-----------+---------+
| (unused) | Kind | Interrupt | Enabled |
+----------+------+-----------+---------+
   5-7       2-4        1          0
```
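The number below each field name gives the bits that field occupies in the register. To enable this register, you would write the value `1` (`0000_0001` in binary) to set the `Enabled` field's bit. Often, though, the register already holds a configuration that you don't want to disturb. Say you want to enable interrupts on the device but also want to be sure the device remains enabled. To do that, you must combine the `Interrupt` field's value with the `Enabled` field's value. You would do that with bitwise operations: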
```
1 | (1 << 1)
```
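This gives you the binary value `0000_0011` by or-ing 1 with 2 (which you get by shifting `1` left by one bit). You can write this to your register, leaving it enabled but also enabling interrupts.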
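To make the arithmetic concrete, here is a minimal sketch in Rust of the same update written as a read-modify-write, so that bits already set in the register survive. The constant names, the helper name `enable_with_interrupt`, and the sample starting values are illustrative assumptions, not part of any driver API.

```rust
/// Bit positions taken from the register diagram above.
const ENABLED_BIT: u8 = 1 << 0; // 0000_0001
const INTERRUPT_BIT: u8 = 1 << 1; // 0000_0010

/// Hypothetical helper: returns the new register value with `Enabled`
/// and `Interrupt` set, leaving every other bit as it was.
fn enable_with_interrupt(current: u8) -> u8 {
    current | ENABLED_BIT | INTERRUPT_BIT
}
```

Starting from `0000_0000`, this yields `0000_0011`. Starting from a register whose `Kind` field already holds `101` (`0001_0100`), it yields `0001_0111`, with `Kind` untouched.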
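This is a lot to keep in your head, especially when you're dealing with potentially hundreds of registers for a complete system. In practice, you do this with mnemonics that track a field's position in a register and how wide the field is (that is, what is its upper bound?).
Here's an example of these mnemonics. They are C macros that replace their occurrences with the code on the right-hand side. This is the shorthand for the register laid out above. The left-hand side of the `&` puts the value in position for that field, and the right-hand side limits it to only that field's bits: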
```
#define REG_ENABLED_FIELD(x) (((x) << 0) & 1)
#define REG_INTERRUPT_FIELD(x) (((x) << 1) & 2)
#define REG_KIND_FIELD(x) (((x) << 2) & (7 << 2))
```
You'd then use these to abstract over deriving a register's value with something like:
```
#include <stdint.h>

void set_reg_val(uint8_t *reg, uint8_t val);
void enable_reg_with_interrupt(uint8_t *reg) {
    set_reg_val(reg, REG_ENABLED_FIELD(1) | REG_INTERRUPT_FIELD(1));
}
```
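This is the state of the art. In fact, this is how the bulk of drivers appear in the Linux kernel.
Is there a better way? Consider the boon to safety and expressibility if the type system were borne out of research on modern programming languages. That is, what could you do with a richer, more expressive type system to make this process safer and more tenable?
### Bitwise operations over register values in Rust
Continuing with the register above as an example: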
```
+----------+------+-----------+---------+
| (unused) | Kind | Interrupt | Enabled |
+----------+------+-----------+---------+
   5-7       2-4        1          0
```
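How might you want to express such a thing in Rust types?
You'll start in a similar way, by defining constants for each field's *offset* (that is, how far it is from the least significant bit) and its mask. A *mask* is a value whose binary representation can be used to update or read the field from inside the register: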
```
const ENABLED_MASK: u8 = 1;
const ENABLED_OFFSET: u8 = 0;
const INTERRUPT_MASK: u8 = 2;
const INTERRUPT_OFFSET: u8 = 1;
const KIND_MASK: u8 = 7 << 2;
const KIND_OFFSET: u8 = 2;
```
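As a minimal sketch of what "update or read" means for these constants, the two helpers below pull a field out of a raw register value and replace a field in place. `read_field` and `write_field` are hypothetical names introduced here for illustration; they do not appear in the original code.

```rust
const KIND_MASK: u8 = 7 << 2;
const KIND_OFFSET: u8 = 2;

/// Hypothetical helper: isolate the field's bits, then shift them down
/// so the result is the field's own value.
fn read_field(reg: u8, mask: u8, offset: u8) -> u8 {
    (reg & mask) >> offset
}

/// Hypothetical helper: clear the field's bits, then OR in the new value,
/// shifted into position and limited to the field by the mask.
fn write_field(reg: u8, mask: u8, offset: u8, val: u8) -> u8 {
    (reg & !mask) | ((val << offset) & mask)
}
```

For a register holding `0001_0111`, reading `Kind` gives `101`, and writing `010` into `Kind` gives `0000_1011`, with the two low bits left alone.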
Next, you'll declare a `Field` type and do your operations to convert a given value into its position-relevant value for use inside the register:
```
struct Field {
    value: u8,
}
impl Field {
    fn new(mask: u8, offset: u8, val: u8) -> Self {
        Field {
            value: (val << offset) & mask,
        }
    }
}
```
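Finally, you'll use a `Register` type, which wraps around a numeric type that matches the width of your register. `Register` has an `update` function that updates the register with the given field: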
```
struct Register(u8);
impl Register {
    fn update(&mut self, field: Field) {
        self.0 = self.0 | field.value;
    }
}
fn enable_register(reg: &mut Register) {
    reg.update(Field::new(ENABLED_MASK, ENABLED_OFFSET, 1));
}
```
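With Rust, you can use data structures to represent fields, attach them to specific registers, and provide concise and sensible ergonomics while interacting with the hardware. This example uses the most basic facilities provided by Rust; regardless, the added structure alleviates some of the density of the C example above. Now a field is a named thing, not a number derived from shadowy bitwise operators, and registers are types with state, which is one extra layer of abstraction over the hardware.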
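Assembled into one piece, this first version can be exercised as in the sketch below. The definitions repeat the ones above so the block compiles on its own; the `raw` accessor and the `enable_with_interrupt` function are assumptions added here only to observe the result.

```rust
const ENABLED_MASK: u8 = 1;
const ENABLED_OFFSET: u8 = 0;
const INTERRUPT_MASK: u8 = 2;
const INTERRUPT_OFFSET: u8 = 1;

struct Field {
    value: u8,
}

impl Field {
    fn new(mask: u8, offset: u8, val: u8) -> Self {
        Field { value: (val << offset) & mask }
    }
}

struct Register(u8);

impl Register {
    fn update(&mut self, field: Field) {
        self.0 |= field.value;
    }

    /// Assumed accessor, not in the original, used to inspect the state.
    fn raw(&self) -> u8 {
        self.0
    }
}

/// Hypothetical usage: enable the device and its interrupt.
fn enable_with_interrupt(reg: &mut Register) {
    reg.update(Field::new(ENABLED_MASK, ENABLED_OFFSET, 1));
    reg.update(Field::new(INTERRUPT_MASK, INTERRUPT_OFFSET, 1));
}
```

One thing this makes visible: because `update` only ORs bits in, it can set a field but cannot clear one. Clearing would need the register to mask the field out first.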
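### A Rust implementation for ease of use
The first rewrite in Rust is nice, but it's not ideal. You have to remember to bring the mask and offset, and you're calculating them ad hoc, by hand, which is error-prone. Humans aren't great at precise and repetitive tasks; we tend to get tired or lose focus, and this leads to mistakes. Transcribing the masks and offsets by hand, one register at a time, will almost certainly end badly. This is the kind of task best left to a machine.
Second, thinking more structurally: What if there were a way to have the field's type carry the mask and offset information? What if you could catch mistakes in how you access and interact with hardware registers at compile time instead of discovering them at runtime? Perhaps you can lean on one of the strategies commonly used to suss out issues at compile time, like types.
You can modify the earlier example by using [typenum][6], a library that provides numbers and arithmetic at the type level. Here, you'll parameterize the `Field` type with its mask and offset, making them available for any instance of `Field` without having to include them at the call site: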
```
#[macro_use]
extern crate typenum;
use core::marker::PhantomData;
use typenum::*;
// Now we'll add Mask and Offset to Field's type
struct Field<Mask: Unsigned, Offset: Unsigned> {
    value: u8,
    _mask: PhantomData<Mask>,
    _offset: PhantomData<Offset>,
}
// We can use type aliases to give meaningful names to
// our fields (and not have to remember their offsets and masks).
type RegEnabled = Field<U1, U0>;
type RegInterrupt = Field<U2, U1>;
type RegKind = Field<op!(U7 << U2), U2>;
```
Now, when revisiting `Field`'s constructor, you can elide the mask and offset parameters because the type contains that information:
```
impl<Mask: Unsigned, Offset: Unsigned> Field<Mask, Offset> {
    fn new(val: u8) -> Self {
        Field {
            value: (val << Offset::U8) & Mask::U8,
            _mask: PhantomData,
            _offset: PhantomData,
        }
    }
}
// And to enable our register...
fn enable_register(reg: &mut Register) {
    reg.update(RegEnabled::new(1));
}
```
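For readers who want to try the idea without an extra dependency, here is a rough sketch of the same shape using Rust's built-in const generics instead of `typenum`. This swaps in a different technique from the one the article uses, and the two helper functions at the end are hypothetical; the point is only that the mask and offset travel with the type.

```rust
/// `MASK` and `OFFSET` live in the type, so callers pass only the value.
struct Field<const MASK: u8, const OFFSET: u8> {
    value: u8,
}

impl<const MASK: u8, const OFFSET: u8> Field<MASK, OFFSET> {
    fn new(val: u8) -> Self {
        Field { value: (val << OFFSET) & MASK }
    }
}

// Aliases mirror the typenum-based ones above.
type RegEnabled = Field<1, 0>;
type RegInterrupt = Field<2, 1>;
type RegKind = Field<{ 7 << 2 }, 2>;

/// Hypothetical helper: bits to write to enable the device and its interrupt.
fn enabled_with_interrupt_bits() -> u8 {
    RegEnabled::new(1).value | RegInterrupt::new(1).value
}

/// Hypothetical helper: bits to write for a given `Kind` value.
fn kind_bits(val: u8) -> u8 {
    RegKind::new(val).value
}
```

The call sites never mention a mask or an offset, which is the ergonomic gain the type-level version is after.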
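It looks pretty good, but what happens when you make a mistake regarding whether the given value *fits* into a field? Consider a simple typo where you put `10` instead of `1`: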
```
fn enable_register(reg: &mut Register) {
    reg.update(RegEnabled::new(10));
}
```
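In the code above, what is the expected outcome? Well, the code will set the enabled bit to 0, because `10 & 1 = 0`. That's unfortunate; it would be nice to know whether a value you're trying to write into a field will fit into the field before attempting a write. As a matter of fact, I'd consider lopping off the high bits of an errant field value *undefined behavior* (gasp).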
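The truncation is easy to reproduce with plain integers. The sketch below uses a hypothetical free function, `field_value`, that performs the same shift-and-mask as the constructor:

```rust
/// Hypothetical stand-in for the constructor's arithmetic.
fn field_value(mask: u8, offset: u8, val: u8) -> u8 {
    (val << offset) & mask
}
```

With the `Enabled` field, `field_value(1, 0, 10)` is `0`, since `10` is `1010` in binary and its lowest bit is clear. The same thing happens in wider fields: writing `9` (`1001`) into the three-bit `Kind` field gives `field_value(7 << 2, 2, 9) == 0b0000_0100`, so the register silently ends up with `Kind` set to `1`.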
### Using Rust with safety in mind
How can you check that a field's value fits in its prescribed position in a general way? More type-level numbers!
You can add a `Width` parameter to `Field` and use it to verify that a given value can fit into the field:
```
struct Field<Width: Unsigned, Mask: Unsigned, Offset: Unsigned> {
    value: u8,
    _mask: PhantomData<Mask>,
    _offset: PhantomData<Offset>,
    _width: PhantomData<Width>,
}
type RegEnabled = Field<U1, U1, U0>;
type RegInterrupt = Field<U1, U2, U1>;
type RegKind = Field<U3, op!(U7 << U2), U2>;
impl<Width: Unsigned, Mask: Unsigned, Offset: Unsigned> Field<Width, Mask, Offset> {
    fn new(val: u8) -> Option<Self> {
        if val <= (1 << Width::U8) - 1 {
            Some(Field {
                value: (val << Offset::U8) & Mask::U8,
                _mask: PhantomData,
                _offset: PhantomData,
                _width: PhantomData,
            })
        } else {
            None
        }
    }
}
```
Now you can construct a `Field` only if the given value fits! Otherwise, you have `None`, which signals that an error has occurred, rather than lopping off the high bits of the value and silently writing an unexpected value.
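A dependency-free sketch of the same runtime check, again substituting const generics for `typenum`, might look like the following. The helper functions are hypothetical, and computing the maximum in `u16` is my own addition so that a full-width (8-bit) field would not overflow the shift.

```rust
struct Field<const WIDTH: u32, const MASK: u8, const OFFSET: u8> {
    value: u8,
}

impl<const WIDTH: u32, const MASK: u8, const OFFSET: u8> Field<WIDTH, MASK, OFFSET> {
    /// Returns `None` when `val` needs more than `WIDTH` bits.
    fn new(val: u8) -> Option<Self> {
        // Largest value the field can hold, computed in a wider type.
        let max = (1u16 << WIDTH) - 1;
        if u16::from(val) <= max {
            Some(Field { value: (val << OFFSET) & MASK })
        } else {
            None
        }
    }
}

type RegEnabled = Field<1, 1, 0>;
type RegKind = Field<3, { 7 << 2 }, 2>;

/// Hypothetical helpers that expose the positioned bits, if any.
fn enabled_bits(val: u8) -> Option<u8> {
    RegEnabled::new(val).map(|f| f.value)
}

fn kind_bits(val: u8) -> Option<u8> {
    RegKind::new(val).map(|f| f.value)
}
```

Here the `10` typo produces `None` instead of a register write with the enabled bit cleared.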
Note, though, this will raise an error at runtime. However, we knew the value we wanted to write beforehand, remember? Given that, we can teach the compiler to reject entirely a program which has an invalid field value; we don't have to wait until we run it!
This time, you'll add a _trait bound_ (the `where` clause) to a new realization of `new`, called `new_checked`, that asks the incoming value to be less than or equal to the maximum possible value a field with the given `Width` can hold:
```
struct Field<Width: Unsigned, Mask: Unsigned, Offset: Unsigned> {
    value: u8,
    _mask: PhantomData<Mask>,
    _offset: PhantomData<Offset>,
    _width: PhantomData<Width>,
}
type RegEnabled = Field<U1, U1, U0>;
type RegInterrupt = Field<U1, U2, U1>;
type RegKind = Field<U3, op!(U7 << U2), U2>;
impl<Width: Unsigned, Mask: Unsigned, Offset: Unsigned> Field<Width, Mask, Offset> {
    const fn new_checked<V: Unsigned>() -> Self
    where
        V: IsLessOrEqual<op!((U1 << Width) - U1), Output = True>,
    {
        Field {
            value: (V::U8 << Offset::U8) & Mask::U8,
            _mask: PhantomData,
            _offset: PhantomData,
            _width: PhantomData,
        }
    }
}
```
Only numbers for which this property holds have an implementation of this trait, so if you use a number that does not fit, it will fail to compile. Take a look!
```
fn enable_register(reg: &mut Register) {
    reg.update(RegEnabled::new_checked::<U10>());
}
12 |     reg.update(RegEnabled::new_checked::<U10>());
   |                           ^^^^^^^^^^^^^^^^ expected struct `typenum::B0`, found struct `typenum::B1`
   |
   = note: expected type `typenum::B0`
           found type `typenum::B1`
```
`new_checked` will fail to produce a program that has an errant too-high value for a field. Your typo won't blow up at runtime because you could never have gotten an artifact to run.
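The same "reject it at build time" effect can be approximated without `typenum`. The sketch below is an assumption-laden alternative, not the article's method: it replaces the trait bound with a const generic value and a compile-time `assert!` in an associated constant. `FitsIn` and the two helper functions are hypothetical names.

```rust
struct Field<const WIDTH: u32, const MASK: u8, const OFFSET: u8> {
    value: u8,
}

/// Helper whose associated constant fails to evaluate when `V`
/// needs more than `WIDTH` bits.
struct FitsIn<const V: u8, const WIDTH: u32>;

impl<const V: u8, const WIDTH: u32> FitsIn<V, WIDTH> {
    const OK: () = assert!((V as u16) < (1u16 << WIDTH), "value does not fit in field");
}

impl<const WIDTH: u32, const MASK: u8, const OFFSET: u8> Field<WIDTH, MASK, OFFSET> {
    const fn new_checked<const V: u8>() -> Self {
        // Referencing the constant forces the check during compilation.
        let () = FitsIn::<V, WIDTH>::OK;
        Field { value: (V << OFFSET) & MASK }
    }
}

type RegEnabled = Field<1, 1, 0>;
type RegKind = Field<3, { 7 << 2 }, 2>;

/// Hypothetical helpers showing valid call sites.
fn enabled_bits() -> u8 {
    RegEnabled::new_checked::<1>().value
}

fn kind_bits() -> u8 {
    RegKind::new_checked::<5>().value
}
```

Changing the first call to `RegEnabled::new_checked::<10>()` should stop the build, because the assertion is evaluated when that instantiation is compiled. One design difference worth noting: the `typenum` bound is checked as part of type checking, while this assertion only fires once the function is actually instantiated.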
You're nearing Peak Rust in terms of how safe you can make memory-mapped hardware interactions. However, what you wrote back in the first example in C was far more succinct than the type parameter salad you ended up with. Is doing such a thing even tractable when you're talking about potentially hundreds or even thousands of registers?
### Just right with Rust: both safe and accessible
Earlier, I called out calculating masks by hand as being problematic, but I just did that same problematic thing—albeit at the type level. While using such an approach is nice, getting to the point when you can write any code requires quite a bit of boilerplate and manual transcription (I'm talking about the type synonyms here).
Our team wanted something like the [TockOS mmio registers][7], but one that would generate typesafe implementations with the least amount of manual transcription possible. The result we came up with is a macro that generates the necessary boilerplate to get a Tock-like API plus type-based bounds checking. To use it, write down some information about a register, its fields, their width and offsets, and optional [enum][8]-like values (should you want to give "meaning" to the possible values a field can have):
```
register! {
    // The register's name
    Status,
    // The type which represents the whole register.
    u8,
    // The register's mode, ReadOnly, ReadWrite, or WriteOnly.
    RW,
    // And the fields in this register.
    Fields [
        On    WIDTH(U1) OFFSET(U0),
        Dead  WIDTH(U1) OFFSET(U1),
        Color WIDTH(U3) OFFSET(U2) [
            Red    = U1,
            Blue   = U2,
            Green  = U3,
            Yellow = U4
        ]
    ]
}
```
From this, you can generate register and field types like the previous example where the indices—the `Width`, `Mask`, and `Offset`—are derived from the values input in the `WIDTH` and `OFFSET` sections of a field's definition. Also, notice that all of these numbers are `typenums`; they're going to go directly into your `Field` definitions!
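Deriving the mask from a width and an offset is simple arithmetic. The function below is a hypothetical value-level sketch of that derivation, not the macro's actual expansion (which works on type-level numbers):

```rust
/// Hypothetical helper: build a mask of `width` one-bits starting at
/// bit `offset`. The intermediate `u16` keeps an 8-bit-wide field from
/// overflowing the shift.
const fn mask_for(width: u32, offset: u32) -> u8 {
    (((1u16 << width) - 1) << offset) as u8
}
```

For the fields declared above, `mask_for(1, 0)` is `1` for `On`, `mask_for(1, 1)` is `2` for `Dead`, and `mask_for(3, 2)` is `0b0001_1100` for `Color`, which are the values that were previously written out by hand as `U1`, `U2`, and `op!(U7 << U2)`.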
The generated code provides namespaces for registers and their associated fields through the name given for the register and the fields. That's a mouthful; here's what it looks like:
```
mod Status {
    struct Register(u8);
    mod On {
        struct Field; // There is of course more to this definition
    }
    mod Dead {
        struct Field;
    }
    mod Color {
        struct Field;
        pub const Red: Field = Field::<U1>::new();
        // &c.
    }
}
```
The generated API contains the nominally expected read and write primitives to get at the raw register value, but it also has ways to get a single field's value, do collective actions, and find out if any (or all) of a collection of bits is set. You can read the documentation on the [complete generated API][9].
### Kicking the tires
What does it look like to use these definitions for a real device? Will the code be littered with type parameters, obscuring any real logic from view?
No! By using type synonyms and type inference, you effectively never have to think about the type-level part of the program at all. You get to interact with the hardware in a straightforward way and get those bounds-related assurances automatically.
Here's an example of a [UART][10] register block. I'll skip the declaration of the registers themselves, as that would be too much to include here. Instead, it starts with a register "block" then helps the compiler know how to look up the registers from a pointer to the head of the block. We do that by implementing `Deref` and `DerefMut`:
```
#[repr(C)]
pub struct UartBlock {
    rx: UartRX::Register,
    _padding1: [u32; 15],
    tx: UartTX::Register,
    _padding2: [u32; 15],
    control1: UartControl1::Register,
}
pub struct Regs {
    addr: usize,
}
impl Deref for Regs {
    type Target = UartBlock;
    fn deref(&self) -> &UartBlock {
        unsafe { &*(self.addr as *const UartBlock) }
    }
}
impl DerefMut for Regs {
    fn deref_mut(&mut self) -> &mut UartBlock {
        unsafe { &mut *(self.addr as *mut UartBlock) }
    }
    }
}
```
Once this is in place, using these registers is as simple as `read()` and `modify()`:
```
fn main() {
    // A pretend register block.
    let mut x = [0_u32; 33];
    let mut regs = Regs {
        // Some shenanigans to get at `x` as though it were a
        // pointer. Normally you'd be given some address like
        // `0xDEADBEEF` over which you'd instantiate a `Regs`.
        addr: &mut x as *mut [u32; 33] as usize,
    };
    assert_eq!(regs.rx.read(), 0);
    regs.control1
        .modify(UartControl1::Enable::Set + UartControl1::RecvReadyInterrupt::Set);
    // The first bit and the 10th bit should be set.
    assert_eq!(regs.control1.read(), 0b_10_0000_0001);
}
```
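To see where the asserted value comes from, here is a rough stand-in for what a register's `read()` and `modify()` could do underneath. Everything in it is an assumption for illustration: the `Register` type, the plain `u32` bit constants, and the bit positions (bit 0 for enable, bit 9 for the receive-ready interrupt, inferred from the comment and assertion above). It is not the code that `bounded-registers` generates.

```rust
use core::ptr::{read_volatile, write_volatile};

/// Hypothetical stand-in for a generated 32-bit register type.
#[repr(transparent)]
pub struct Register(u32);

impl Register {
    /// Volatile read, so the compiler cannot cache or drop the access.
    pub fn read(&self) -> u32 {
        unsafe { read_volatile(&self.0) }
    }

    /// Read-modify-write that ORs `bits` into the current value.
    pub fn modify(&mut self, bits: u32) {
        let current = self.read();
        unsafe { write_volatile(&mut self.0, current | bits) }
    }
}

// Assumed field positions.
const ENABLE: u32 = 1 << 0;
const RECV_READY_INTERRUPT: u32 = 1 << 9;

/// Hypothetical usage mirroring the `modify` call above.
pub fn enable_with_recv_interrupt(control1: &mut Register) -> u32 {
    control1.modify(ENABLE | RECV_READY_INTERRUPT);
    control1.read()
}
```

On a zeroed register this returns `0b10_0000_0001`: the first bit and the tenth bit.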
When we're working with runtime values, we use `Option`, as we saw earlier. Here I'm using `unwrap`, but in a real program with unknown inputs, you'd probably want to check that you got a `Some` back from that `new` call:[1][11],[2][12]
```
fn main() {
    // A pretend register block.
    let mut x = [0_u32; 33];
    let mut regs = Regs {
        // Some shenanigans to get at `x` as though it were a
        // pointer. Normally you'd be given some address like
        // `0xDEADBEEF` over which you'd instantiate a `Regs`.
        addr: &mut x as *mut [u32; 33] as usize,
    };
    let input = regs.rx.get_field(UartRX::Data::Field::Read).unwrap();
    regs.tx.modify(UartTX::Data::Field::new(input).unwrap());
}
```
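Handling the `None` case rather than unwrapping could look like the sketch below. It deliberately avoids the generated API: `data_field`, `forward`, the assumed 8-bit width of the data field, and the plain `u32` standing in for the TX register are all hypothetical.

```rust
/// Hypothetical bounds check for an assumed 8-bit data field.
fn data_field(raw: u32) -> Option<u32> {
    if raw <= 0xFF {
        Some(raw)
    } else {
        None
    }
}

/// Forward a received value to a pretend TX register, reporting a
/// rejected value to the caller instead of panicking.
fn forward(rx_value: u32, tx: &mut u32) -> Result<(), u32> {
    match data_field(rx_value) {
        Some(value) => {
            *tx = value;
            Ok(())
        }
        None => Err(rx_value),
    }
}
```

An out-of-range input leaves the TX value untouched and hands the offending value back, so the caller decides whether to log, retry, or abort.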
### Decoding failure conditions
Depending on your personal pain threshold, you may have noticed that the errors are nearly unintelligible. Take a look at a not-so-subtle reminder of what I'm talking about:
```
error[E0271]: type mismatch resolving `<typenum::UInt<typenum::UInt<typenum::UInt<typenum::UInt<typenum::UInt<typenum::UTerm, typenum::B1>, typenum::B0>, typenum::B1>, typenum::B0>, typenum::B0> as typenum::IsLessOrEqual<typenum::UInt<typenum::UInt<typenum::UInt<typenum::UInt<typenum::UTerm, typenum::B1>, typenum::B0>, typenum::B1>, typenum::B0>>>::Output == typenum::B1`
  --> src/main.rs:12:5
   |
12 |     less_than_ten::<U20>();
   |     ^^^^^^^^^^^^^^^^^^^^ expected struct `typenum::B0`, found struct `typenum::B1`
   |
   = note: expected type `typenum::B0`
       found type `typenum::B1`
```
The `expected typenum::B0 found typenum::B1` part kind of makes sense, but what on earth is the `typenum::UInt&lt;typenum::UInt, typenum::UInt…` nonsense? Well, `typenum` represents numbers as binary [cons][13] cells! Errors like this make it hard, especially when you have several of these type-level numbers confined to tight quarters, to know which number it's talking about. Unless, of course, it's second nature for you to translate baroque binary representations to decimal ones.
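The encoding is less mysterious once written down: `UTerm` is zero, and each `UInt<rest, bit>` layer means "twice `rest`, plus `bit`". The toy model below mirrors that at the value level. It is only an illustration of the representation; the `TypeNum` enum and both functions are hypothetical and have nothing to do with how `typenum` or any filtering tool is implemented.

```rust
/// A toy, value-level model of typenum's unsigned integers.
enum TypeNum {
    UTerm,
    UInt(Box<TypeNum>, u8),
}

/// Build the nested form from bits listed most significant first,
/// the order in which they appear in the error message.
fn from_bits(bits: &[u8]) -> TypeNum {
    bits.iter()
        .fold(TypeNum::UTerm, |rest, &bit| TypeNum::UInt(Box::new(rest), bit))
}

/// Collapse the nested form back into an ordinary number.
fn to_decimal(n: &TypeNum) -> u64 {
    match n {
        TypeNum::UTerm => 0,
        TypeNum::UInt(rest, bit) => 2 * to_decimal(rest) + u64::from(*bit),
    }
}
```

Reading the bits out of the error above, `B1, B0, B1, B0, B0` decodes to 20 and `B1, B0, B1, B0` decodes to 10, which matches the `less_than_ten::<U20>()` call that triggered it.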
After the `U100`th time attempting to decipher any meaning from this mess, a teammate got Mad As Hell And Wasn't Going To Take It Anymore and made a little utility, `tnfilt`, to parse the meaning out from the misery that is namespaced binary cons cells. `tnfilt` takes the cons cell-style notation and replaces it with sensible decimal numbers. We imagine that others will face similar difficulties, so we shared [`tnfilt`][14]. You can use it like this:
```
$ cargo build 2>&1 | tnfilt
```
It transforms the output above into something like this:
```
error[E0271]: type mismatch resolving `<U20 as typenum::IsLessOrEqual<U10>>::Output == typenum::B1`
```
Now _that_ makes sense!
### In conclusion
Memory-mapped registers are used ubiquitously when interacting with hardware from software, and there are myriad ways to portray those interactions, each of which has a different place on the spectra of ease-of-use and safety. We found that the use of type-level programming to get compile-time checking on memory-mapped register interactions gave us the necessary information to make safer software. That code is available in the [`bounded-registers`][15] crate (Rust package).
Our team started out right at the edge of the more-safe side of that safety spectrum and then tried to figure out how to move the ease-of-use slider closer to the easy end. From those ambitions, `bounded-registers` was born, and we use it anytime we encounter memory-mapped devices in our adventures at Auxon.
* * *
1. Technically, a read from a register field, by definition, will only give a value within the prescribed bounds, but none of us lives in a pure world, and you never know what's going to happen when external systems come into play. You're at the behest of the Hardware Gods here, so instead of forcing you into a "might panic" situation, it gives you the `Option` to handle a "This Should Never Happen" case.
2. `get_field` looks a little weird. I'm looking at the `Field::Read` part, specifically. `Field` is a type, and you need an instance of that type to pass to `get_field`. A cleaner API might be something like:
```
regs.rx.get_field::<UartRX::Data::Field>();
```
But remember that `Field` is a type synonym that has fixed indices for width, offset, etc. To be able to parameterize `get_field` like this, you'd need higher-kinded types.
* * *
_This originally appeared on the [Auxon Engineering blog][16] and is edited and republished with permission._
--------------------------------------------------------------------------------
via: https://opensource.com/article/20/1/c-vs-rust-abstractions
Author: [Dan Pittman][a]
Topic selection: [lujun9972][b]
Translator: [译者ID](https://github.com/译者ID)
Proofreader: [校对者ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)
[a]: https://opensource.com/users/dan-pittman
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/tools_hardware_purple.png?itok=3NdVoYhl (Tools illustration)
[2]: https://cs-fundamentals.com/c-programming/history-of-c-programming-language.php
[3]: https://www.w3schools.in/c-tutorial/history-of-c/
[4]: https://www.rust-lang.org/
[5]: https://en.wikipedia.org/wiki/Rust_(programming_language)
[6]: https://docs.rs/crate/typenum
[7]: https://docs.rs/tock-registers/0.3.0/tock_registers/
[8]: https://en.wikipedia.org/wiki/Enumerated_type
[9]: https://github.com/auxoncorp/bounded-registers#the-register-api
[10]: https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter
[11]: tmp.shpxgDsodx#1
[12]: tmp.shpxgDsodx#2
[13]: https://en.wikipedia.org/wiki/Cons
[14]: https://github.com/auxoncorp/tnfilt
[15]: https://crates.io/crates/bounded-registers
[16]: https://blog.auxon.io/2019/10/25/type-level-registers/