[#]: collector: (lujun9972)
[#]: translator: (wxy)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (C vs. Rust: Which to choose for programming hardware abstractions)
[#]: via: (https://opensource.com/article/20/1/c-vs-rust-abstractions)
[#]: author: (Dan Pittman https://opensource.com/users/dan-pittman)

C vs. Rust: Which to choose for programming hardware abstractions
======

Using type-level programming in Rust can make hardware abstractions safer.
![Tools illustration][1]

Rust is an increasingly popular programming language positioned to be the best choice for hardware interfaces. It's often compared to C for its level of abstraction. This article explains how Rust can handle bitwise operations in a number of ways and offers a solution that provides both safety and ease of use.

Language | Origin | Official description | Overview
---|---|---|---
C | 1972 | C is a general-purpose programming language which features economy of expression, modern control flow and data structures, and a rich set of operators. (Source: [CS Fundamentals][2]) | C is [an] imperative language and designed to compile in a relatively straightforward manner which provides low-level access to the memory. (Source: [W3schools.in][3])
Rust | 2010 | A language empowering everyone to build reliable and efficient software (Source: [Rust website][4]) | Rust is a multi-paradigm system programming language focused on safety, especially safe concurrency. (Source: [Wikipedia][5])

### Bitwise operation over register values in C

In the world of systems programming, where you may find yourself writing hardware drivers or interacting directly with memory-mapped devices, interaction is almost always done through memory-mapped registers provided by the hardware. You typically interact with these things through bitwise operations on some fixed-width numeric type.
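As a minimal sketch of what "bitwise operations on a fixed-width numeric type" can look like, the helpers below treat a plain `u8` as a stand-in for an 8-bit register value. The function names `set_bits`, `clear_bits`, and `read_field` are hypothetical and chosen only for illustration; a real driver would also go through volatile reads and writes to the device address, which this sketch leaves out.

```rust
/// Set every bit of `bits` in `reg`, leaving the other bits untouched.
fn set_bits(reg: u8, bits: u8) -> u8 {
    reg | bits
}

/// Clear every bit of `bits` in `reg`, leaving the other bits untouched.
fn clear_bits(reg: u8, bits: u8) -> u8 {
    reg & !bits
}

/// Read a field that is `width` bits wide and starts `offset` bits above
/// the least significant bit.
/// Assumes `width` is between 1 and 8 and `offset` is less than 8.
fn read_field(reg: u8, offset: u8, width: u8) -> u8 {
    // Build the mask in u16 so that a width of 8 does not overflow.
    let mask = ((1u16 << width) - 1) as u8;
    (reg >> offset) & mask
}
```

Each helper returns a new value instead of touching hardware, so the masking and shifting logic can be checked in isolation.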
For instance, imagine an 8-bit register with three fields:

```
+----------+------+-----------+---------+
| (unused) | Kind | Interrupt | Enabled |
+----------+------+-----------+---------+
   5-7       2-4        1          0
```

The number below the field name prescribes the bits used by that field in the register. To enable this register, you would write the value **1**, represented in binary as **0000_0001**, to set the enabled field's bit. Often, though, you also have an existing configuration in the register that you don't want to disturb. Say you want to enable interrupts on the device but also want to be sure the device remains enabled. To do that, you must combine the Interrupt field's value with the Enabled field's value. You would do that with bitwise operations:

```
1 | (1 << 1)
```

This gives you the binary value **0000_0011** by **or**-ing 1 with 2, which you get by shifting 1 left by 1. You can write this to your register, leaving it enabled but also enabling interrupts.

This is a lot to keep in your head, especially when you're dealing with potentially hundreds of registers for a complete system. In practice, you do this with mnemonics which track a field's position in a register and how wide the field is (that is, _what's its upper bound?_).

Here's an example of one of these mnemonics. They are C macros that replace their occurrences with the code on the right-hand side. This is the shorthand for the register laid out above.
The left-hand side of the **&** puts you in position for that field, and the right-hand side limits you to only that field's bits:

```
/* Parenthesize the argument and the whole expansion so the macros
   compose safely inside larger expressions. */
#define REG_ENABLED_FIELD(x)   (((x) << 0) & 1)
#define REG_INTERRUPT_FIELD(x) (((x) << 1) & 2)
#define REG_KIND_FIELD(x)      (((x) << 2) & (7 << 2))
```

You'd then use these to abstract over the derivation of a register's value with something like:

```
#include <stdint.h>

void set_reg_val(uint8_t *reg, uint8_t val);

void enable_reg_with_interrupt(uint8_t *reg) {
    set_reg_val(reg, REG_ENABLED_FIELD(1) | REG_INTERRUPT_FIELD(1));
}
```

This is the state of the art. In fact, this is how the bulk of drivers appear in the Linux kernel.

Is there a better way? Consider the boon to safety and expressibility if the type system were born out of research on modern programming languages. That is, what could you do with a richer, more expressive type system to make this process safer and more tenable?

### Bitwise operation over register values in Rust

Continuing with the register above as an example:

```
+----------+------+-----------+---------+
| (unused) | Kind | Interrupt | Enabled |
+----------+------+-----------+---------+
   5-7       2-4        1          0
```

How might you want to express such a thing in Rust types?

You'll start in a similar way, by defining constants for each field's _offset_ (that is, how far it is from the least significant bit) and its mask.
A _mask_ is a value whose binary representation can be used to update or read the field from inside the register:

```
const ENABLED_MASK: u8 = 1;
const ENABLED_OFFSET: u8 = 0;

const INTERRUPT_MASK: u8 = 2;
const INTERRUPT_OFFSET: u8 = 1;

const KIND_MASK: u8 = 7 << 2;
const KIND_OFFSET: u8 = 2;
```

Next, you'll declare a field type and do your operations to convert a given value into its position-relevant value for use inside the register:

```
struct Field {
    value: u8,
}

impl Field {
    fn new(mask: u8, offset: u8, val: u8) -> Self {
        Field {
            // Shift the value into position, then keep only the field's bits.
            value: (val << offset) & mask,
        }
    }
}
```

Finally, you'll use a **Register** type, which wraps around a numeric type that matches the width of your register. **Register** has an **update** function that updates the register with the given field:

```
struct Register(u8);

impl Register {
    fn update(&mut self, field: Field) {
        self.0 = self.0 | field.value;
    }
}

fn enable_register(reg: &mut Register) {
    reg.update(Field::new(ENABLED_MASK, ENABLED_OFFSET, 1));
}
```

With Rust, you can use data structures to represent fields, attach them to specific registers, and provide concise and sensible ergonomics while interacting with the hardware. This example uses the most basic facilities provided by Rust; regardless, the added structure alleviates some of the density from the C example above. Now a field is a named thing, not a number derived from shadowy bitwise operators, and registers are types with state: one extra layer of abstraction over the hardware.

### A Rust implementation for ease of use

The first rewrite in Rust is nice, but it's not ideal. You have to remember to bring the mask and offset, and you're calculating them ad hoc, by hand, which is error-prone. Humans aren't great at precise and repetitive tasks; we tend to get tired or lose focus, and this leads to mistakes. Transcribing the masks and offsets by hand, one register at a time, will almost certainly end badly.
This is the kind of task best left to a machine.

Second, thinking more structurally: What if there were a way to have the field's type carry the mask and offset information? What if you could catch mistakes in your implementation for how you access and interact with hardware registers at compile time instead of discovering them at runtime? Perhaps you can lean on one of the strategies commonly used to suss out issues at compile time, like types.

You can modify the earlier example by using [**typenum**][6], a library that provides numbers and arithmetic at the type level. Here, you'll parameterize the **Field** type with its mask and offset, making it available for any instance of **Field** without having to include it at the call site:

```
#[macro_use]
extern crate typenum;

use core::marker::PhantomData;

use typenum::*;

// Now we'll add Mask and Offset to Field's type
struct Field<Mask: Unsigned, Offset: Unsigned> {
    value: u8,
    _mask: PhantomData<Mask>,
    _offset: PhantomData<Offset>,
}

// We can use type aliases to give meaningful names to
// our fields (and not have to remember their offsets and masks).
type RegEnabled = Field<U1, U0>;
type RegInterrupt = Field<U2, U1>;
type RegKind = Field<op!(U7 << U2), U2>;
```

Now, when revisiting **Field**'s constructor, you can elide the mask and offset parameters because the type contains that information:

```
impl<Mask: Unsigned, Offset: Unsigned> Field<Mask, Offset> {
    fn new(val: u8) -> Self {
        Field {
            value: (val << Offset::U8) & Mask::U8,
            _mask: PhantomData,
            _offset: PhantomData,
        }
    }
}

// And to enable our register...
fn enable_register(reg: &mut Register) {
    reg.update(RegEnabled::new(1));
}
```

It looks pretty good, but… what happens when you make a mistake regarding whether a given value will _fit_ into a field?
Consider a simple typo where you put **10** instead of **1**:

```
fn enable_register(reg: &mut Register) {
    reg.update(RegEnabled::new(10));
}
```

In the code above, what is the expected outcome? Well, the code will set that enabled bit to 0 because **10 & 1 = 0**. That's unfortunate; it would be nice to know whether a value you're trying to write into a field will fit into the field before attempting a write. As a matter of fact, I'd consider lopping off the high bits of an errant field value _undefined behavior_ (gasps).

### Using Rust with safety in mind

How can you check that a field's value fits in its prescribed position in a general way? More type-level numbers!

You can add a **Width** parameter to **Field** and use it to verify that a given value can fit into the field:

```
struct Field<Width: Unsigned, Mask: Unsigned, Offset: Unsigned> {
    value: u8,
    _mask: PhantomData<Mask>,
    _offset: PhantomData<Offset>,
    _width: PhantomData<Width>,
}

type RegEnabled = Field<U1, U1, U0>;
type RegInterrupt = Field<U1, U2, U1>;
type RegKind = Field<U3, op!(U7 << U2), U2>;

impl<Width: Unsigned, Mask: Unsigned, Offset: Unsigned> Field<Width, Mask, Offset> {
    fn new(val: u8) -> Option<Self> {
        // The largest value that fits in `Width` bits is 2^Width - 1.
        if val <= (1 << Width::U8) - 1 {
            Some(Field {
                value: (val << Offset::U8) & Mask::U8,
                _mask: PhantomData,
                _offset: PhantomData,
                _width: PhantomData,
            })
        } else {
            None
        }
    }
}
```

Now you can construct a **Field** only if the given value fits! Otherwise, you have **None**, which signals that an error has occurred, rather than lopping off the high bits of the value and silently writing an unexpected value.

Note, though, this will raise an error at runtime. However, we knew the value we wanted to write beforehand, remember? Given that, we can teach the compiler to reject entirely a program which has an invalid field value; we don't have to wait until we run it!
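The runtime width check described above can also be sketched with only the standard library. This is not the typenum-based code from the article: it swaps in Rust's const generics as a stand-in for the type-level numbers, and the aliases below reuse the article's names purely for illustration.

```rust
/// A register field whose width, mask, and offset are const generic
/// parameters (a stand-in for the typenum parameters used in the article).
#[derive(Debug)]
struct Field<const WIDTH: u8, const MASK: u8, const OFFSET: u8> {
    value: u8,
}

type RegEnabled = Field<1, 0b0000_0001, 0>;
type RegKind = Field<3, 0b0001_1100, 2>;

impl<const WIDTH: u8, const MASK: u8, const OFFSET: u8> Field<WIDTH, MASK, OFFSET> {
    /// Returns `None` when `val` needs more than `WIDTH` bits.
    fn new(val: u8) -> Option<Self> {
        // Widen to u16 so that a WIDTH of 8 does not overflow the shift.
        let max = (1u16 << WIDTH) - 1;
        if u16::from(val) <= max {
            Some(Field { value: (val << OFFSET) & MASK })
        } else {
            None
        }
    }
}
```

With these definitions, `RegEnabled::new(1)` yields `Some(..)`, while the typo `RegEnabled::new(10)` yields `None` instead of silently writing 0 into the enabled bit.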
This time, you'll add a _trait bound_ (the **where** clause) to a new realization of **new**, called **new_checked**, that asks the incoming value to be less than or equal to the maximum possible value a field with the given **Width** can hold:

```
struct Field<Width: Unsigned, Mask: Unsigned, Offset: Unsigned> {
    value: u8,
    _mask: PhantomData<Mask>,
    _offset: PhantomData<Offset>,
    _width: PhantomData<Width>,
}

type RegEnabled = Field<U1, U1, U0>;
type RegInterrupt = Field<U1, U2, U1>;
type RegKind = Field<U3, op!(U7 << U2), U2>;

impl<Width: Unsigned, Mask: Unsigned, Offset: Unsigned> Field<Width, Mask, Offset> {
    const fn new_checked<V: Unsigned>() -> Self
    where
        V: IsLessOrEqual<op!((U1 << Width) - U1), Output = True>,
    {
        Field {
            value: (V::U8 << Offset::U8) & Mask::U8,
            _mask: PhantomData,
            _offset: PhantomData,
            _width: PhantomData,
        }
    }
}
```

Only numbers for which this property holds have an implementation of this trait, so if you use a number that does not fit, it will fail to compile. Take a look!

```
fn enable_register(reg: &mut Register) {
    reg.update(RegEnabled::new_checked::<U10>());
}

12 |     reg.update(RegEnabled::new_checked::<U10>());
   |                           ^^^^^^^^^^^^^^^^ expected struct `typenum::B0`, found struct `typenum::B1`
   |
   = note: expected type `typenum::B0`
           found type `typenum::B1`
```

**new_checked** will fail to produce a program that has an errant too-high value for a field. Your typo won't blow up at runtime because you could never have gotten an artifact to run.

You're nearing Peak Rust in terms of how safe you can make memory-mapped hardware interactions. However, what you wrote back in the first example in C was far more succinct than the type parameter salad you ended up with. Is doing such a thing even tractable when you're talking about potentially hundreds or even thousands of registers?
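For comparison, here is a minimal standard-library sketch of the same "reject it at build time" idea. It is an assumption-laden stand-in, not the article's method: it uses const generics and a compile-time `assert!` in place of typenum's trait bound, and the helper type `FitsIn` is hypothetical.

```rust
struct Field<const WIDTH: u8, const MASK: u8, const OFFSET: u8> {
    value: u8,
}

/// Hypothetical helper that exists only to carry a compile-time assertion.
struct FitsIn<const V: u8, const WIDTH: u8>;

impl<const V: u8, const WIDTH: u8> FitsIn<V, WIDTH> {
    // Evaluated by the compiler for each (V, WIDTH) pair that is used;
    // a failed assertion turns into a build error.
    const OK: () = assert!((V as u16) < (1u16 << WIDTH), "value does not fit in field");
}

impl<const WIDTH: u8, const MASK: u8, const OFFSET: u8> Field<WIDTH, MASK, OFFSET> {
    fn new_checked<const V: u8>() -> Self {
        // Referencing the constant forces the assertion to be evaluated.
        let () = FitsIn::<V, WIDTH>::OK;
        Field { value: (V << OFFSET) & MASK }
    }
}

type RegEnabled = Field<1, 0b0000_0001, 0>;
type RegKind = Field<3, 0b0001_1100, 2>;
```

`RegEnabled::new_checked::<1>()` builds, whereas `RegEnabled::new_checked::<10>()` should stop the build with the assertion message. One design difference from the trait-bound approach: this check runs when the function is instantiated during code generation, so it may not be reported by `cargo check` alone.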
### Just right with Rust: both safe and accessible

Earlier, I called out calculating masks by hand as being problematic, but I just did that same problematic thing, albeit at the type level. While using such an approach is nice, getting to the point where you can write any code requires quite a bit of boilerplate and manual transcription (I'm talking about the type synonyms here).

Our team wanted something like the [TockOS mmio registers][7], but one that would generate typesafe implementations with the least amount of manual transcription possible. The result we came up with is a macro that generates the necessary boilerplate to get a Tock-like API plus type-based bounds checking. To use it, write down some information about a register, its fields, their width and offsets, and optional [enum][8]-like values (should you want to give "meaning" to the possible values a field can have):

```
register! {
    // The register's name
    Status,
    // The type which represents the whole register.
    u8,
    // The register's mode, ReadOnly, ReadWrite, or WriteOnly.
    RW,
    // And the fields in this register.
    Fields [
        On    WIDTH(U1) OFFSET(U0),
        Dead  WIDTH(U1) OFFSET(U1),
        Color WIDTH(U3) OFFSET(U2) [
            Red    = U1,
            Blue   = U2,
            Green  = U3,
            Yellow = U4
        ]
    ]
}
```

From this, you can generate register and field types like the previous example where the indices (the **Width**, **Mask**, and **Offset**) are derived from the values input in the **WIDTH** and **OFFSET** sections of a field's definition. Also, notice that all of these numbers are **typenums**; they're going to go directly into your **Field** definitions!

The generated code provides namespaces for registers and their associated fields through the name given for the register and the fields.
That's a mouthful; here's what it looks like:

```
mod Status {
    struct Register(u8);
    mod On {
        struct Field; // There is of course more to this definition
    }
    mod Dead {
        struct Field;
    }
    mod Color {
        struct Field;
        pub const Red: Field = Field::<U1>::new();
        // &c.
    }
}
```

The generated API contains the nominally expected read and write primitives to get at the raw register value, but it also has ways to get a single field's value, do collective actions, and find out if any (or all) of a collection of bits is set. You can read the documentation on the [complete generated API][9].

### Kicking the tires

What does it look like to use these definitions for a real device? Will the code be littered with type parameters, obscuring any real logic from view?

No! By using type synonyms and type inference, you effectively never have to think about the type-level part of the program at all. You get to interact with the hardware in a straightforward way and get those bounds-related assurances automatically.

Here's an example of a [UART][10] register block. I'll skip the declaration of the registers themselves, as that would be too much to include here. Instead, it starts with a register "block" then helps the compiler know how to look up the registers from a pointer to the head of the block.
We do that by implementing **Deref** and **DerefMut**:

```
use core::ops::{Deref, DerefMut};

#[repr(C)]
pub struct UartBlock {
    rx: UartRX::Register,
    _padding1: [u32; 15],
    tx: UartTX::Register,
    _padding2: [u32; 15],
    control1: UartControl1::Register,
}

pub struct Regs {
    addr: usize,
}

impl Deref for Regs {
    type Target = UartBlock;

    fn deref(&self) -> &UartBlock {
        unsafe { &*(self.addr as *const UartBlock) }
    }
}

impl DerefMut for Regs {
    fn deref_mut(&mut self) -> &mut UartBlock {
        unsafe { &mut *(self.addr as *mut UartBlock) }
    }
}
```

Once this is in place, using these registers is as simple as **read()** and **modify()**:

```
fn main() {
    // A pretend register block.
    let mut x = [0_u32; 33];

    let mut regs = Regs {
        // Some shenanigans to get at `x` as though it were a
        // pointer. Normally you'd be given some address like
        // `0xDEADBEEF` over which you'd instantiate a `Regs`.
        addr: &mut x as *mut [u32; 33] as usize,
    };

    assert_eq!(regs.rx.read(), 0);

    regs.control1
        .modify(UartControl1::Enable::Set + UartControl1::RecvReadyInterrupt::Set);

    // The first bit and the 10th bit should be set.
    assert_eq!(regs.control1.read(), 0b_10_0000_0001);
}
```

When we're working with runtime values we use **Option** like we saw earlier. Here I'm using **unwrap**, but in a real program with unknown inputs, you'd probably want to check that you got a **Some** back from that new call:[1][11],[2][12]

```
fn main() {
    // A pretend register block.
    let mut x = [0_u32; 33];

    let mut regs = Regs {
        // Some shenanigans to get at `x` as though it were a
        // pointer. Normally you'd be given some address like
        // `0xDEADBEEF` over which you'd instantiate a `Regs`.
        addr: &mut x as *mut [u32; 33] as usize,
    };

    let input = regs.rx.get_field(UartRX::Data::Field::Read).unwrap();
    regs.tx.modify(UartTX::Data::Field::new(input).unwrap());
}
```

### Decoding failure conditions

Depending on your personal pain threshold, you may have noticed that the errors are nearly unintelligible. Take a look at a not-so-subtle reminder of what I'm talking about:

```
error[E0271]: type mismatch resolving `<typenum::UInt<typenum::UInt<typenum::UInt<typenum::UInt<typenum::UInt<typenum::UTerm, typenum::B1>, typenum::B0>, typenum::B1>, typenum::B0>, typenum::B0> as typenum::IsLessOrEqual<typenum::UInt<typenum::UInt<typenum::UInt<typenum::UInt<typenum::UTerm, typenum::B1>, typenum::B0>, typenum::B1>, typenum::B0>>>::Output == typenum::B1`
  --> src/main.rs:12:5
   |
12 |     less_than_ten::<U20>();
   |     ^^^^^^^^^^^^^^^^^^^^ expected struct `typenum::B0`, found struct `typenum::B1`
   |
   = note: expected type `typenum::B0`
       found type `typenum::B1`
```

The **expected typenum::B0 found typenum::B1** part kind of makes sense, but what on earth is the **typenum::UInt<typenum::UInt, typenum::UInt…** nonsense? Well, **typenum** represents numbers as binary [cons][13] cells! Errors like this make it hard, especially when you have several of these type-level numbers confined to tight quarters, to know which number it's talking about. Unless, of course, it's second nature for you to translate baroque binary representations to decimal ones.

After the **U100**th time attempting to decipher any meaning from this mess, a teammate got Mad As Hell And Wasn't Going To Take It Anymore and made a little utility, **tnfilt**, to parse the meaning out from the misery that is namespaced binary cons cells. **tnfilt** takes the cons cell-style notation and replaces it with sensible decimal numbers. We imagine that others will face similar difficulties, so we shared [**tnfilt**][14].
You can use it like this:

```
$ cargo build 2>&1 | tnfilt
```

It transforms the output above into something like this:

```
error[E0271]: type mismatch resolving `<U20 as typenum::IsLessOrEqual<U10>>::Output == typenum::B1`
```

Now _that_ makes sense!

### In conclusion

Memory-mapped registers are used ubiquitously when interacting with hardware from software, and there are myriad ways to portray those interactions, each of which has a different place on the spectra of ease-of-use and safety. We found that the use of type-level programming to get compile-time checking on memory-mapped register interactions gave us the necessary information to make safer software. That code is available in the **[bounded-registers][15] crate** (Rust package).

Our team started out right at the edge of the more-safe side of that safety spectrum and then tried to figure out how to move the ease-of-use slider closer to the easy end. From those ambitions, **bounded-registers** was born, and we use it anytime we encounter memory-mapped devices in our adventures at Auxon.

* * *

  1. Technically, a read from a register field, by definition, will only give a value within the prescribed bounds, but none of us lives in a pure world, and you never know what's going to happen when external systems come into play. You're at the behest of the Hardware Gods here, so instead of forcing you into a "might panic" situation, it gives you the **Option** to handle a "This Should Never Happen" case.

  2. **get_field** looks a little weird. I'm looking at the **Field::Read** part, specifically. **Field** is a type, and you need an instance of that type to pass to **get_field**. A cleaner API might be something like:

```
regs.rx.get_field::<UartRX::Data::Field>();
```

But remember that **Field** is a type synonym that has fixed indices for width, offset, etc. To be able to parameterize **get_field** like this, you'd need higher-kinded types.
* * *

_This originally appeared on the [Auxon Engineering blog][16] and is edited and republished with permission._

--------------------------------------------------------------------------------

via: https://opensource.com/article/20/1/c-vs-rust-abstractions

Author: [Dan Pittman][a]
Topic selection: [lujun9972][b]
Translator: [translator ID](https://github.com/译者ID)
Proofreader: [proofreader ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)

[a]: https://opensource.com/users/dan-pittman
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/tools_hardware_purple.png?itok=3NdVoYhl (Tools illustration)
[2]: https://cs-fundamentals.com/c-programming/history-of-c-programming-language.php
[3]: https://www.w3schools.in/c-tutorial/history-of-c/
[4]: https://www.rust-lang.org/
[5]: https://en.wikipedia.org/wiki/Rust_(programming_language)
[6]: https://docs.rs/crate/typenum
[7]: https://docs.rs/tock-registers/0.3.0/tock_registers/
[8]: https://en.wikipedia.org/wiki/Enumerated_type
[9]: https://github.com/auxoncorp/bounded-registers#the-register-api
[10]: https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter
[11]: tmp.shpxgDsodx#1
[12]: tmp.shpxgDsodx#2
[13]: https://en.wikipedia.org/wiki/Cons
[14]: https://github.com/auxoncorp/tnfilt
[15]: https://crates.io/crates/bounded-registers
[16]: https://blog.auxon.io/2019/10/25/type-level-registers/