translated

- 第一段结尾的 lingua franca 我没有翻译因为我认为这里不应该翻译
- 文中有一处提到 epoch 的地方我一半觉得不用翻译,一半不会翻译
- 链接中的 [1] 我把 lingua franca 的 wiki 链接换成了百度百科链接
- epoch 处我添加了 [4] UNIX时间 的百度百科链接
- 我在 ### 调用涉及指针的 C 函数 结尾处加入了较长一段注释,主要解释作者代码在本地运行时可能会遇到的问题,我在其中加入了两个链接 [5] 和 [6]
- [1] 的原链接是 https://en.wikipedia.org/wiki/Lingua_franca
- [4] epoch 的 wiki 链接是 https://en.wikipedia.org/wiki/Epoch_(computing)
- 我感觉 wiki 写的比百度百科要好,但是写得好的 wiki 内容是英文的,访问也是个问题
This commit is contained in:
yzuowei 2022-12-12 14:14:44 +08:00 committed by GitHub
parent 8c02b4d7a6
commit 162e301137
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 287 additions and 283 deletions

View File

@ -1,283 +0,0 @@
[#]: subject: "Introducing Rust calls to C library functions"
[#]: via: "https://opensource.com/article/22/11/rust-calls-c-library-functions"
[#]: author: "Marty Kalin https://opensource.com/users/mkalindepauledu"
[#]: collector: "lkxed"
[#]: translator: "yzuowei"
[#]: reviewer: " "
[#]: publisher: " "
[#]: url: " "
Introducing Rust calls to C library functions
======
Why call C functions from Rust? The short answer is software libraries. A longer answer touches on where C stands among programming languages in general and towards Rust in particular. C, C++, and Rust are systems languages, which give programmers access to machine-level data types and operations. Among these three systems languages, C remains the dominant one. The kernels of modern operating systems are written mainly in C, with assembly language accounting for the rest. The standard system libraries for input and output, number crunching, cryptography, security, networking, internationalization, string processing, memory management, and more, are likewise written mostly in C. These libraries represent a vast infrastructure for applications written in any other language. Rust is well along the way to providing fine libraries of its own, but C libraries—around since the 1970s and still growing—are a resource not to be ignored. Finally, C is still the [lingua franca][1] among programming languages: most languages can talk to C and, through C, to any other language that does so.
### Two proof-of-concept examples
Rust has an FFI (Foreign Function Interface) that supports calls to C functions. An issue for any FFI is whether the calling language covers the data types in the called language. For example, `ctypes` is an FFI for calls from Python into C, but Python doesn't cover the unsigned integer types available in C. As a result, `ctypes` must resort to workarounds.
By contrast, Rust covers all the primitive (that is, machine-level) types in C. For example, the Rust `i32` type matches the C `int` type. C specifies only that the `char` type must be one byte in size and other types, such as `int`, must be at least this size; but nowadays every reasonable C compiler supports a four-byte `int`, an eight-byte `double` (in Rust, the `f64` type), and so on.
There is another challenge for an FFI directed at C: Can the FFI handle C's raw pointers, including pointers to arrays that count as strings in C? C does not have a string type, but rather implements strings as character arrays with a non-printing terminating character, the _null terminator_ of legend. By contrast, Rust has two string types: `String` and `&str` (string slice). The question, then, is whether the Rust FFI can transform a C string into a Rust one—and the answer is _yes_.
Pointers to structures also are common in C. The reason is efficiency. By default, a C structure is passed _by_value (that is, by a byte-per-byte copy) when a structure is either an argument passed to a function or a value returned from one. C structures, like their Rust counterparts, can include arrays and nest other structures and so be arbitrarily large in size. Best practice in either language is to pass and return structures by reference, that is, by passing or returning the structure's address rather than a copy of the structure. Once again, the Rust FFI is up to the task of handling C pointers to structures, which are common in C libraries.
The first code example focuses on calls to relatively simple C library functions such as `abs` (absolute value) and `sqrt` (square root). These functions take non-pointer scalar arguments and return a non-pointer scalar value. The second code example, which covers strings and pointers to structures, introduces the [bindgen][2] utility, which generates Rust code from C interface (header) files such as `math.h` and `time.h`. C header files specify the calling syntax for C functions and define structures used in such calls. The two code examples are [available on my homepage][3].
### Calling relatively simple C functions
The first code example has four Rust calls to C functions in the standard mathematics library: one call apiece to `abs` (absolute value) and `pow` (exponentiation), and two calls to `sqrt` (square root). The program can be built directly with the `rustc` compiler, or more conveniently with the `cargo build` command:
```
use std::os::raw::c_int; // 32 bits
use std::os::raw::c_double; // 64 bits
// Import three functions from the standard library libc.
// Here are the Rust declarations for the C functions:
extern "C" {
fn abs(num: c_int) -> c_int;
fn sqrt(num: c_double) -> c_double;
fn pow(num: c_double, power: c_double) -> c_double;
}
fn main() {
let x: i32 = -123;
println!("\nAbsolute value of {x}: {}.", unsafe { abs(x) });
let n: f64 = 9.0;
let p: f64 = 3.0;
println!("\n{n} raised to {p}: {}.", unsafe { pow(n, p) });
let mut y: f64 = 64.0;
println!("\nSquare root of {y}: {}.", unsafe { sqrt(y) });
y = -3.14;
println!("\nSquare root of {y}: {}.", unsafe { sqrt(y) }); //** NaN = NotaNumber
}
```
The two `use` declarations at the top are for the Rust data types `c_int` and `c_double`, which match the C types `int` and `double`, respectively. The standard Rust module `std::os::raw` defines fourteen such types for C compatibility. The module `std::ffi` has the same fourteen type definitions together with support for strings.
The `extern "C"` block above the `main` function then declares the three C library functions called in the `main` function below. Each call uses the standard C function's name, but each call must occur within an `unsafe` block. As every programmer new to Rust discovers, the Rust compiler enforces memory safety with a vengeance. Other languages (in particular, C and C++) do not make the same guarantees. The `unsafe` block thus says: Rust takes no responsibility for whatever unsafe operations may occur in the external call.
The first program's output is:
```
Absolute value of -123: 123.
9 raised to 3: 729
Square root of 64: 8.
Square root of -3.14: NaN.
```
In the last output line, the `NaN` stands for Not a Number: the C `sqrt` library function expects a non-negative value as its argument, which means that the argument -3.14 generates `NaN` as the returned value.
### Calling C functions involving pointers
C library functions in security, networking, string processing, memory management, and other areas regularly use pointers for efficiency. For example, the library function `asctime` (time as an ASCII string) expects a pointer to a structure as its single argument. A Rust call to a C function such as `asctime` is thus trickier than a call to `sqrt`, which involves neither pointers nor structures.
The C structure for the `asctime` function call is of type `struct tm`. A pointer to such a structure also is passed to library function `mktime` (make a time value). The structure breaks a time into units such as the year, the month, the hour, and so forth. The structure's fields are of type `time_t`, an alias for for either `int` (32 bits) or `long` (64 bits). The two library functions combine these broken-apart time pieces into a single value: `asctime` returns a string representation of the time, whereas `mktime` returns a `time_t` value that represents the number of elapsed seconds since the _epoch_, which is a time relative to which a system's clock and timestamp are determined. Typical epoch settings are January 1 00:00:00 (zero hours, minutes, and seconds) of either 1900 or 1970.
The C program below calls `asctime` and `mktime`, and uses another library function `strftime` to convert the `mktime` returned value into a formatted string. This program acts as a warm-up for the Rust version:
```
#include <stdio.h>
#include <time.h>
int main () {
struct tm sometime; /* time broken out in detail */
char buffer[80];
int utc;
sometime.tm_sec = 1;
sometime.tm_min = 1;
sometime.tm_hour = 1;
sometime.tm_mday = 1;
sometime.tm_mon = 1;
sometime.tm_year = 1;
sometime.tm_hour = 1;
sometime.tm_wday = 1;
sometime.tm_yday = 1;
printf("Date and time: %s\n", asctime(&sometime));
utc = mktime(&sometime);
if( utc < 0 ) {
fprintf(stderr, "Error: unable to make time using mktime\n");
} else {
printf("The integer value returned: %d\n", utc);
strftime(buffer, sizeof(buffer), "%c", &sometime);
printf("A more readable version: %s\n", buffer);
}
return 0;
}
```
The program outputs:
```
Date and time: Fri Feb  1 01:01:01 1901
The integer value returned: 2120218157
A more readable version: Fri Feb  1 01:01:01 1901
```
In summary, the Rust calls to library functions `asctime` and `mktime` must deal with two issues:
- Passing a raw pointer as the single argument to each library function.
- Converting the C string returned from `asctime` into a Rust string.
### Rust calls to `asctime` and `mktime`
The `bindgen` utility generates Rust support code from C header files such as `math.h` and `time.h`. In this example, a simplified version of `time.h` will do but with two changes from the original:
- The built-in type `int` is used instead of the alias type `time_t`. The bindgen utility can handle the `time_t` type but generates some distracting warnings along the way because `time_t` does not follow Rust naming conventions: in `time_t` an underscore separates the `t` at the end from the `time` that comes first; Rust would prefer a CamelCase name such as `TimeT`.
- The type `struct tm` type is given `StructTM` as an alias for the same reason.
Here is the simplified header file with declarations for `mktime` and `asctime` at the bottom:
```
typedef struct tm {
int tm_sec; /* seconds */
int tm_min; /* minutes */
int tm_hour; /* hours */
int tm_mday; /* day of the month */
int tm_mon; /* month */
int tm_year; /* year */
int tm_wday; /* day of the week */
int tm_yday; /* day in the year */
int tm_isdst; /* daylight saving time */
} StructTM;
extern int mktime(StructTM*);
extern char* asctime(StructTM*);
```
With `bindgen` installed, `%` as the command-line prompt, and `mytime.h` as the header file above, the following command generates the required Rust code and saves it in the file `mytime.rs`:
```
% bindgen mytime.h > mytime.rs
```
Here is the relevant part of `mytime.rs`:
```
/* automatically generated by rust-bindgen 0.61.0 */
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct tm {
pub tm_sec: ::std::os::raw::c_int,
pub tm_min: ::std::os::raw::c_int,
pub tm_hour: ::std::os::raw::c_int,
pub tm_mday: ::std::os::raw::c_int,
pub tm_mon: ::std::os::raw::c_int,
pub tm_year: ::std::os::raw::c_int,
pub tm_wday: ::std::os::raw::c_int,
pub tm_yday: ::std::os::raw::c_int,
pub tm_isdst: ::std::os::raw::c_int,
}
pub type StructTM = tm;
extern "C" {
pub fn mktime(arg1: *mut StructTM) -> ::std::os::raw::c_int;
}
extern "C" {
pub fn asctime(arg1: *mut StructTM) -> *mut ::std::os::raw::c_char;
}
#[test]
fn bindgen_test_layout_tm() {
const UNINIT: ::std::mem::MaybeUninit<tm> = ::std::mem::MaybeUninit::uninit();
let ptr = UNINIT.as_ptr();
assert_eq!(
::std::mem::size_of::<tm>(),
36usize,
concat!("Size of: ", stringify!(tm))
);
...
```
The Rust structure `struct tm`, like the C original, contains nine 4-byte integer fields. The field names are the same in C and Rust. The `extern "C"` blocks declare the library functions `asctime` and `mktime` as taking one argument apiece, a raw pointer to a mutable `StructTM` instance. (The library functions may mutate the structure via the pointer passed as an argument.)
The remaining code, under the `#[test]` attribute, tests the layout of the Rust version of the time structure. The test can be run with the `cargo test` command. At issue is that C does not specify how the compiler must lay out the fields of a structure. For example, the C `struct tm` starts out with the field `tm_sec` for the second; but C does not require that the compiled version has this field as the first. In any case, the Rust tests should succeed and the Rust calls to the library functions should work as expected.
### Getting the second example up and running
The code generated from `bindgen` does not include a `main` function and, therefore, is a natural module. Below is the `main` function with the `StructTM` initialization and the calls to `asctime` and `mktime`:
```
mod mytime;
use mytime::*;
use std::ffi::CStr;
fn main() {
let mut sometime = StructTM {
tm_year: 1,
tm_mon: 1,
tm_mday: 1,
tm_hour: 1,
tm_min: 1,
tm_sec: 1,
tm_isdst: -1,
tm_wday: 1,
tm_yday: 1
};
unsafe {
let c_ptr = &mut sometime; // raw pointer
// make the call, convert and then own
// the returned C string
let char_ptr = asctime(c_ptr);
let c_str = CStr::from_ptr(char_ptr);
println!("{:#?}", c_str.to_str());
let utc = mktime(c_ptr);
println!("{}", utc);
}
}
```
The Rust code can be compiled (using either `rustc` directly or `cargo`) and then run. The output is:
```
Ok(
    "Mon Feb  1 01:01:01 1901\n",
)
2120218157
```
The calls to the C functions `asctime` and `mktime` again must occur inside an `unsafe` block, as the Rust compiler cannot be held responsible for any memory-safety mischief in these external functions. For the record, `asctime` and `mktime` are well behaved. In the calls to both functions, the argument is the raw pointer `ptr`, which holds the (stack) address of the `sometime` structure.
The call to `asctime` is the trickier of the two calls because this function returns a pointer to a C `char`, the character `M` in `Mon` of the text output. Yet the Rust compiler does not know where the C string (the null-terminated array of `char`) is stored. In the static area of memory? On the heap? The array used by the `asctime` function to store the text representation of the time is, in fact, in the static area of memory. In any case, the C-to-Rust string conversion is done in two steps to avoid compile-time errors:
- The call `Cstr::from_ptr(char_ptr)` converts the C string to a Rust string and returns a reference stored in the `c_str` variable.
- The call to `c_str.to_str()` ensures that `c_str` is the owner.
The Rust code does not generate a human-readable version of the integer value returned from `mktime`, which is left as an exercise for the interested. The Rust module `chrono::format` includes a `strftime` function, which can be used like the C function of the same name to get a text representation of the time.
### Calling C with FFI and bindgen
The Rust FFI and the `bindgen` utility are well designed for making Rust calls out to C libraries, whether standard or third-party. Rust talks readily to C and thereby to any other language that talks to C. For calling relatively simple library functions such as `sqrt`, the Rust FFI is straightforward because Rust's primitive data types cover their C counterparts.
For more complicated interchanges—in particular, Rust calls to C library functions such as `asctime` and `mktime` that involve structures and pointers—the `bindgen` utility is the way to go. This utility generates the support code together with appropriate tests. Of course, the Rust compiler cannot assume that C code measures up to Rust standards when it comes to memory safety; hence, calls from Rust to C must occur in `unsafe` blocks.
--------------------------------------------------------------------------------
via: https://opensource.com/article/22/11/rust-calls-c-library-functions
作者:[Marty Kalin][a]
选题:[lkxed][b]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://opensource.com/users/mkalindepauledu
[b]: https://github.com/lkxed
[1]: https://en.wikipedia.org/wiki/Lingua_franca
[2]: https://github.com/rust-lang/rust-bindgen
[3]: https://condor.depaul.edu/mkalin

View File

@ -0,0 +1,287 @@
[#]: subject: "Introducing Rust calls to C library functions"
[#]: via: "https://opensource.com/article/22/11/rust-calls-c-library-functions"
[#]: author: "Marty Kalin https://opensource.com/users/mkalindepauledu"
[#]: collector: "lkxed"
[#]: translator: "yzuowei"
[#]: reviewer: " "
[#]: publisher: " "
[#]: url: " "
介绍从 Rust 调用 C 库函数
======
为什么要从 Rust 调用 C 函数?简短的回答就是软件库。冗长的答案则触及到 C 在众多编程语言中的地位,特别是相对 Rust 而言。CC++,还有 Rust 都是系统语言这意味着程序员会访问机器层面的数据类型与操作。在这三个系统语言中C 依然占据主导地位。现代操作系统的内核大致用 C 来写,其余部分依靠汇编语言来补充。在标准系统函数库中,输入与输出、数字处理、加密计算、安全、网络、国际化、字符串处理、内存管理,还用更多,都大体用 C 来写。这些函数库所代表的是一个庞大的基础架构支撑着用其他语言写出来的应用。Rust 发展至今也有着可观的函数库,但是 C 的函数库——自1970年代就已存在迄今还在蓬勃发展——是一个无法被忽视的资源。最后一点是 C 依然还是编程语言中的 [lingua franca][1]:大部分语言都与 C 交流,透过 C语言互相交流。
### 两个概念证明的例子
Rust 支持 FFI (外部函数接口)用以调用 C 函数。任何 FFI 所需要面临的问题是调用方语言是否包括了被调用语言的数据类型。例如,`ctypes` 是 Python 调用 C 的 FFI但是 Python 并没有包括 C 所支持的无符号整数类型。结果就是,`ctypes` 必须寻求解决方案。
与之相对的是Rust 包含了所有 C 中的原始机器层面类型。比如说Rust 中的 `i32` 类对应 C 中的 `int` 类。C 特别声明了 `char` 类必须是一个字节大小,而其他类型,比如 `int`必须至少是这个大小LCTT 译注:原文处有评论指出 `int` 大小依照 C 标准应至少为2字节然而如今所有合理的 C 编译器都支持四字节的 `int`,以及八字节的 `double`Rust 中则是 `f64` 类),以此类推。
面向 C 的 FFI 所面临的另一个挑战是FFI 是否能够处理 C 的裸指针包括指向被看作是字符串的数组指针。C 没有字符串类型它通过结合字符组和一个不会被打印的终止符来实现字符串大名鼎鼎的_空终止符_。与之相对Rust 有两个字符串类型:`String` 和 `&str` 字符串切片。问题是Rust FFI 是否能将 C 字符串转化成 Rust 字符串——答案是_肯定的_。
出于对效率的追求,结构体指针在 C 中也很常见。一个 C 结构体在作为一个函数的参数或者返回值的时候其默认行为是传递值一个字节一个字节的复制。C 结构体,如同它在 Rust 中的对应部分一样可以包含数组和嵌套其他结构体所以其大小是不定的。结构体在两种语言中的最佳用法是传递或返回引用也就是说传递或返回结构体的地址而不是结构体本身的复制。Rust 再一次成功处理了 C 的结构体指针,其在 C 函数库中十分普遍。
第一段代码案例专注于调用相对简单的 C 库函数,比如 `abs`(绝对值)和 `sqrt`(平方根)。这些函数使用非指针标量参数并返回一个非指针标量值。第二段代码案例则涉及了字符串和结构体指针,在这里会介绍工具 [bindgen][2],其通过 C 接口(头)文件生成 Rust 代码,比如 `math.h` 以及 `time.h`。C 头文件声明了 C 函数的调用语法并定义了会被调用的结构体。两段代码都能在[我的主页上][3]找到。
### 调用相对简单的 C 函数
第一段代码案例有四处 Rust 对标准数学库内的 C 函数的调用:两处分别调用了 `abs`(绝对值)和 `pow`(幂),两处重复调用了 `sqrt`(平方根)。这个程序可以直接用 `rustc` 编译器进行构建,或者使用更方便的命令 `cargo build`
```
use std::os::raw::c_int; // 32位
use std::os::raw::c_double; // 64位
// 从标准库 libc 中引入三个函数。
// 此处是 Rust 对三个 C 函数的声明:
extern "C" {
fn abs(num: c_int) -> c_int;
fn sqrt(num: c_double) -> c_double;
fn pow(num: c_double, power: c_double) -> c_double;
}
fn main() {
let x: i32 = -123;
println!("\n{x}的绝对值是: {}.", unsafe { abs(x) });
let n: f64 = 9.0;
let p: f64 = 3.0;
println!("\n{n}的{p}次方是: {}.", unsafe { pow(n, p) });
let mut y: f64 = 64.0;
println!("\n{y}的平方根是: {}.", unsafe { sqrt(y) });
y = -3.14;
println!("\n{y}的平方根是: {}.", unsafe { sqrt(y) }); //** NaN = NotaNumber不是数字
}
```
顶部的两个 `use` 声明是 Rust 的数据类型 `c_int``c_double`,对应 C 类型里的 `int``double`。Rust 标准模块 `std::os::raw` 定义了十四个类似的类型以确保跟 C 的兼容性。模块 `std::ffi` 中有十四个同样的类型定义以及对字符串的支持。
位于 `main` 函数上的 `extern "C"` 区域声明了三个 C 库函数,这些函数会在 `main` 函数内被调用。每次调用都使用了标准的 C 函数名,但每次调用都必须发生在一个 `unsafe` 区域内。正如每个新接触 Rust 的程序员所发现的那样Rust 编译器极度强制内存安全。其他语言(特别是 C 和 C++)作不出相同的保证。`unsafe` 区域其实是说Rust 对外部调用中可能存在的不安全行为不负责。
第一个程序输出为:
```
-123的绝对值是: 123.
9的3次方是: 729.
64的平方根是: 8.
-3.14的平方根是: NaN.
```
输出的最后一行的 `NaN` 表示不是数字 (Not a Number)C 库函数 `sqrt` 期待一个非负值作为参数,这使得参数-3.14生成了 `NaN` 作为返回值。
### 调用涉及指针的 C 函数
C 库函数为了提高效率经常在安全、网络、字符串处理、内存管理,以及其他领域中使用指针。例如,库函数 `asctime`(时间作为 ASCII 字符串期待一个结构体指针作为其参数。Rust 调用类似 `asctime` 的 C 函数就会比调用 `sqrt` 要更加棘手一些,后者既没有牵扯到指针,也不涉及到结构体。
函数 `asctime` 调用的 C 结构体类型为 `struct tm`。一个指向此结构体的指针会作为参数被传递给库函数 `mktime`(时间作为值)。此结构体会将时间拆分成诸如年、月、小时之类的单位。此结构体的字段 (fields) 类型为 `time_t`,是 `int`32位`long`64位的异名。两个库函数将这些破碎的时间碎片组合成了一个单一值`asctime` 返回一个字符串用以表达时间,而 `mktime` 返回一个 `time_t` 值表示自 [_epoch_][4],即系统时钟和时间戳被决定的那一刻,以来所经历的秒数。典型的 epoch 设置为1900年或1970年1月1日0时0分0秒。
以下的 C 程序调用了 `asctime``mktime`,并使用了其他库函数 `strftime` 来将 `mktime` 的返回值转化成一个格式化的字符串。这个程序可被视作 Rust 对应版本的预热:
```
#include <stdio.h>
#include <time.h>
int main () {
struct tm sometime; /* 时间被打破细分 */
char buffer[80];
int utc;
sometime.tm_sec = 1;
sometime.tm_min = 1;
sometime.tm_hour = 1;
sometime.tm_mday = 1;
sometime.tm_mon = 1;
sometime.tm_year = 1;
sometime.tm_hour = 1; /*LCTT 译注:这里作者多敲了一行*/
sometime.tm_wday = 1;
sometime.tm_yday = 1;
printf("日期与时间: %s\n", asctime(&sometime));
utc = mktime(&sometime);
if( utc < 0 ) {
fprintf(stderr, "错误: mktime 无法生成时间\n");
} else {
printf("返回的整数值: %d\n", utc);
strftime(buffer, sizeof(buffer), "%c", &sometime);
printf("更加可读的版本: %s\n", buffer);
}
return 0;
}
```
程序输出为:
```
日期与时间: Fri Feb  1 01:01:01 1901
返回的整数值: 2120218157
更加可读的版本: Fri Feb  1 01:01:01 1901
```
LCTT 译注:如果你尝试在自己电脑上运行这段代码,然后得到了一行关于 `mktime` 的错误信息,然后又在网上随便找了个在线 C 编译器,复制代码然后得到了跟这里的结果有区别但是没有错误的结果,不要慌,我的电脑上也是这样的。导致本地机器上 `mktime` 失败的原因是作者没有设置 `tm_isdst`,这个是用来标记夏令时的 flag。[`tm_isdst` 大于零则夏令时生效中,等于零则不生效,小于零标记未知][5]。加入 `sometime.tm_isdst = 0``= -1` 后应该就能得到跟在线编译器大致一样的结果。不同的地方在于结果第一行我得到的是 `Mon Feb ...`,这个与作者代码中 `sometime.tm_wday = 1` 对应,这里因该是作者写错了;第二行我和作者和网上得到的数字都不一样,这大概是合理的,因为这与机器的 epoch 有关第三行我跟作者的结果是一样的1901年2月1日也确实是周五这是因为 [`mktime` 其实会修正时间参数中不合理的地方][6]。至于夏令时具体是如何影响 `mktime` 这个问题,我能查到的只有 `mktime` 的计算受时区影响,更底层的原因我也不知道了。)
总的来说Rust 在调用库函数 `asctime``mktime` 时,必须处理以下两个问题:
- 将裸指针作为唯一参数传递给每个库函数。
- 把从 `asctime` 返回的 C 字符串转化为 Rust 字符串。
### Rust 调用 `asctime``mktime`
工具 `bindgen` 会根据类似 `math.h``time.h` 之类的 C 头文件生成 Rust 支持的代码。下面这个简化版的 `time.h` 就可以用来做例子,简化版与原版主要有两个不同:
- 内置类型 `int` 被用来取代异名类型 `time_t`。工具 bindgen 可以处理 `time_t` 类但是会生成一些烦人的警告,因为 `time_t` 不符合 Rust 的命名规范:`time_t` 以下划线区分 `time``t`Rust 更偏好驼峰式命名方法,比如 `TimeT`
- 出于同样的原因,这里选择 `StructTM` 作为 `struct tm` 的异名。
以下是一份简化版的头文件,`mktime` 和 `asctime` 在文件底部:
```
typedef struct tm {
int tm_sec; /* 秒 */
int tm_min; /* 分钟 */
int tm_hour; /* 小时 */
int tm_mday; /* 日 */
int tm_mon; /* 月 */
int tm_year; /* 年 */
int tm_wday; /* 星期 */
int tm_yday; /* 一年中的第几天 */
int tm_isdst; /* 夏令时 */
} StructTM;
extern int mktime(StructTM*);
extern char* asctime(StructTM*);
```
`bindgen` 安装好后,`%` 作为命令行提示,`mytime.h` 作为以上提到的头文件,以下命令可以生成所需的 Rust 代码并将其保存到文件 `mytime.rs`
```
% bindgen mytime.h > mytime.rs
```
以下是 `mytime.rs` 中的重要部分:
```
/* automatically generated by rust-bindgen 0.61.0 */
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct tm {
pub tm_sec: ::std::os::raw::c_int,
pub tm_min: ::std::os::raw::c_int,
pub tm_hour: ::std::os::raw::c_int,
pub tm_mday: ::std::os::raw::c_int,
pub tm_mon: ::std::os::raw::c_int,
pub tm_year: ::std::os::raw::c_int,
pub tm_wday: ::std::os::raw::c_int,
pub tm_yday: ::std::os::raw::c_int,
pub tm_isdst: ::std::os::raw::c_int,
}
pub type StructTM = tm;
extern "C" {
pub fn mktime(arg1: *mut StructTM) -> ::std::os::raw::c_int;
}
extern "C" {
pub fn asctime(arg1: *mut StructTM) -> *mut ::std::os::raw::c_char;
}
#[test]
fn bindgen_test_layout_tm() {
const UNINIT: ::std::mem::MaybeUninit<tm> = ::std::mem::MaybeUninit::uninit();
let ptr = UNINIT.as_ptr();
assert_eq!(
::std::mem::size_of::<tm>(),
36usize,
concat!("Size of: ", stringify!(tm))
);
...
```
Rust 结构体 `struct tm`,跟原本在 C 中的一样包含了九个4字节的整型字段。这些字段名称在 C 和 Rust 中是一样的。`extern "C"` 区域声明了库函数 `astime``mktime` 分别需要只一个参数,一个指向可变实例 `extern "C"` 的裸指针。(库函数可能会通过指针改变作为参数传递的结构体。)
`#[test]` 属性下的其余代码是用来测试 Rust 版的时间结构体的布局。通过命令 `cargo test` 可以进行这些测试。一个 C 不会声明的问题是编译器应该如何对结构体中的字段进行布局。比如说C 的 `struct tm` 以字段 `tm_sec` 开头用以表示秒;但是 C 不需要编译版本遵循这个排序。不管怎样Rust 测试应该会成功而 Rust 对库函数的调用也应如预期般工作。
### 设置好第二个案例并开始运行
`bindgen` 生成的代码不包含 `main` 函数,所以是一个天然的模块。以下是一个 `main` 函数初始化了 `StructTM` 并调用了 `asctime``mktime`
```
mod mytime;
use mytime::*;
use std::ffi::CStr;
fn main() {
let mut sometime = StructTM {
tm_year: 1,
tm_mon: 1,
tm_mday: 1,
tm_hour: 1,
tm_min: 1,
tm_sec: 1,
tm_isdst: -1,
tm_wday: 1,
tm_yday: 1
};
unsafe {
let c_ptr = &mut sometime; // 裸指针
// 调用,转化,并拥有
// 返回的 C 字符串
let char_ptr = asctime(c_ptr);
let c_str = CStr::from_ptr(char_ptr);
println!("{:#?}", c_str.to_str());
let utc = mktime(c_ptr);
println!("{}", utc);
}
}
```
这段 Rust 代码可以被编译(直接用 `rustc` 或使用 `cargo`)并运行。输出为:
```
Ok(
    "Mon Feb  1 01:01:01 1901\n",
)
2120218157
```
对 C 函数 `asctime``mktime` 的调用必须再一次被放在 `unsafe` 区域内,因为 Rust 编译器无法对这些外部函数的潜在内存安全风险负责。此处声明一下,`asctime` 和 `mktime` 并没有安全风险。调用的两个函数的参数是裸指针 `ptr`,其指向结构体 `sometime` (在栈 (stack) 中)的地址。
`asctime` 是两个函数中调用起来更棘手的那个,因为这个函数返回的是一个指向 C `char` 的指针,如果函数返回 `Mon` 那么指针就指向 `M`。但是 Rust 编译器并不知道 C 字符串 `char` 的空终止数组)的储存位置。是内存里的静态空间?还是堆 (heap)`asctime` 函数内用来储存时间的文字表达的数组实际上是在内存的静态空间里。无论如何C 到 Rust 字符串转化需要两个步骤来避免编译错误:
- 调用 `Cstr::from_ptr(char_ptr)` 来将 C 字符串转化为 Rust 字符串并返回一个引用储存在变量 `c_str` 中。
- 对 `c_str.to_str()` 的调用确保了 `c_str` 是所有者。
Rust 代码不会增加从 `mktime` 返回的整型值的易读性这一部分留作课外作业给感兴趣的人去探究。Rust 模板 `chrono::format` 也有一个 `strftime` 函数,它可以被当作 C 的同名函数来使用,两者都是获取时间的文字表达。
### 使用 FFI 和 bindgen 调用 C
Rust FFI 和工具 `bindgen` 都能够出色地协助 Rust 调用 C 库无论是标准库还是第三方库。Rust 轻松地与 C 交流,并透过 C 与其他语言交流。对于调用像 `sqrt` 一样简单的库函数Rust FFI 表现直截了当,这是因为 Rust 的原始数据类型覆盖了它们在 C 中的对应部分。
对于更为复杂的交流——特别是 Rust 调用像 `asctime``mktime` 一样,会涉及到结构体和指针的 C 库函数——工具 `bindgen` 是优秀的帮手。这个工具会生成支持代码以及所需要的测试。当然Rust 编译器无法假设 C 代码对内存安全的考虑会符合 Rust 的标准因此Rust 必须在 `unsafe` 区域内调用 C。
--------------------------------------------------------------------------------
via: https://opensource.com/article/22/11/rust-calls-c-library-functions
作者:[Marty Kalin][a]
选题:[lkxed][b]
译者:[yzuowei](https://github.com/yzuowei)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://opensource.com/users/mkalindepauledu
[b]: https://github.com/lkxed
[1]: https://baike.baidu.com/item/lingua%20franka/5359711
[2]: https://github.com/rust-lang/rust-bindgen
[3]: https://condor.depaul.edu/mkalin
[4]: https://baike.baidu.com/item/UNIX时间/8932323
[5]: https://cplusplus.com/reference/ctime/tm/
[6]: https://cplusplus.com/reference/ctime/mktime/