From 781d751e3718a4ffa4a1d3c341f8b520021959df Mon Sep 17 00:00:00 2001 From: cposture <cposture@126.com> Date: Sat, 9 Jul 2016 21:15:36 +0800 Subject: [PATCH 01/17] Translated by cposture --- .../tech/20160512 Bitmap in Linux Kernel.md | 398 ----------------- .../tech/20160512 Bitmap in Linux Kernel.md | 405 ++++++++++++++++++ 2 files changed, 405 insertions(+), 398 deletions(-) delete mode 100644 sources/tech/20160512 Bitmap in Linux Kernel.md create mode 100644 translated/tech/20160512 Bitmap in Linux Kernel.md diff --git a/sources/tech/20160512 Bitmap in Linux Kernel.md b/sources/tech/20160512 Bitmap in Linux Kernel.md deleted file mode 100644 index adffc9d049..0000000000 --- a/sources/tech/20160512 Bitmap in Linux Kernel.md +++ /dev/null @@ -1,398 +0,0 @@ -[Translating by cposture 2016.06.29] -Data Structures in the Linux Kernel -================================================================================ - -Bit arrays and bit operations in the Linux kernel --------------------------------------------------------------------------------- - -Besides different [linked](https://en.wikipedia.org/wiki/Linked_data_structure) and [tree](https://en.wikipedia.org/wiki/Tree_%28data_structure%29) based data structures, the Linux kernel provides [API](https://en.wikipedia.org/wiki/Application_programming_interface) for [bit arrays](https://en.wikipedia.org/wiki/Bit_array) or `bitmap`. Bit arrays are heavily used in the Linux kernel and following source code files contain common `API` for work with such structures: - -* [lib/bitmap.c](https://github.com/torvalds/linux/blob/master/lib/bitmap.c) -* [include/linux/bitmap.h](https://github.com/torvalds/linux/blob/master/include/linux/bitmap.h) - -Besides these two files, there is also architecture-specific header file which provides optimized bit operations for certain architecture. 
We consider [x86_64](https://en.wikipedia.org/wiki/X86-64) architecture, so in our case it will be: - -* [arch/x86/include/asm/bitops.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/bitops.h) - -header file. As I just wrote above, the `bitmap` is heavily used in the Linux kernel. For example a `bit array` is used to store set of online/offline processors for systems which support [hot-plug](https://www.kernel.org/doc/Documentation/cpu-hotplug.txt) cpu (more about this you can read in the [cpumasks](https://0xax.gitbooks.io/linux-insides/content/Concepts/cpumask.html) part), a `bit array` stores set of allocated [irqs](https://en.wikipedia.org/wiki/Interrupt_request_%28PC_architecture%29) during initialization of the Linux kernel and etc. - -So, the main goal of this part is to see how `bit arrays` are implemented in the Linux kernel. Let's start. - -Declaration of bit array -================================================================================ - -Before we will look on `API` for bitmaps manipulation, we must know how to declare it in the Linux kernel. There are two common method to declare own bit array. The first simple way to declare a bit array is to array of `unsigned long`. 
For example: - -```C -unsigned long my_bitmap[8] -``` - -The second way is to use the `DECLARE_BITMAP` macro which is defined in the [include/linux/types.h](https://github.com/torvalds/linux/blob/master/include/linux/types.h) header file: - -```C -#define DECLARE_BITMAP(name,bits) \ - unsigned long name[BITS_TO_LONGS(bits)] -``` - -We can see that `DECLARE_BITMAP` macro takes two parameters: - -* `name` - name of bitmap; -* `bits` - amount of bits in bitmap; - -and just expands to the definition of `unsigned long` array with `BITS_TO_LONGS(bits)` elements, where the `BITS_TO_LONGS` macro converts a given number of bits to number of `longs` or in other words it calculates how many `8` byte elements in `bits`: - -```C -#define BITS_PER_BYTE 8 -#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d)) -#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long)) -``` - -So, for example `DECLARE_BITMAP(my_bitmap, 64)` will produce: - -```python ->>> (((64) + (64) - 1) / (64)) -1 -``` - -and: - -```C -unsigned long my_bitmap[1]; -``` - -After we are able to declare a bit array, we can start to use it. - -Architecture-specific bit operations -================================================================================ - -We already saw above a couple of source code and header files which provide [API](https://en.wikipedia.org/wiki/Application_programming_interface) for manipulation of bit arrays. The most important and widely used API of bit arrays is architecture-specific and located as we already know in the [arch/x86/include/asm/bitops.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/bitops.h) header file. - -First of all let's look at the two most important functions: - -* `set_bit`; -* `clear_bit`. - -I think that there is no need to explain what these function do. This is already must be clear from their name. Let's look on their implementation. 
If you will look into the [arch/x86/include/asm/bitops.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/bitops.h) header file, you will note that each of these functions represented by two variants: [atomic](https://en.wikipedia.org/wiki/Linearizability) and not. Before we will start to dive into implementations of these functions, first of all we must to know a little about `atomic` operations. - -In simple words atomic operations guarantees that two or more operations will not be performed on the same data concurrently. The `x86` architecture provides a set of atomic instructions, for example [xchg](http://x86.renejeschke.de/html/file_module_x86_id_328.html) instruction, [cmpxchg](http://x86.renejeschke.de/html/file_module_x86_id_41.html) instruction and etc. Besides atomic instructions, some of non-atomic instructions can be made atomic with the help of the [lock](http://x86.renejeschke.de/html/file_module_x86_id_159.html) instruction. It is enough to know about atomic operations for now, so we can begin to consider implementation of `set_bit` and `clear_bit` functions. - -First of all, let's start to consider `non-atomic` variants of this function. Names of non-atomic `set_bit` and `clear_bit` starts from double underscore. As we already know, all of these functions are defined in the [arch/x86/include/asm/bitops.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/bitops.h) header file and the first function is `__set_bit`: - -```C -static inline void __set_bit(long nr, volatile unsigned long *addr) -{ - asm volatile("bts %1,%0" : ADDR : "Ir" (nr) : "memory"); -} -``` - -As we can see it takes two arguments: - -* `nr` - number of bit in a bit array. -* `addr` - address of a bit array where we need to set bit. - -Note that the `addr` parameter is defined with `volatile` keyword which tells to compiler that value maybe changed by the given address. The implementation of the `__set_bit` is pretty easy. 
As we can see, it just contains one line of [inline assembler](https://en.wikipedia.org/wiki/Inline_assembler) code. In our case we are using the [bts](http://x86.renejeschke.de/html/file_module_x86_id_25.html) instruction which selects a bit which is specified with the first operand (`nr` in our case) from the bit array, stores the value of the selected bit in the [CF](https://en.wikipedia.org/wiki/FLAGS_register) flags register and set this bit. - -Note that we can see usage of the `nr`, but there is `addr` here. You already might guess that the secret is in `ADDR`. The `ADDR` is the macro which is defined in the same header code file and expands to the string which contains value of the given address and `+m` constraint: - -```C -#define ADDR BITOP_ADDR(addr) -#define BITOP_ADDR(x) "+m" (*(volatile long *) (x)) -``` - -Besides the `+m`, we can see other constraints in the `__set_bit` function. Let's look on they and try to understand what do they mean: - -* `+m` - represents memory operand where `+` tells that the given operand will be input and output operand; -* `I` - represents integer constant; -* `r` - represents register operand - -Besides these constraint, we also can see - the `memory` keyword which tells compiler that this code will change value in memory. That's all. Now let's look at the same function but at `atomic` variant. It looks more complex that its `non-atomic` variant: - -```C -static __always_inline void -set_bit(long nr, volatile unsigned long *addr) -{ - if (IS_IMMEDIATE(nr)) { - asm volatile(LOCK_PREFIX "orb %1,%0" - : CONST_MASK_ADDR(nr, addr) - : "iq" ((u8)CONST_MASK(nr)) - : "memory"); - } else { - asm volatile(LOCK_PREFIX "bts %1,%0" - : BITOP_ADDR(addr) : "Ir" (nr) : "memory"); - } -} -``` - -First of all note that this function takes the same set of parameters that `__set_bit`, but additionally marked with the `__always_inline` attribute. 
The `__always_inline` is macro which defined in the [include/linux/compiler-gcc.h](https://github.com/torvalds/linux/blob/master/include/linux/compiler-gcc.h) and just expands to the `always_inline` attribute: - -```C -#define __always_inline inline __attribute__((always_inline)) -``` - -which means that this function will be always inlined to reduce size of the Linux kernel image. Now let's try to understand implementation of the `set_bit` function. First of all we check a given number of bit at the beginning of the `set_bit` function. The `IS_IMMEDIATE` macro defined in the same [header](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/bitops.h) file and expands to the call of the builtin [gcc](https://en.wikipedia.org/wiki/GNU_Compiler_Collection) function: - -```C -#define IS_IMMEDIATE(nr) (__builtin_constant_p(nr)) -``` - -The `__builtin_constant_p` builtin function returns `1` if the given parameter is known to be constant at compile-time and returns `0` in other case. We no need to use slow `bts` instruction to set bit if the given number of bit is known in compile time constant. We can just apply [bitwise or](https://en.wikipedia.org/wiki/Bitwise_operation#OR) for byte from the give address which contains given bit and masked number of bits where high bit is `1` and other is zero. In other case if the given number of bit is not known constant at compile-time, we do the same as we did in the `__set_bit` function. The `CONST_MASK_ADDR` macro: - -```C -#define CONST_MASK_ADDR(nr, addr) BITOP_ADDR((void *)(addr) + ((nr)>>3)) -``` - -expands to the give address with offset to the byte which contains a given bit. For example we have address `0x1000` and the number of bit is `0x9`. 
So, as `0x9` is `one byte + one bit` our address with be `addr + 1`: - -```python ->>> hex(0x1000 + (0x9 >> 3)) -'0x1001' -``` - -The `CONST_MASK` macro represents our given number of bit as byte where high bit is `1` and other bits are `0`: - -```C -#define CONST_MASK(nr) (1 << ((nr) & 7)) -``` - -```python ->>> bin(1 << (0x9 & 7)) -'0b10' -``` - -In the end we just apply bitwise `or` for these values. So, for example if our address will be `0x4097` and we need to set `0x9` bit: - -```python ->>> bin(0x4097) -'0b100000010010111' ->>> bin((0x4097 >> 0x9) | (1 << (0x9 & 7))) -'0b100010' -``` - -the `ninth` bit will be set. - -Note that all of these operations are marked with `LOCK_PREFIX` which is expands to the [lock](http://x86.renejeschke.de/html/file_module_x86_id_159.html) instruction which guarantees atomicity of this operation. - -As we already know, besides the `set_bit` and `__set_bit` operations, the Linux kernel provides two inverse functions to clear bit in atomic and non-atomic context. They are `clear_bit` and `__clear_bit`. Both of these functions are defined in the same [header file](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/bitops.h) and takes the same set of arguments. But not only arguments are similar. Generally these functions are very similar on the `set_bit` and `__set_bit`. Let's look on the implementation of the non-atomic `__clear_bit` function: - -```C -static inline void __clear_bit(long nr, volatile unsigned long *addr) -{ - asm volatile("btr %1,%0" : ADDR : "Ir" (nr)); -} -``` - -Yes. As we see, it takes the same set of arguments and contains very similar block of inline assembler. It just uses the [btr](http://x86.renejeschke.de/html/file_module_x86_id_24.html) instruction instead of `bts`. As we can understand form the function's name, it clears a given bit by the given address. The `btr` instruction acts like `btr`. 
This instruction also selects a given bit which is specified in the first operand, stores its value in the `CF` flag register and clears this bit in the given bit array which is specifed with second operand. - -The atomic variant of the `__clear_bit` is `clear_bit`: - -```C -static __always_inline void -clear_bit(long nr, volatile unsigned long *addr) -{ - if (IS_IMMEDIATE(nr)) { - asm volatile(LOCK_PREFIX "andb %1,%0" - : CONST_MASK_ADDR(nr, addr) - : "iq" ((u8)~CONST_MASK(nr))); - } else { - asm volatile(LOCK_PREFIX "btr %1,%0" - : BITOP_ADDR(addr) - : "Ir" (nr)); - } -} -``` - -and as we can see it is very similar on `set_bit` and just contains two differences. The first difference it uses `btr` instruction to clear bit when the `set_bit` uses `bts` instruction to set bit. The second difference it uses negated mask and `and` instruction to clear bit in the given byte when the `set_bit` uses `or` instruction. - -That's all. Now we can set and clear bit in any bit array and and we can go to other operations on bitmasks. - -Most widely used operations on a bit arrays are set and clear bit in a bit array in the Linux kernel. But besides this operations it is useful to do additional operations on a bit array. Yet another widely used operation in the Linux kernel - is to know is a given bit set or not in a bit array. We can achieve this with the help of the `test_bit` macro. This macro is defined in the [arch/x86/include/asm/bitops.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/bitops.h) header file and expands to the call of the `constant_test_bit` or `variable_test_bit` depends on bit number: - -```C -#define test_bit(nr, addr) \ - (__builtin_constant_p((nr)) \ - ? constant_test_bit((nr), (addr)) \ - : variable_test_bit((nr), (addr))) -``` - -So, if the `nr` is known in compile time constant, the `test_bit` will be expanded to the call of the `constant_test_bit` function or `variable_test_bit` in other case. 
Now let's look at implementations of these functions. Let's start from the `variable_test_bit`: - -```C -static inline int variable_test_bit(long nr, volatile const unsigned long *addr) -{ - int oldbit; - - asm volatile("bt %2,%1\n\t" - "sbb %0,%0" - : "=r" (oldbit) - : "m" (*(unsigned long *)addr), "Ir" (nr)); - - return oldbit; -} -``` - -The `variable_test_bit` function takes similar set of arguments as `set_bit` and other function take. We also may see inline assembly code here which executes [bt](http://x86.renejeschke.de/html/file_module_x86_id_22.html) and [sbb](http://x86.renejeschke.de/html/file_module_x86_id_286.html) instruction. The `bt` or `bit test` instruction selects a given bit which is specified with first operand from the bit array which is specified with the second operand and stores its value in the [CF](https://en.wikipedia.org/wiki/FLAGS_register) bit of flags register. The second `sbb` instruction substracts first operand from second and subscrtact value of the `CF`. So, here write a value of a given bit number from a given bit array to the `CF` bit of flags register and execute `sbb` instruction which calculates: `00000000 - CF` and writes the result to the `oldbit`. - -The `constant_test_bit` function does the same as we saw in the `set_bit`: - -```C -static __always_inline int constant_test_bit(long nr, const volatile unsigned long *addr) -{ - return ((1UL << (nr & (BITS_PER_LONG-1))) & - (addr[nr >> _BITOPS_LONG_SHIFT])) != 0; -} -``` - -It generates a byte where high bit is `1` and other bits are `0` (as we saw in `CONST_MASK`) and applies bitwise [and](https://en.wikipedia.org/wiki/Bitwise_operation#AND) to the byte which contains a given bit number. - -The next widely used bit array related operation is to change bit in a bit array. The Linux kernel provides two helper for this: - -* `__change_bit`; -* `change_bit`. - -As you already can guess, these two variants are atomic and non-atomic as for example `set_bit` and `__set_bit`. 
For the start, let's look at the implementation of the `__change_bit` function: - -```C -static inline void __change_bit(long nr, volatile unsigned long *addr) -{ - asm volatile("btc %1,%0" : ADDR : "Ir" (nr)); -} -``` - -Pretty easy, is not it? The implementation of the `__change_bit` is the same as `__set_bit`, but instead of `bts` instruction, we are using [btc](http://x86.renejeschke.de/html/file_module_x86_id_23.html). This instruction selects a given bit from a given bit array, stores its value in the `CF` and changes its value by the applying of complement operation. So, a bit with value `1` will be `0` and vice versa: - -```python ->>> int(not 1) -0 ->>> int(not 0) -1 -``` - -The atomic version of the `__change_bit` is the `change_bit` function: - -```C -static inline void change_bit(long nr, volatile unsigned long *addr) -{ - if (IS_IMMEDIATE(nr)) { - asm volatile(LOCK_PREFIX "xorb %1,%0" - : CONST_MASK_ADDR(nr, addr) - : "iq" ((u8)CONST_MASK(nr))); - } else { - asm volatile(LOCK_PREFIX "btc %1,%0" - : BITOP_ADDR(addr) - : "Ir" (nr)); - } -} -``` - -It is similar on `set_bit` function, but also has two differences. The first difference is `xor` operation instead of `or` and the second is `bts` instead of `bts`. - -For this moment we know the most important architecture-specific operations with bit arrays. Time to look at generic bitmap API. - -Common bit operations -================================================================================ - -Besides the architecture-specific API from the [arch/x86/include/asm/bitops.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/bitops.h) header file, the Linux kernel provides common API for manipulation of bit arrays. 
As we know from the beginning of this part, we can find it in the [include/linux/bitmap.h](https://github.com/torvalds/linux/blob/master/include/linux/bitmap.h) header file and additionally in the * [lib/bitmap.c](https://github.com/torvalds/linux/blob/master/lib/bitmap.c) source code file. But before these source code files let's look into the [include/linux/bitops.h](https://github.com/torvalds/linux/blob/master/include/linux/bitops.h) header file which provides a set of useful macro. Let's look on some of they. - -First of all let's look at following four macros: - -* `for_each_set_bit` -* `for_each_set_bit_from` -* `for_each_clear_bit` -* `for_each_clear_bit_from` - -All of these macros provide iterator over certain set of bits in a bit array. The first macro iterates over bits which are set, the second does the same, but starts from a certain bits. The last two macros do the same, but iterates over clear bits. Let's look on implementation of the `for_each_set_bit` macro: - -```C -#define for_each_set_bit(bit, addr, size) \ - for ((bit) = find_first_bit((addr), (size)); \ - (bit) < (size); \ - (bit) = find_next_bit((addr), (size), (bit) + 1)) -``` - -As we may see it takes three arguments and expands to the loop from first set bit which is returned as result of the `find_first_bit` function and to the last bit number while it is less than given size. - -Besides these four macros, the [arch/x86/include/asm/bitops.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/bitops.h) provides API for rotation of `64-bit` or `32-bit` values and etc. - -The next [header](https://github.com/torvalds/linux/blob/master/include/linux/bitmap.h) file which provides API for manipulation with a bit arrays. For example it provdes two functions: - -* `bitmap_zero`; -* `bitmap_fill`. - -To clear a bit array and fill it with `1`. 
Let's look on the implementation of the `bitmap_zero` function: - -```C -static inline void bitmap_zero(unsigned long *dst, unsigned int nbits) -{ - if (small_const_nbits(nbits)) - *dst = 0UL; - else { - unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long); - memset(dst, 0, len); - } -} -``` - -First of all we can see the check for `nbits`. The `small_const_nbits` is macro which defined in the same header [file](https://github.com/torvalds/linux/blob/master/include/linux/bitmap.h) and looks: - -```C -#define small_const_nbits(nbits) \ - (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG) -``` - -As we may see it checks that `nbits` is known constant in compile time and `nbits` value does not overflow `BITS_PER_LONG` or `64`. If bits number does not overflow amount of bits in a `long` value we can just set to zero. In other case we need to calculate how many `long` values do we need to fill our bit array and fill it with [memset](http://man7.org/linux/man-pages/man3/memset.3.html). - -The implementation of the `bitmap_fill` function is similar on implementation of the `biramp_zero` function, except we fill a given bit array with `0xff` values or `0b11111111`: - -```C -static inline void bitmap_fill(unsigned long *dst, unsigned int nbits) -{ - unsigned int nlongs = BITS_TO_LONGS(nbits); - if (!small_const_nbits(nbits)) { - unsigned int len = (nlongs - 1) * sizeof(unsigned long); - memset(dst, 0xff, len); - } - dst[nlongs - 1] = BITMAP_LAST_WORD_MASK(nbits); -} -``` - -Besides the `bitmap_fill` and `bitmap_zero` functions, the [include/linux/bitmap.h](https://github.com/torvalds/linux/blob/master/include/linux/bitmap.h) header file provides `bitmap_copy` which is similar on the `bitmap_zero`, but just uses [memcpy](http://man7.org/linux/man-pages/man3/memcpy.3.html) instead of [memset](http://man7.org/linux/man-pages/man3/memset.3.html). Also it provides bitwise operations for bit array like `bitmap_and`, `bitmap_or`, `bitamp_xor` and etc. 
We will not consider implementation of these functions because it is easy to understand implementations of these functions if you understood all from this part. Anyway if you are interested how did these function implemented, you may open [include/linux/bitmap.h](https://github.com/torvalds/linux/blob/master/include/linux/bitmap.h) header file and start to research. - -That's all. - -Links -================================================================================ - -* [bitmap](https://en.wikipedia.org/wiki/Bit_array) -* [linked data structures](https://en.wikipedia.org/wiki/Linked_data_structure) -* [tree data structures](https://en.wikipedia.org/wiki/Tree_%28data_structure%29) -* [hot-plug](https://www.kernel.org/doc/Documentation/cpu-hotplug.txt) -* [cpumasks](https://0xax.gitbooks.io/linux-insides/content/Concepts/cpumask.html) -* [IRQs](https://en.wikipedia.org/wiki/Interrupt_request_%28PC_architecture%29) -* [API](https://en.wikipedia.org/wiki/Application_programming_interface) -* [atomic operations](https://en.wikipedia.org/wiki/Linearizability) -* [xchg instruction](http://x86.renejeschke.de/html/file_module_x86_id_328.html) -* [cmpxchg instruction](http://x86.renejeschke.de/html/file_module_x86_id_41.html) -* [lock instruction](http://x86.renejeschke.de/html/file_module_x86_id_159.html) -* [bts instruction](http://x86.renejeschke.de/html/file_module_x86_id_25.html) -* [btr instruction](http://x86.renejeschke.de/html/file_module_x86_id_24.html) -* [bt instruction](http://x86.renejeschke.de/html/file_module_x86_id_22.html) -* [sbb instruction](http://x86.renejeschke.de/html/file_module_x86_id_286.html) -* [btc instruction](http://x86.renejeschke.de/html/file_module_x86_id_23.html) -* [man memcpy](http://man7.org/linux/man-pages/man3/memcpy.3.html) -* [man memset](http://man7.org/linux/man-pages/man3/memset.3.html) -* [CF](https://en.wikipedia.org/wiki/FLAGS_register) -* [inline assembler](https://en.wikipedia.org/wiki/Inline_assembler) -* 
[gcc](https://en.wikipedia.org/wiki/GNU_Compiler_Collection) - - -------------------------------------------------------------------------------- - -via: https://github.com/0xAX/linux-insides/blob/master/DataStructures/bitmap.md - -Author: [0xAX][a] -Translator: [translator ID](https://github.com/译者ID) -Proofreader: [proofreader ID](https://github.com/校对者ID) - -This article was originally translated by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/) - -[a]: https://twitter.com/0xAX diff --git a/translated/tech/20160512 Bitmap in Linux Kernel.md b/translated/tech/20160512 Bitmap in Linux Kernel.md new file mode 100644 index 0000000000..6475b9260e --- /dev/null +++ b/translated/tech/20160512 Bitmap in Linux Kernel.md @@ -0,0 +1,405 @@ +--- +date: 2016-07-09 14:42 +status: public +title: 20160512 Bitmap in Linux Kernel +--- + +Data Structures in the Linux Kernel +================================================================================ + +Bit arrays and bit operations in the Linux kernel +-------------------------------------------------------------------------------- + +Besides various [linked](https://en.wikipedia.org/wiki/Linked_data_structure) and [tree](https://en.wikipedia.org/wiki/Tree_%28data_structure%29) based data structures, the Linux kernel also provides an [API](https://en.wikipedia.org/wiki/Application_programming_interface) for [bit arrays](https://en.wikipedia.org/wiki/Bit_array), also called `bitmaps`. Bit arrays are heavily used in the Linux kernel, and the following source code files contain the common `API` for working with such structures: + +* [lib/bitmap.c](https://github.com/torvalds/linux/blob/master/lib/bitmap.c) +* [include/linux/bitmap.h](https://github.com/torvalds/linux/blob/master/include/linux/bitmap.h) + +Besides these two files, there is also an architecture-specific header file which provides optimized bit operations for a particular architecture. We consider the [x86_64](https://en.wikipedia.org/wiki/X86-64) architecture, so in our case it is the + +* [arch/x86/include/asm/bitops.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/bitops.h) + +header file. As I wrote above, the `bitmap` is heavily used in the Linux kernel. For example, a `bit array` is used to store the set of online/offline processors on systems which support [hot-plug](https://www.kernel.org/doc/Documentation/cpu-hotplug.txt) CPUs (you can read more about this in the [cpumasks](https://0xax.gitbooks.io/linux-insides/content/Concepts/cpumask.html) part), and a `bit array` stores the set of allocated IRQs during initialization of the Linux 
kernel, and so on. + +So, the main goal of this part is to see how bit arrays are implemented in the Linux kernel. Let's start. + +Declaration of bit array +================================================================================ + +Before we look at the `API` for bitmap manipulation, we must know how to declare a bitmap in the Linux kernel. There are two common ways to declare a bit array. The first and simplest way is to define an array of `unsigned long`, for example: + +```C +unsigned long my_bitmap[8]; +``` + +The second way is to use the `DECLARE_BITMAP` macro, which is defined in the [include/linux/types.h](https://github.com/torvalds/linux/blob/master/include/linux/types.h) header file: + +```C +#define DECLARE_BITMAP(name,bits) \ + unsigned long name[BITS_TO_LONGS(bits)] +``` + +We can see that the `DECLARE_BITMAP` macro takes two parameters: + +* `name` - name of the bitmap; +* `bits` - number of bits in the bitmap; + +and just expands to the definition of an `unsigned long` array with `BITS_TO_LONGS(bits)` elements. The `BITS_TO_LONGS` macro converts a given number of bits to a number of `longs`; in other words, it calculates how many `8`-byte elements are needed to hold `bits` bits, rounding up: + +```C +#define BITS_PER_BYTE 8 +#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d)) +#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long)) +``` + +So, for example, `DECLARE_BITMAP(my_bitmap, 64)` will produce: + +```python +>>> (((64) + (64) - 1) // (64)) +1 +``` + +and: + +```C +unsigned long my_bitmap[1]; +``` + +Once we are able to declare a bit array, we can start to use it. + +Architecture-specific bit operations +================================================================================ + +We have already seen above a couple of source code and header files which provide an [API](https://en.wikipedia.org/wiki/Application_programming_interface) for manipulating bit arrays. The most important and widely used bit array API is architecture-specific and is located, as we already know, in the [arch/x86/include/asm/bitops.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/bitops.h) header file. + +First of all, let's look at the two most important functions: + +* `set_bit`; +* `clear_bit`. 
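
The declaration arithmetic and the basic idea of setting and clearing a bit can be modeled in a few lines of Python, the language this article already uses for its small worked examples. This is only a sketch under stated assumptions: `BITS_PER_LONG = 64` assumes x86_64, and the names `bits_to_longs`, `declare_bitmap`, `model_set_bit`, `model_clear_bit` and `model_test_bit` are hypothetical helpers invented for illustration. They are not kernel functions, and they say nothing about atomicity.

```python
BITS_PER_LONG = 64  # assumption: x86_64, where sizeof(long) == 8


def bits_to_longs(nbits):
    """Round up like BITS_TO_LONGS: number of words needed for nbits bits."""
    return (nbits + BITS_PER_LONG - 1) // BITS_PER_LONG


def declare_bitmap(nbits):
    """Model DECLARE_BITMAP as a zeroed list of 64-bit words (hypothetical)."""
    return [0] * bits_to_longs(nbits)


def model_set_bit(nr, bitmap):
    """Set bit nr: pick the word, then OR in a one-bit mask."""
    bitmap[nr // BITS_PER_LONG] |= 1 << (nr % BITS_PER_LONG)


def model_clear_bit(nr, bitmap):
    """Clear bit nr: pick the word, then AND with the inverted mask."""
    bitmap[nr // BITS_PER_LONG] &= ~(1 << (nr % BITS_PER_LONG))


def model_test_bit(nr, bitmap):
    """Return 1 if bit nr is set, otherwise 0."""
    return (bitmap[nr // BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1


bitmap = declare_bitmap(100)   # 100 bits need two 64-bit words
model_set_bit(9, bitmap)       # bit 9 lives in word 0
model_set_bit(70, bitmap)      # bit 70 is bit 6 of word 1
model_clear_bit(9, bitmap)
print(bitmap)                  # [0, 64]
```

The word index `nr // BITS_PER_LONG` and the in-word position `nr % BITS_PER_LONG` are the same split that the kernel code expresses with shifts and masks.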
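+ +I think there is no need to explain what these functions do; it is already clear from their names. Let's look directly at their implementation. If you look into the [arch/x86/include/asm/bitops.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/bitops.h) header file, you will notice that each of these functions comes in two variants: [atomic](https://en.wikipedia.org/wiki/Linearizability) and non-atomic. Before we dive into the implementations of these functions, we first need to know a little about atomic operations. + +Simply put, atomic operations guarantee that two or more operations are not performed on the same data concurrently. The `x86` architecture provides a set of atomic instructions, for example [xchg](http://x86.renejeschke.de/html/file_module_x86_id_328.html), [cmpxchg](http://x86.renejeschke.de/html/file_module_x86_id_41.html) and so on. Besides atomic instructions, some non-atomic instructions can be made atomic with the help of the [lock](http://x86.renejeschke.de/html/file_module_x86_id_159.html) prefix. That is enough about atomic operations for now, so we can move on to the implementation of the `set_bit` and `clear_bit` functions. + +Let's start with the non-atomic variants. The names of the non-atomic `set_bit` and `clear_bit` start with a double underscore. As we already know, all of these functions are defined in the [arch/x86/include/asm/bitops.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/bitops.h) header file, and the first function is `__set_bit`: + +```C +static inline void __set_bit(long nr, volatile unsigned long *addr) +{ + asm volatile("bts %1,%0" : ADDR : "Ir" (nr) : "memory"); +} +``` + +As we can see, it takes two arguments: + +* `nr` - number of the bit in the bit array (counting from 0; translator's note) +* `addr` - address of the bit array in which we need to set the bit + +Note that the `addr` parameter is declared with the `volatile` keyword, which tells the compiler that the value at the given address may change. The implementation of `__set_bit` is pretty simple. As we can see, it contains just one line of [inline assembler](https://en.wikipedia.org/wiki/Inline_assembler) code. In our case we use the [bts](http://x86.renejeschke.de/html/file_module_x86_id_25.html) instruction, which selects the bit specified by its first operand (`nr` in our case) from the bit array, stores the value of the selected bit in the [CF](https://en.wikipedia.org/wiki/FLAGS_register) flag of the flags register and sets this bit (that is, the bit specified by `nr` becomes 1; translator's note). + +Note that we can see how `nr` is used, but what about the `addr` parameter? You may already have guessed that the secret is in `ADDR`. `ADDR` is a macro defined in the same header file, and it expands to a string which contains the `+m` constraint together with the value at the given address: + +```C +#define ADDR BITOP_ADDR(addr) +#define BITOP_ADDR(x) "+m" (*(volatile long *) (x)) +``` + +Besides `+m`, we can see other constraints in the `__set_bit` function. Let's look at them and try to understand what they mean: + +* `+m` - represents a memory operand, where `+` tells that the given operand is both an input and an output operand; +* `I` - represents an integer constant (on x86, in the range 0 to 31); +* `r` - represents a register operand + +Besides these constraints, we can also see the `memory` keyword, which tells the compiler that this code changes values in memory. That's all. Now let's look at the atomic variant of the same function. It looks more complex than its non-atomic variant: + +```C +static __always_inline void +set_bit(long nr, volatile unsigned long *addr) +{ + if (IS_IMMEDIATE(nr)) { + asm 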
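volatile(LOCK_PREFIX "orb %1,%0" + : CONST_MASK_ADDR(nr, addr) + : "iq" ((u8)CONST_MASK(nr)) + : "memory"); + } else { + asm volatile(LOCK_PREFIX "bts %1,%0" + : BITOP_ADDR(addr) : "Ir" (nr) : "memory"); + } +} +``` + +(`BITOP_ADDR` is defined as `#define BITOP_ADDR(x) "+m" (*(volatile long *) (x))`, and `orb` is a byte-wide bitwise OR; translator's note) + +First of all, note that this function takes the same set of parameters as `__set_bit`, but is additionally marked with the `__always_inline` attribute. `__always_inline` is a macro defined in [include/linux/compiler-gcc.h](https://github.com/torvalds/linux/blob/master/include/linux/compiler-gcc.h), and it just expands to the `always_inline` attribute: + +```C +#define __always_inline inline __attribute__((always_inline)) +``` + +which means that this function is always inlined, to reduce the size of the Linux kernel image. Now let's try to understand the implementation of the `set_bit` function. First of all, at the beginning of `set_bit` we check the given bit number. The `IS_IMMEDIATE` macro is defined in the same [header file](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/bitops.h) and expands to a call to a gcc builtin function: + +```C +#define IS_IMMEDIATE(nr) (__builtin_constant_p(nr)) +``` + +The `__builtin_constant_p` builtin returns `1` if the given argument is known to be a constant at compile time and `0` otherwise. If the bit number is a compile-time constant, we do not need the slower `bts` instruction to set the bit. We can simply apply a [bitwise or](https://en.wikipedia.org/wiki/Bitwise_operation#OR) to the byte at the given address that contains the bit, using a mask in which the bit at position `nr & 7` is `1` and all other bits are `0`. Otherwise, if the bit number is not a compile-time constant, we do the same as in the `__set_bit` function. The `CONST_MASK_ADDR` macro: + +```C +#define CONST_MASK_ADDR(nr, addr) BITOP_ADDR((void *)(addr) + ((nr)>>3)) +``` + +expands to the given address plus the offset of the byte which contains the given bit. For example, say we have the address `0x1000` and the bit number `0x9`. As `0x9` is `one byte + one bit`, our address will be `addr + 1`: + +```python +>>> hex(0x1000 + (0x9 >> 3)) +'0x1001' +``` + +The `CONST_MASK` macro represents the given bit number as a byte in which the bit corresponding to the bit number is `1` and all other bits are `0`: + +```C +#define CONST_MASK(nr) (1 << ((nr) & 7)) +``` + +```python +>>> bin(1 << (0x9 & 7)) +'0b10' +``` + +In the end we just apply a bitwise `or` to these values. So, for example, if the value stored at our address is `0x4097` and we need to set bit `9`, the byte that contains bit `9` is the second byte of the value, and we `or` it with the mask: + +```python +>>> bin(0x4097) +'0b100000010010111' +>>> hex(((0x4097 >> 8) & 0xff) | (1 << (0x9 & 7))) +'0x42' +``` + +Bit `9` is now set: the second byte changes from `0x40` to `0x42`, so the whole value becomes `0x4297`. (Bit numbers here count from 0; for example, in 0010 the 1 is bit 1 in the author's numbering. Translator's note) + +Note that all of these operations are marked with `LOCK_PREFIX`, which expands to the [lock](http://x86.renejeschke.de/html/file_module_x86_id_159.html) prefix and guarantees the atomicity of the operation. + +As we already know, besides the `set_bit` and `__set_bit` 
operations, the Linux kernel provides two inverse functions which clear a bit in atomic and non-atomic contexts. They are `clear_bit` and `__clear_bit`. Both of these functions are defined in the same [header file](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/bitops.h) and take the same set of arguments. It is not only the arguments that are similar: in general, these functions are very similar to `set_bit` and `__set_bit`. Let's look at the implementation of the non-atomic `__clear_bit` function: + +```C +static inline void __clear_bit(long nr, volatile unsigned long *addr) +{ + asm volatile("btr %1,%0" : ADDR : "Ir" (nr)); +} +``` + +Yes, as we can see, `__clear_bit` takes the same set of arguments and contains a very similar block of inline assembler. It just uses the [btr](http://x86.renejeschke.de/html/file_module_x86_id_24.html) instruction instead of `bts`. As the function's name suggests, it clears the given bit at the given address. The `btr` instruction acts like `bts` (the original text has `btr` here, probably a typo; corrected to `bts`. Translator's note). This instruction also selects the bit specified by its first operand, stores its value in the `CF` flag and clears this bit in the bit array specified by the second operand. + +The atomic variant of `__clear_bit` is `clear_bit`: + +```C +static __always_inline void +clear_bit(long nr, volatile unsigned long *addr) +{ + if (IS_IMMEDIATE(nr)) { + asm volatile(LOCK_PREFIX "andb %1,%0" + : CONST_MASK_ADDR(nr, addr) + : "iq" ((u8)~CONST_MASK(nr))); + } else { + asm volatile(LOCK_PREFIX "btr %1,%0" + : BITOP_ADDR(addr) + : "Ir" (nr)); + } +} +``` + +As we can see, it is very similar to `set_bit` and has just two differences. The first difference is that `clear_bit` uses the `btr` instruction to clear the bit, where `set_bit` uses the `bts` instruction to set it. The second difference is that `clear_bit` uses a negated mask and the `and` instruction to clear the bit in the given byte, where `set_bit` uses the `or` instruction. + +That's all. Now we can set and clear a bit in any bit array, and we can move on to other operations on bitmasks. + +The most widely used operations on bit arrays in the Linux kernel are setting and clearing a bit, but other operations on a bit array are useful too. Another widely used operation in the Linux kernel is checking whether a given bit in a bit array is set. We can do this with the help of the `test_bit` macro. This macro is defined in the [arch/x86/include/asm/bitops.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/bitops.h) header file and expands to a call to `constant_test_bit` or `variable_test_bit`, depending on the bit number: + +```C +#define test_bit(nr, addr) \ + (__builtin_constant_p((nr)) \ + ? 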
constant_test_bit((nr), (addr)) \ + : variable_test_bit((nr), (addr))) +``` + +因此,如果 `nr` 是编译期已知常量,`test_bit` 将展开为 `constant_test_bit` 函数的调用,而其他情况则为 `variable_test_bit`。现在让我们看看这些函数的实现,我们从 `variable_test_bit` 开始看起: + +```C +static inline int variable_test_bit(long nr, volatile const unsigned long *addr) +{ + int oldbit; + + asm volatile("bt %2,%1\n\t" + "sbb %0,%0" + : "=r" (oldbit) + : "m" (*(unsigned long *)addr), "Ir" (nr)); + + return oldbit; +} +``` + +`variable_test_bit` 函数调用了与 `set_bit` 及其他函数使用的相似的参数集合。我们也可以看到执行 [bt](http://x86.renejeschke.de/html/file_module_x86_id_22.html) 和 [sbb](http://x86.renejeschke.de/html/file_module_x86_id_286.html) 指令的内联汇编代码。`bt` 或 `bit test` 指令从第二操作数指定的位数组选出第一操作数指定的一个指定位,并且将该位的值存进标志寄存器的 [CF](https://en.wikipedia.org/wiki/FLAGS_register) 位。第二个指令 `sbb` 从第二操作数中减去第一操作数,再减去 `CF` 的值。因此,这里将一个从给定位数组中的给定位号的值写进标志寄存器的 `CF` 位,并且执行 `sbb` 指令计算: `00000000 - CF`,并将结果写进 `oldbit` 变量。 + +`constant_test_bit` 函数做了和我们在 `set_bit` 所看到的一样的事: + +```C +static __always_inline int constant_test_bit(long nr, const volatile unsigned long *addr) +{ + return ((1UL << (nr & (BITS_PER_LONG-1))) & + (addr[nr >> _BITOPS_LONG_SHIFT])) != 0; +} +``` + +它生成了一个位号对应位为高位 `1`,而其他位为 `0` 的字节(正如我们在 `CONST_MASK` 所看到的),并将 [按位与](https://en.wikipedia.org/wiki/Bitwise_operation#AND) 应用于包含给定位号的字节。 + +下一广泛使用的位数组相关操作是改变一个位数组中的位。为此,Linux 内核提供了两个辅助函数: + +* `__change_bit`; +* `change_bit`. + +你可能已经猜测到,就拿 `set_bit` 和 `__set_bit` 例子说,这两个变体分别是原子和非原子版本。首先,让我们看看 `__change_bit` 函数的实现: + +```C +static inline void __change_bit(long nr, volatile unsigned long *addr) +{ + asm volatile("btc %1,%0" : ADDR : "Ir" (nr)); +} +``` + +相当简单,不是吗? 
`__change_bit` 的实现和 `__set_bit` 一样,只是我们使用 [btc](http://x86.renejeschke.de/html/file_module_x86_id_23.html) 替换 `bts` 指令而已。 该指令从一个给定位数组中选出一个给定位,将该为位的值存进 `CF` 并使用求反操作改变它的值,因此值为 `1` 的位将变为 `0`,反之亦然: + +```python +>>> int(not 1) +0 +>>> int(not 0) +1 +``` + + `__change_bit` 的原子版本为 `change_bit` 函数: + +```C +static inline void change_bit(long nr, volatile unsigned long *addr) +{ + if (IS_IMMEDIATE(nr)) { + asm volatile(LOCK_PREFIX "xorb %1,%0" + : CONST_MASK_ADDR(nr, addr) + : "iq" ((u8)CONST_MASK(nr))); + } else { + asm volatile(LOCK_PREFIX "btc %1,%0" + : BITOP_ADDR(addr) + : "Ir" (nr)); + } +} +``` + +它和 `set_bit` 函数很相似,但也存在两点差异。第一处差异为 `xor` 操作而不是 `or`。第二处差异为 `btc`(原文为 `bts`,为作者笔误,译者注) 而不是 `bts`。 + +目前,我们了解了最重要的体系特定的位数组操作,是时候看看一般的位图 API 了。 + +通用位操作 +================================================================================ + +除了 [arch/x86/include/asm/bitops.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/bitops.h) 中体系特定的 API 外,Linux 内核提供了操作位数组的通用 API。正如我们本部分开头所了解的一样,我们可以在 [include/linux/bitmap.h](https://github.com/torvalds/linux/blob/master/include/linux/bitmap.h) 头文件和* [lib/bitmap.c](https://github.com/torvalds/linux/blob/master/lib/bitmap.c) 源文件中找到它。但在查看这些源文件之前,我们先看看 [include/linux/bitops.h](https://github.com/torvalds/linux/blob/master/include/linux/bitops.h) 头文件,其提供了一系列有用的宏,让我们看看它们当中一部分。 + +首先我们看看以下 4 个 宏: + +* `for_each_set_bit` +* `for_each_set_bit_from` +* `for_each_clear_bit` +* `for_each_clear_bit_from` + +所有这些宏都提供了遍历位数组中某些位集合的迭代器。第一个红迭代那些被置位的位。第二个宏也是一样,但它是从某一确定位开始。最后两个宏做的一样,但是迭代那些被清位的位。让我们看看 `for_each_set_bit` 宏: + +```C +#define for_each_set_bit(bit, addr, size) \ + for ((bit) = find_first_bit((addr), (size)); \ + (bit) < (size); \ + (bit) = find_next_bit((addr), (size), (bit) + 1)) +``` + +正如我们所看到的,它使用了三个参数,并展开为一个循环,该循环从作为 `find_first_bit` 函数返回结果的第一个置位开始到最后一个置位且小于给定大小为止。 + +除了这四个宏, [arch/x86/include/asm/bitops.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/bitops.h) 也提供了 `64-bit` 或 `32-bit` 变量循环的 API 等等。 + 
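The `for_each_set_bit` loop shown above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: `find_next_bit` here is a hypothetical pure-Python stand-in for the kernel helper of the same name, and the bitmap is a plain Python integer rather than an array of `unsigned long`.

```python
def find_next_bit(bitmap, size, offset):
    """Return the index of the first set bit at or after offset, or size if none."""
    for bit in range(offset, size):
        if (bitmap >> bit) & 1:
            return bit
    return size


def for_each_set_bit(bitmap, size):
    """Yield set-bit indices in the order the macro's for loop would visit them."""
    # The first lookup plays the role of find_first_bit (assumed equivalent
    # to find_next_bit with an offset of 0).
    bit = find_next_bit(bitmap, size, 0)
    while bit < size:
        yield bit
        bit = find_next_bit(bitmap, size, bit + 1)


print(list(for_each_set_bit(0b100101, 8)))  # [0, 2, 5]
```

Returning `size` when no further bit is set is what lets the loop condition `(bit) < (size)` end the iteration.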
+下一个 [头文件](https://github.com/torvalds/linux/blob/master/include/linux/bitmap.h) 提供了操作位数组的 API。例如,它提供了以下两个函数: + +* `bitmap_zero`; +* `bitmap_fill`. + +它们分别可以清除一个位数组和用 `1` 填充位数组。让我们看看 `bitmap_zero` 函数的实现: + +```C +static inline void bitmap_zero(unsigned long *dst, unsigned int nbits) +{ + if (small_const_nbits(nbits)) + *dst = 0UL; + else { + unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long); + memset(dst, 0, len); + } +} +``` + +首先我们可以看到对 `nbits` 的检查。 `small_const_nbits` 是一个定义在同一[头文件](https://github.com/torvalds/linux/blob/master/include/linux/bitmap.h) 的宏: + +```C +#define small_const_nbits(nbits) \ + (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG) +``` + +正如我们可以看到的,它检查 `nbits` 是否为编译期已知常量,并且其值不超过 `BITS_PER_LONG` 或 `64`。如果位数目没有超过一个 `long` 变量的位数,我们可以仅仅设置为 0。在其他情况,我们需要计算有多少个需要填充位数组的 `long` 变量并且使用 [memset](http://man7.org/linux/man-pages/man3/memset.3.html) 进行填充。 + +`bitmap_fill` 函数的实现和 `biramp_zero` 函数很相似,除了我们需要在给定的位数组中填写 `0xff` 或 `0b11111111`: + +```C +static inline void bitmap_fill(unsigned long *dst, unsigned int nbits) +{ + unsigned int nlongs = BITS_TO_LONGS(nbits); + if (!small_const_nbits(nbits)) { + unsigned int len = (nlongs - 1) * sizeof(unsigned long); + memset(dst, 0xff, len); + } + dst[nlongs - 1] = BITMAP_LAST_WORD_MASK(nbits); +} +``` + +除了 `bitmap_fill` 和 `bitmap_zero`,[include/linux/bitmap.h](https://github.com/torvalds/linux/blob/master/include/linux/bitmap.h) 头文件也提供了和 `bitmap_zero` 很相似的 `bitmap_copy`,只是仅仅使用 [memcpy](http://man7.org/linux/man-pages/man3/memcpy.3.html) 而不是 [memset](http://man7.org/linux/man-pages/man3/memset.3.html) 这点差异而已。它也提供了位数组的按位操作,像 `bitmap_and`, `bitmap_or`, `bitamp_xor`等等。我们不会探讨这些函数的实现了,因为如果你理解了本部分的所有内容,这些函数的实现是很容易理解的。无论如何,如果你对这些函数是如何实现的感兴趣,你可以打开并研究 [include/linux/bitmap.h](https://github.com/torvalds/linux/blob/master/include/linux/bitmap.h) 头文件。 + +本部分到此为止。 + +链接 +================================================================================ + +* [bitmap](https://en.wikipedia.org/wiki/Bit_array) +* 
[linked data structures](https://en.wikipedia.org/wiki/Linked_data_structure) +* [tree data structures](https://en.wikipedia.org/wiki/Tree_%28data_structure%29) +* [hot-plug](https://www.kernel.org/doc/Documentation/cpu-hotplug.txt) +* [cpumasks](https://0xax.gitbooks.io/linux-insides/content/Concepts/cpumask.html) +* [IRQs](https://en.wikipedia.org/wiki/Interrupt_request_%28PC_architecture%29) +* [API](https://en.wikipedia.org/wiki/Application_programming_interface) +* [atomic operations](https://en.wikipedia.org/wiki/Linearizability) +* [xchg instruction](http://x86.renejeschke.de/html/file_module_x86_id_328.html) +* [cmpxchg instruction](http://x86.renejeschke.de/html/file_module_x86_id_41.html) +* [lock instruction](http://x86.renejeschke.de/html/file_module_x86_id_159.html) +* [bts instruction](http://x86.renejeschke.de/html/file_module_x86_id_25.html) +* [btr instruction](http://x86.renejeschke.de/html/file_module_x86_id_24.html) +* [bt instruction](http://x86.renejeschke.de/html/file_module_x86_id_22.html) +* [sbb instruction](http://x86.renejeschke.de/html/file_module_x86_id_286.html) +* [btc instruction](http://x86.renejeschke.de/html/file_module_x86_id_23.html) +* [man memcpy](http://man7.org/linux/man-pages/man3/memcpy.3.html) +* [man memset](http://man7.org/linux/man-pages/man3/memset.3.html) +* [CF](https://en.wikipedia.org/wiki/FLAGS_register) +* [inline assembler](https://en.wikipedia.org/wiki/Inline_assembler) +* [gcc](https://en.wikipedia.org/wiki/GNU_Compiler_Collection) + + +------------------------------------------------------------------------------ + +via: https://github.com/0xAX/linux-insides/blob/master/DataStructures/bitmap.md + +作者:[0xAX][a] +译者:[cposture](https://github.com/cposture) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创翻译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]: https://twitter.com/0xAX \ No newline at end of file From b1cdc7fcbe2fee9dbb5ad3264d19e028d9f3b818 Mon Sep 17 
00:00:00 2001 From: cposture <cposture@126.com> Date: Sat, 9 Jul 2016 21:46:14 +0800 Subject: [PATCH 02/17] Translating by cposture --- .../tech/20160705 Create Your Own Shell in Python - Part I.md | 1 + .../tech/20160706 Create Your Own Shell in Python - Part II.md | 1 + 2 files changed, 2 insertions(+) diff --git a/sources/tech/20160705 Create Your Own Shell in Python - Part I.md b/sources/tech/20160705 Create Your Own Shell in Python - Part I.md index 9c3ffa55dd..48e84381c8 100644 --- a/sources/tech/20160705 Create Your Own Shell in Python - Part I.md +++ b/sources/tech/20160705 Create Your Own Shell in Python - Part I.md @@ -1,3 +1,4 @@ +Translating by cposture 2016.07.09 Create Your Own Shell in Python : Part I I’m curious to know how a shell (like bash, csh, etc.) works internally. So, I implemented one called yosh (Your Own SHell) in Python to answer my own curiosity. The concept I explain in this article can be applied to other languages as well. diff --git a/sources/tech/20160706 Create Your Own Shell in Python - Part II.md b/sources/tech/20160706 Create Your Own Shell in Python - Part II.md index af0ec01b36..3154839443 100644 --- a/sources/tech/20160706 Create Your Own Shell in Python - Part II.md +++ b/sources/tech/20160706 Create Your Own Shell in Python - Part II.md @@ -1,3 +1,4 @@ +Translating by cposture 2016.07.09 Create Your Own Shell in Python - Part II =========================================== From 93bb30d7e4d0f40f711aa8304fbbfa389fba163d Mon Sep 17 00:00:00 2001 From: cposture <cposture@126.com> Date: Tue, 12 Jul 2016 00:41:32 +0800 Subject: [PATCH 03/17] translating partly --- ...reate Your Own Shell in Python - Part I.md | 41 ++++++++++--------- 1 file changed, 21 insertions(+), 20 deletions(-) diff --git a/sources/tech/20160705 Create Your Own Shell in Python - Part I.md b/sources/tech/20160705 Create Your Own Shell in Python - Part I.md index 48e84381c8..0b7e415f2a 100644 --- a/sources/tech/20160705 Create Your Own Shell in Python - Part 
I.md +++ b/sources/tech/20160705 Create Your Own Shell in Python - Part I.md @@ -1,15 +1,15 @@ -Translating by cposture 2016.07.09 -Create Your Own Shell in Python : Part I +使用 Python 创建你自己的 Shell:Part I +========================================== -I’m curious to know how a shell (like bash, csh, etc.) works internally. So, I implemented one called yosh (Your Own SHell) in Python to answer my own curiosity. The concept I explain in this article can be applied to other languages as well. +我很好奇一个 shell (像 bash,csh 等)内部是如何工作的。为了满足自己的好奇心,我使用 Python 实现了一个名为 yosh (Your Own Shell)的 Shell。本文章所介绍的概念也可以应用于其他编程语言。 -(Note: You can find source code used in this blog post here. I distribute it with MIT license.) +(提示:你可以发布于此的博文中找到使用的源代码,代码以 MIT 许可发布) -Let’s start. +让我们开始吧。 -### Step 0: Project Structure +### 步骤 0:项目结构 -For this project, I use the following project structure. +对于此项目,我使用了以下的项目结构。 ``` yosh_project @@ -18,17 +18,17 @@ yosh_project |-- shell.py ``` -`yosh_project` is the root project folder (you can also name it just `yosh`). +`yosh_project` 为项目根目录(你也可以把它简单地命名为 `yosh`)。 -`yosh` is the package folder and `__init__.py` will make it a package named the same as the package folder name. (If you don’t write Python, just ignore it.) +`yosh` 为包目录,并且 `__init__.py` 将会使一个包名等同于包目录名字(如果你不写 Python,可以忽略它) -`shell.py` is our main shell file. +`shell.py` 是我们的主脚本文件。 -### Step 1: Shell Loop +### 步骤 1:Shell 循环 -When you start a shell, it will show a command prompt and wait for your command input. After it receives the command and executes it (the detail will be explained later), your shell will be back to the wait loop for your next command. 
+当你启动一个 shell,它会显示一个命令提示符同时等待用户输入命令。在接收了输入的命令并执行它之后(稍后文章会进行详细解释),你的 shell 会回到循环,等待下一条指令。 -In `shell.py`, we start by a simple main function calling the shell_loop() function as follows: +在 `shell.py`,我们会以一个简单的 mian 函数开始,该函数调用了 shell_loop() 函数,如下: ``` def shell_loop(): @@ -43,7 +43,7 @@ if __name__ == "__main__": main() ``` -Then, in our `shell_loop()`, we use a status flag to indicate whether the loop should continue or stop. In the beginning of the loop, our shell will show a command prompt and wait to read command input. +接着,在 `shell_loop()`,为了指示循环是否继续或停止,我们使用了一个状态标志。在循环的开始,我们的 shell 将显示一个命令提示符,并等待读取命令输入。 ``` import sys @@ -64,9 +64,9 @@ def shell_loop(): cmd = sys.stdin.readline() ``` -After that, we tokenize the command input and execute it (we’ll implement the tokenize and execute functions soon). +之后,我们切分命令输入并进行执行(我们将马上解释命令切分和执行函数)。 -Therefore, our shell_loop() will be the following. +因此,我们的 shell_loop() 会是如下这样: ``` import sys @@ -93,14 +93,15 @@ def shell_loop(): status = execute(cmd_tokens) ``` -That’s all of our shell loop. If we start our shell with python shell.py, it will show the command prompt. However, it will throw an error if we type a command and hit enter because we don’t define tokenize function yet. +这就是我们整个 shell 循环。如果我们使用 python shell.py 命令启动 shell,它会显示命令提示符。然而如果我们输入命令并按回车,它将会抛出错误,因为我们还没定义命令切分函数。 -To exit the shell, try ctrl-c. I will tell how to exit gracefully later. +为了退出 shell,可以尝试输入 ctrl-c。稍后我将解释如何以优雅的形式退出 shell。 -### Step 2: Tokenization +### 步骤 2:命令切分 -When a user types a command in our shell and hits enter. The command input will be a long string containing both a command name and its arguments. Therefore, we have to tokenize it (split a string into several tokens). +当一个用户在我们的 shell 中输入命令并按下回车键,该命令将会是一个包含命令名称及其参数的很长的字符串。因此,我们必须切分该字符串(分割一个字符串为多个标记)。 +咋一看它似乎很简单。我们或许可以使用 cmd.split(),用空格分割输入。 It seems simple at first glance. We might use cmd.split() to separate the input by spaces. 
It works well for a command like `ls -a my_folder` because it splits the command into a list `['ls', '-a', 'my_folder']` which we can use them easily. However, there are some cases that some arguments are quoted with single or double quotes like `echo "Hello World"` or `echo 'Hello World'`. If we use cmd.split(), we will get a list of 3 tokens `['echo', '"Hello', 'World"']` instead of 2 tokens `['echo', 'Hello World']`. From bfa39b05dd36e7fbe3b05ba3a25aabca7f47f999 Mon Sep 17 00:00:00 2001 From: cposture <cposture@126.com> Date: Wed, 13 Jul 2016 09:27:55 +0800 Subject: [PATCH 04/17] translating partly 75 --- ...reate Your Own Shell in Python - Part I.md | 27 ++++++++++--------- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/sources/tech/20160705 Create Your Own Shell in Python - Part I.md b/sources/tech/20160705 Create Your Own Shell in Python - Part I.md index 0b7e415f2a..67e5809b3d 100644 --- a/sources/tech/20160705 Create Your Own Shell in Python - Part I.md +++ b/sources/tech/20160705 Create Your Own Shell in Python - Part I.md @@ -101,12 +101,12 @@ def shell_loop(): 当一个用户在我们的 shell 中输入命令并按下回车键,该命令将会是一个包含命令名称及其参数的很长的字符串。因此,我们必须切分该字符串(分割一个字符串为多个标记)。 -咋一看它似乎很简单。我们或许可以使用 cmd.split(),用空格分割输入。 -It seems simple at first glance. We might use cmd.split() to separate the input by spaces. It works well for a command like `ls -a my_folder` because it splits the command into a list `['ls', '-a', 'my_folder']` which we can use them easily. +咋一看似乎很简单。我们或许可以使用 cmd.split(),用空格分割输入。它对类似 `ls -a my_folder` 的命令起作用,因为它能够将命令分割为一个列表 `['ls', '-a', 'my_folder']`,这样我们便能轻易处理它们了。 -However, there are some cases that some arguments are quoted with single or double quotes like `echo "Hello World"` or `echo 'Hello World'`. If we use cmd.split(), we will get a list of 3 tokens `['echo', '"Hello', 'World"']` instead of 2 tokens `['echo', 'Hello World']`. 
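The difference between the two splitting approaches can be checked directly. A minimal sketch, assuming only the standard-library `shlex` module that the article itself uses:

```python
import shlex

cmd = 'echo "Hello World"'

# Naive whitespace splitting breaks the quoted argument into two tokens
# and leaves the quote characters in place.
print(cmd.split())       # ['echo', '"Hello', 'World"']

# shlex.split follows shell-style quoting rules, so the quoted argument
# stays together as one token and the quotes are removed.
print(shlex.split(cmd))  # ['echo', 'Hello World']
```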
+然而,也有一些类似 `echo "Hello World"` 或 `echo 'Hello World'` 以单引号或双引号引用参数的情况。如果我们使用 cmd.spilt,我们将会得到一个存有 3 个标记的列表 `['echo', '"Hello', 'World"']` 而不是 2 个标记 `['echo', 'Hello World']`。 + +幸运的是,Python 提供了一个名为 shlex 的库,能够帮助我们效验如神地分割命令。(提示:我们也可以使用正则表达式,但它不是本文的重点。) -Fortunately, Python provides a library called shlex that helps us split like a charm. (Note: we can also use regular expression but it’s not the main point of this article.) ``` import sys @@ -120,13 +120,13 @@ def tokenize(string): ... ``` -Then, we will send these tokens to the execution process. +然后我们将这些标记发送到执行过程。 -### Step 3: Execution +### 步骤 3:执行 -This is the core and fun part of a shell. What happened when a shell executes mkdir test_dir? (Note: mkdir is a program to be executed with arguments test_dir for creating a directory named test_dir.) +这是 shell 中核心和有趣的一部分。当 shell 执行 mkdir test_dir 时,发生了什么?(提示:midir 是一个带有 test_dir 参数的执行程序,用于创建一个名为 test_dir 的目录。) -The first function involved in this step is execvp. Before I explain what execvp does, let’s see it in action. +execvp 是涉及这一步的首个函数。在我们解释 execvp 所做的事之前,让我们看看它的实际效果。 ``` import os @@ -142,15 +142,16 @@ def execute(cmd_tokens): ... ``` -Try running our shell again and input a command `mkdir test_dir`, then, hit enter. +再次尝试运行我们的 shell,并输入 `mkdir test_dir` 命令,接着按下回车键。 -The problem is, after we hit enter, our shell exits instead of waiting for the next command. However, the directory is correctly created. +在我们敲下回车键之后,问题是我们的 shell 会直接退出而不是等待下一个命令。然而,目标正确地被创建。 -So, what execvp really does? +因此,execvp 实际上做了什么? -execvp is a variant of a system call exec. The first argument is the program name. The v indicates the second argument is a list of program arguments (variable number of arguments). The p indicates the PATH environment will be used for searching for the given program name. In our previous attempt, the mkdir program was found based on your PATH environment variable. 
+execvp 是系统调用 exec 的一个变体。第一个参数是程序名字。v 表示第二个参数是一个程序参数列表(可变参数)。p 表示环境变量 PATH 会被用于搜索给定的程序名字。在我们上一次的尝试中,可以在你的 PATH 环境变量查找到 mkdir 程序。 + +(还有其他 exec 变体,比如 execv、execvpe、execl、execlp、execlpe;你可以 google 它们获取更多的信息。) -(There are other variants of exec such as execv, execvpe, execl, execlp, execlpe; you can google them for more information.) exec replaces the current memory of a calling process with a new process to be executed. In our case, our shell process memory was replaced by `mkdir` program. Then, mkdir became the main process and created the test_dir directory. Finally, its process exited. From 97bdb52dc246b07480649ea4d70b6c0df55f0e96 Mon Sep 17 00:00:00 2001 From: cposture <cposture@126.com> Date: Wed, 13 Jul 2016 13:48:03 +0800 Subject: [PATCH 05/17] Translated by cposture --- ...reate Your Own Shell in Python - Part I.md | 31 +++++++++---------- 1 file changed, 15 insertions(+), 16 deletions(-) diff --git a/sources/tech/20160705 Create Your Own Shell in Python - Part I.md b/sources/tech/20160705 Create Your Own Shell in Python - Part I.md index 67e5809b3d..74cac3887e 100644 --- a/sources/tech/20160705 Create Your Own Shell in Python - Part I.md +++ b/sources/tech/20160705 Create Your Own Shell in Python - Part I.md @@ -152,16 +152,15 @@ execvp 是系统调用 exec 的一个变体。第一个参数是程序名字。v (还有其他 exec 变体,比如 execv、execvpe、execl、execlp、execlpe;你可以 google 它们获取更多的信息。) +exec 会用即将运行的新进程替换调用进程的当前内存。在我们的例子中,我们的 shell 进程内存会被替换为 `mkdir` 程序。接着,mkdir 成为主进程并创建 test_dir 目录。最后该进程退出。 -exec replaces the current memory of a calling process with a new process to be executed. In our case, our shell process memory was replaced by `mkdir` program. Then, mkdir became the main process and created the test_dir directory. Finally, its process exited. +这里的重点在于我们的 shell 进程已经被 mkdir 进程所替换。这就是我们的 shell 消失且不会等待下一条命令的原因。 -The main point here is that our shell process was replaced by mkdir process already. That’s the reason why our shell disappeared and did not wait for the next command. 
+因此,我们需要其他的系统调用来解决问题:fork -Therefore, we need another system call to rescue: fork. +fork 会开辟新的内存并拷贝当前进程到一个新的进程。我们称这个新的进程为子进程,调用者进程为父进程。然后,子进程内存会被替换为被执行的程序。因此,我们的 shell,也就是父进程,可以免受内存替换的危险。 -fork will allocate new memory and copy the current process into a new process. We called this new process as child process and the caller process as parent process. Then, the child process memory will be replaced by a execed program. Therefore, our shell, which is a parent process, is safe from memory replacement. - -Let’s see our modified code. +让我们看看已修改的代码。 ``` ... @@ -194,25 +193,25 @@ def execute(cmd_tokens): ... ``` -When the parent process call `os.fork()`, you can imagine that all source code is copied into a new child process. At this point, the parent and child process see the same code and run in parallel. +当我们的父进程调用 `os.fork()`时,你可以想象所有的源代码被拷贝到了新的子进程。此时此刻,父进程和子进程看到的是相同的代码,并且并行运行着。 -If the running code is belong to the child process, pid will be 0. Else, the running code is belong to the parent process, pid will be the process id of the child process. +如果运行的代码属于子进程,pid 将为 0。否则,如果运行的代码属于父进程,pid 将会是子进程的进程 id。 -When os.execvp is invoked in the child process, you can imagine like all the source code of the child process is replaced by the code of a program that is being called. However, the code of the parent process is not changed. +当 os.execvp 在子进程中被调用时,你可以想象子进程的所有源代码被替换为正被调用程序的代码。然而父进程的代码不会被改变。 -When the parent process finishes waiting its child process to exit or be terminated, it returns the status indicating to continue the shell loop. +当父进程完成等待子进程退出或终止时,它会返回一个状态,指示继续 shell 循环。 -### Run +### 运行 -Now, you can try running our shell and enter mkdir test_dir2. It should work properly. Our main shell process is still there and waits for the next command. Try ls and you will see the created directories. +现在,你可以尝试运行我们的 shell 并输入 mkdir test_dir2。它应该可以正确执行。我们的主 shell 进程仍然存在并等待下一条命令。尝试执行 ls,你可以看到已创建的目录。 -However, there are some problems here. 
+但是,这里仍有许多问题。 -First, try cd test_dir2 and then ls. It’s supposed to enter the directory test_dir2 which is an empty directory. However, you will see that the directory was not changed into test_dir2. +第一,尝试执行 cd test_dir2,接着执行 ls。它应该会进入到一个空的 test_dir2 目录。然而,你将会看到目录没有变为 test_dir2。 -Second, we still have no way to exit from our shell gracefully. +第二,我们仍然没有办法优雅地退出我们的 shell。 -We will continue to solve such problems in [Part 2][1]. +我们将会在 [Part 2][1] 解决诸如此类的问题。 -------------------------------------------------------------------------------- From c544898d9927eb375711356a35cf74c079abe39f Mon Sep 17 00:00:00 2001 From: cposture <cposture@126.com> Date: Wed, 13 Jul 2016 15:03:42 +0800 Subject: [PATCH 06/17] Translated by cposture --- ...reate Your Own Shell in Python - Part I.md | 228 ------------------ ...reate Your Own Shell in Python - Part I.md | 228 ++++++++++++++++++ 2 files changed, 228 insertions(+), 228 deletions(-) delete mode 100644 sources/tech/20160705 Create Your Own Shell in Python - Part I.md create mode 100644 translated/tech/20160705 Create Your Own Shell in Python - Part I.md diff --git a/sources/tech/20160705 Create Your Own Shell in Python - Part I.md b/sources/tech/20160705 Create Your Own Shell in Python - Part I.md deleted file mode 100644 index 74cac3887e..0000000000 --- a/sources/tech/20160705 Create Your Own Shell in Python - Part I.md +++ /dev/null @@ -1,228 +0,0 @@ -使用 Python 创建你自己的 Shell:Part I -========================================== - -我很好奇一个 shell (像 bash,csh 等)内部是如何工作的。为了满足自己的好奇心,我使用 Python 实现了一个名为 yosh (Your Own Shell)的 Shell。本文章所介绍的概念也可以应用于其他编程语言。 - -(提示:你可以发布于此的博文中找到使用的源代码,代码以 MIT 许可发布) - -让我们开始吧。 - -### 步骤 0:项目结构 - -对于此项目,我使用了以下的项目结构。 - -``` -yosh_project -|-- yosh - |-- __init__.py - |-- shell.py -``` - -`yosh_project` 为项目根目录(你也可以把它简单地命名为 `yosh`)。 - -`yosh` 为包目录,并且 `__init__.py` 将会使一个包名等同于包目录名字(如果你不写 Python,可以忽略它) - -`shell.py` 是我们的主脚本文件。 - -### 步骤 1:Shell 循环 - -当你启动一个 shell,它会显示一个命令提示符同时等待用户输入命令。在接收了输入的命令并执行它之后(稍后文章会进行详细解释),你的 
shell 会回到循环,等待下一条指令。 - -在 `shell.py`,我们会以一个简单的 mian 函数开始,该函数调用了 shell_loop() 函数,如下: - -``` -def shell_loop(): - # Start the loop here - - -def main(): - shell_loop() - - -if __name__ == "__main__": - main() -``` - -接着,在 `shell_loop()`,为了指示循环是否继续或停止,我们使用了一个状态标志。在循环的开始,我们的 shell 将显示一个命令提示符,并等待读取命令输入。 - -``` -import sys - -SHELL_STATUS_RUN = 1 -SHELL_STATUS_STOP = 0 - - -def shell_loop(): - status = SHELL_STATUS_RUN - - while status == SHELL_STATUS_RUN: - # Display a command prompt - sys.stdout.write('> ') - sys.stdout.flush() - - # Read command input - cmd = sys.stdin.readline() -``` - -之后,我们切分命令输入并进行执行(我们将马上解释命令切分和执行函数)。 - -因此,我们的 shell_loop() 会是如下这样: - -``` -import sys - -SHELL_STATUS_RUN = 1 -SHELL_STATUS_STOP = 0 - - -def shell_loop(): - status = SHELL_STATUS_RUN - - while status == SHELL_STATUS_RUN: - # Display a command prompt - sys.stdout.write('> ') - sys.stdout.flush() - - # Read command input - cmd = sys.stdin.readline() - - # Tokenize the command input - cmd_tokens = tokenize(cmd) - - # Execute the command and retrieve new status - status = execute(cmd_tokens) -``` - -这就是我们整个 shell 循环。如果我们使用 python shell.py 命令启动 shell,它会显示命令提示符。然而如果我们输入命令并按回车,它将会抛出错误,因为我们还没定义命令切分函数。 - -为了退出 shell,可以尝试输入 ctrl-c。稍后我将解释如何以优雅的形式退出 shell。 - -### 步骤 2:命令切分 - -当一个用户在我们的 shell 中输入命令并按下回车键,该命令将会是一个包含命令名称及其参数的很长的字符串。因此,我们必须切分该字符串(分割一个字符串为多个标记)。 - -咋一看似乎很简单。我们或许可以使用 cmd.split(),用空格分割输入。它对类似 `ls -a my_folder` 的命令起作用,因为它能够将命令分割为一个列表 `['ls', '-a', 'my_folder']`,这样我们便能轻易处理它们了。 - -然而,也有一些类似 `echo "Hello World"` 或 `echo 'Hello World'` 以单引号或双引号引用参数的情况。如果我们使用 cmd.spilt,我们将会得到一个存有 3 个标记的列表 `['echo', '"Hello', 'World"']` 而不是 2 个标记 `['echo', 'Hello World']`。 - -幸运的是,Python 提供了一个名为 shlex 的库,能够帮助我们效验如神地分割命令。(提示:我们也可以使用正则表达式,但它不是本文的重点。) - - -``` -import sys -import shlex - -... - -def tokenize(string): - return shlex.split(string) - -... 
-``` - -然后我们将这些标记发送到执行过程。 - -### 步骤 3:执行 - -这是 shell 中核心和有趣的一部分。当 shell 执行 mkdir test_dir 时,发生了什么?(提示:midir 是一个带有 test_dir 参数的执行程序,用于创建一个名为 test_dir 的目录。) - -execvp 是涉及这一步的首个函数。在我们解释 execvp 所做的事之前,让我们看看它的实际效果。 - -``` -import os -... - -def execute(cmd_tokens): - # Execute command - os.execvp(cmd_tokens[0], cmd_tokens) - - # Return status indicating to wait for next command in shell_loop - return SHELL_STATUS_RUN - -... -``` - -再次尝试运行我们的 shell,并输入 `mkdir test_dir` 命令,接着按下回车键。 - -在我们敲下回车键之后,问题是我们的 shell 会直接退出而不是等待下一个命令。然而,目标正确地被创建。 - -因此,execvp 实际上做了什么? - -execvp 是系统调用 exec 的一个变体。第一个参数是程序名字。v 表示第二个参数是一个程序参数列表(可变参数)。p 表示环境变量 PATH 会被用于搜索给定的程序名字。在我们上一次的尝试中,可以在你的 PATH 环境变量查找到 mkdir 程序。 - -(还有其他 exec 变体,比如 execv、execvpe、execl、execlp、execlpe;你可以 google 它们获取更多的信息。) - -exec 会用即将运行的新进程替换调用进程的当前内存。在我们的例子中,我们的 shell 进程内存会被替换为 `mkdir` 程序。接着,mkdir 成为主进程并创建 test_dir 目录。最后该进程退出。 - -这里的重点在于我们的 shell 进程已经被 mkdir 进程所替换。这就是我们的 shell 消失且不会等待下一条命令的原因。 - -因此,我们需要其他的系统调用来解决问题:fork - -fork 会开辟新的内存并拷贝当前进程到一个新的进程。我们称这个新的进程为子进程,调用者进程为父进程。然后,子进程内存会被替换为被执行的程序。因此,我们的 shell,也就是父进程,可以免受内存替换的危险。 - -让我们看看已修改的代码。 - -``` -... - -def execute(cmd_tokens): - # Fork a child shell process - # If the current process is a child process, its `pid` is set to `0` - # else the current process is a parent process and the value of `pid` - # is the process id of its child process. - pid = os.fork() - - if pid == 0: - # Child process - # Replace the child shell process with the program called with exec - os.execvp(cmd_tokens[0], cmd_tokens) - elif pid > 0: - # Parent process - while True: - # Wait response status from its child process (identified with pid) - wpid, status = os.waitpid(pid, 0) - - # Finish waiting if its child process exits normally - # or is terminated by a signal - if os.WIFEXITED(status) or os.WIFSIGNALED(status): - break - - # Return status indicating to wait for next command in shell_loop - return SHELL_STATUS_RUN - -... 
-``` - -当我们的父进程调用 `os.fork()`时,你可以想象所有的源代码被拷贝到了新的子进程。此时此刻,父进程和子进程看到的是相同的代码,并且并行运行着。 - -如果运行的代码属于子进程,pid 将为 0。否则,如果运行的代码属于父进程,pid 将会是子进程的进程 id。 - -当 os.execvp 在子进程中被调用时,你可以想象子进程的所有源代码被替换为正被调用程序的代码。然而父进程的代码不会被改变。 - -当父进程完成等待子进程退出或终止时,它会返回一个状态,指示继续 shell 循环。 - -### 运行 - -现在,你可以尝试运行我们的 shell 并输入 mkdir test_dir2。它应该可以正确执行。我们的主 shell 进程仍然存在并等待下一条命令。尝试执行 ls,你可以看到已创建的目录。 - -但是,这里仍有许多问题。 - -第一,尝试执行 cd test_dir2,接着执行 ls。它应该会进入到一个空的 test_dir2 目录。然而,你将会看到目录没有变为 test_dir2。 - -第二,我们仍然没有办法优雅地退出我们的 shell。 - -我们将会在 [Part 2][1] 解决诸如此类的问题。 - - --------------------------------------------------------------------------------- - -via: https://hackercollider.com/articles/2016/07/05/create-your-own-shell-in-python-part-1/ - -作者:[Supasate Choochaisri][a] -译者:[译者ID](https://github.com/译者ID) -校对:[校对者ID](https://github.com/校对者ID) - -本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 - -[a]: https://disqus.com/by/supasate_choochaisri/ -[1]: https://hackercollider.com/articles/2016/07/06/create-your-own-shell-in-python-part-2/ diff --git a/translated/tech/20160705 Create Your Own Shell in Python - Part I.md b/translated/tech/20160705 Create Your Own Shell in Python - Part I.md new file mode 100644 index 0000000000..b54d0bff29 --- /dev/null +++ b/translated/tech/20160705 Create Your Own Shell in Python - Part I.md @@ -0,0 +1,228 @@ +使用 Python 创建你自己的 Shell:Part I +========================================== + +我很想知道一个 shell (像 bash,csh 等)内部是如何工作的。为了满足自己的好奇心,我使用 Python 实现了一个名为 **yosh** (Your Own Shell)的 Shell。本文章所介绍的概念也可以应用于其他编程语言。 + +(提示:你可以在[这里](https://github.com/supasate/yosh)查找本博文使用的源代码,代码以 MIT 许可证发布。在 Mac OS X 10.11.5 上,我使用 Python 2.7.10 和 3.4.3 进行了测试。它应该可以运行在其他类 Unix 环境,比如 Linux 和 Windows 上的 Cygwin。) + +让我们开始吧。 + +### 步骤 0:项目结构 + +对于此项目,我使用了以下的项目结构。 + +``` +yosh_project +|-- yosh + |-- __init__.py + |-- shell.py +``` + +`yosh_project` 为项目根目录(你也可以把它简单命名为 `yosh`)。 + +`yosh` 为包目录,且 `__init__.py` 可以使它成为与包目录名字相同的包(如果你不写 Python,可以忽略它。) + +`shell.py` 
是我们主要的脚本文件。 + +### 步骤 1:Shell 循环 + +当启动一个 shell,它会显示一个命令提示符并等待你的命令输入。在接收了输入的命令并执行它之后(稍后文章会进行详细解释),你的 shell 会重新回到循环,等待下一条指令。 + +在 `shell.py`,我们会以一个简单的 mian 函数开始,该函数调用了 shell_loop() 函数,如下: + +``` +def shell_loop(): + # Start the loop here + + +def main(): + shell_loop() + + +if __name__ == "__main__": + main() +``` + +接着,在 `shell_loop()`,为了指示循环是否继续或停止,我们使用了一个状态标志。在循环的开始,我们的 shell 将显示一个命令提示符,并等待读取命令输入。 + +``` +import sys + +SHELL_STATUS_RUN = 1 +SHELL_STATUS_STOP = 0 + + +def shell_loop(): + status = SHELL_STATUS_RUN + + while status == SHELL_STATUS_RUN: + # Display a command prompt + sys.stdout.write('> ') + sys.stdout.flush() + + # Read command input + cmd = sys.stdin.readline() +``` + +之后,我们切分命令输入并进行执行(我们即将实现`命令切分`和`执行`函数)。 + +因此,我们的 shell_loop() 会是如下这样: + +``` +import sys + +SHELL_STATUS_RUN = 1 +SHELL_STATUS_STOP = 0 + + +def shell_loop(): + status = SHELL_STATUS_RUN + + while status == SHELL_STATUS_RUN: + # Display a command prompt + sys.stdout.write('> ') + sys.stdout.flush() + + # Read command input + cmd = sys.stdin.readline() + + # Tokenize the command input + cmd_tokens = tokenize(cmd) + + # Execute the command and retrieve new status + status = execute(cmd_tokens) +``` + +这就是我们整个 shell 循环。如果我们使用 `python shell.py` 启动我们的 shell,它会显示命令提示符。然而如果我们输入命令并按回车,它会抛出错误,因为我们还没定义`命令切分`函数。 + +为了退出 shell,可以尝试输入 ctrl-c。稍后我将解释如何以优雅的形式退出 shell。 + +### 步骤 2:命令切分 + +当用户在我们的 shell 中输入命令并按下回车键,该命令将会是一个包含命令名称及其参数的很长的字符串。因此,我们必须切分该字符串(分割一个字符串为多个标记)。 + +咋一看似乎很简单。我们或许可以使用 `cmd.split()`,以空格分割输入。它对类似 `ls -a my_folder` 的命令起作用,因为它能够将命令分割为一个列表 `['ls', '-a', 'my_folder']`,这样我们便能轻易处理它们了。 + +然而,也有一些类似 `echo "Hello World"` 或 `echo 'Hello World'` 以单引号或双引号引用参数的情况。如果我们使用 cmd.spilt,我们将会得到一个存有 3 个标记的列表 `['echo', '"Hello', 'World"']` 而不是 2 个标记的列表 `['echo', 'Hello World']`。 + +幸运的是,Python 提供了一个名为 `shlex` 的库,它能够帮助我们效验如神地分割命令。(提示:我们也可以使用正则表达式,但它不是本文的重点。) + + +``` +import sys +import shlex + +... + +def tokenize(string): + return shlex.split(string) + +... 
+``` + +然后我们将这些标记发送到执行进程。 + +### 步骤 3:执行 + +这是 shell 中核心和有趣的一部分。当 shell 执行 `mkdir test_dir` 时,到底发生了什么?(提示: `mkdir` 是一个带有 `test_dir` 参数的执行程序,用于创建一个名为 `test_dir` 的目录。) + +`execvp` 是涉及这一步的首个函数。在我们解释 `execvp` 所做的事之前,让我们看看它的实际效果。 + +``` +import os +... + +def execute(cmd_tokens): + # Execute command + os.execvp(cmd_tokens[0], cmd_tokens) + + # Return status indicating to wait for next command in shell_loop + return SHELL_STATUS_RUN + +... +``` + +再次尝试运行我们的 shell,并输入 `mkdir test_dir` 命令,接着按下回车键。 + +在我们敲下回车键之后,问题是我们的 shell 会直接退出而不是等待下一个命令。然而,目标正确地被创建。 + +因此,`execvp` 实际上做了什么? + +`execvp` 是系统调用 `exec` 的一个变体。第一个参数是程序名字。`v` 表示第二个参数是一个程序参数列表(可变参数)。`p` 表示环境变量 `PATH` 会被用于搜索给定的程序名字。在我们上一次的尝试中,它将会基于我们的 `PATH` 环境变量查找`mkdir` 程序。 + +(还有其他 `exec` 变体,比如 execv、execvpe、execl、execlp、execlpe;你可以 google 它们获取更多的信息。) + +`exec` 会用即将运行的新进程替换调用进程的当前内存。在我们的例子中,我们的 shell 进程内存会被替换为 `mkdir` 程序。接着,`mkdir` 成为主进程并创建 `test_dir` 目录。最后该进程退出。 + +这里的重点在于**我们的 shell 进程已经被 `mkdir` 进程所替换**。这就是我们的 shell 消失且不会等待下一条命令的原因。 + +因此,我们需要其他的系统调用来解决问题:`fork`。 + +`fork` 会开辟新的内存并拷贝当前进程到一个新的进程。我们称这个新的进程为**子进程**,调用者进程为**父进程**。然后,子进程内存会被替换为被执行的程序。因此,我们的 shell,也就是父进程,可以免受内存替换的危险。 + +让我们看看修改的代码。 + +``` +... + +def execute(cmd_tokens): + # Fork a child shell process + # If the current process is a child process, its `pid` is set to `0` + # else the current process is a parent process and the value of `pid` + # is the process id of its child process. + pid = os.fork() + + if pid == 0: + # Child process + # Replace the child shell process with the program called with exec + os.execvp(cmd_tokens[0], cmd_tokens) + elif pid > 0: + # Parent process + while True: + # Wait response status from its child process (identified with pid) + wpid, status = os.waitpid(pid, 0) + + # Finish waiting if its child process exits normally + # or is terminated by a signal + if os.WIFEXITED(status) or os.WIFSIGNALED(status): + break + + # Return status indicating to wait for next command in shell_loop + return SHELL_STATUS_RUN + +... 
+```
+
+When our parent process calls `os.fork()`, you can imagine that all of the source code is copied into a new child process. At that moment, the parent and the child see the same code and run in parallel.
+
+If the running code belongs to the child process, `pid` is `0`. Otherwise, if the running code belongs to the parent process, `pid` is the process id of the child.
+
+When `os.execvp` is called in the child process, you can imagine that all of the child's source code is replaced by the code of the program being called. The parent's code, however, is not changed.
+
+When the parent process finishes waiting for the child to exit or be terminated, it returns a status telling the shell loop to continue.
+
+### Run
+
+Now you can run our shell and type `mkdir test_dir2`. It should execute correctly. Our main shell process is still there and waits for the next command. Try `ls` and you will see the created directories.
+
+However, there are still some problems.
+
+First, try `cd test_dir2` and then `ls`. It should enter the empty `test_dir2` directory. However, you will see that the directory has not changed to `test_dir2`.
+
+Second, we still have no way to exit our shell gracefully.
+
+We will solve these problems in [Part 2][1].
+
+
+--------------------------------------------------------------------------------
+
+via: https://hackercollider.com/articles/2016/07/05/create-your-own-shell-in-python-part-1/
+
+Author: [Supasate Choochaisri][a]
+Translator: [translator ID](https://github.com/译者ID)
+Proofreader: [proofreader ID](https://github.com/校对者ID)
+
+This article was originally translated by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)
+
+[a]: https://disqus.com/by/supasate_choochaisri/
+[1]: https://hackercollider.com/articles/2016/07/06/create-your-own-shell-in-python-part-2/

From 1477657f553de2d5556e165483d110908eebde5d Mon Sep 17 00:00:00 2001
From: cposture <cposture@126.com>
Date: Wed, 13 Jul 2016 22:56:30 +0800
Subject: [PATCH 07/17] Translating by cposture

---
 sources/tech/20160309 Let’s Build A Web Server. Part 1.md | 1 +
 sources/tech/20160406 Let’s Build A Web Server. Part 2.md | 1 +
 2 files changed, 2 insertions(+)

diff --git a/sources/tech/20160309 Let’s Build A Web Server. Part 1.md b/sources/tech/20160309 Let’s Build A Web Server. Part 1.md
index 4c8048786d..47f8bfdcc7 100644
--- a/sources/tech/20160309 Let’s Build A Web Server. Part 1.md
+++ b/sources/tech/20160309 Let’s Build A Web Server. Part 1.md
@@ -1,3 +1,4 @@
+Translating by cposture 2016.07.13
 Let’s Build A Web Server. Part 1.
 =====================================

diff --git a/sources/tech/20160406 Let’s Build A Web Server. Part 2.md b/sources/tech/20160406 Let’s Build A Web Server.
Part 2.md index 482352ac9a..5cba11dd64 100644 --- a/sources/tech/20160406 Let’s Build A Web Server. Part 2.md +++ b/sources/tech/20160406 Let’s Build A Web Server. Part 2.md @@ -1,3 +1,4 @@ +Translating by cposture 2016.07.13 Let’s Build A Web Server. Part 2. =================================== From 2f19bee95fd0bb828463f0f331636d2809b4fc1e Mon Sep 17 00:00:00 2001 From: cposture <cposture@126.com> Date: Fri, 15 Jul 2016 14:11:18 +0800 Subject: [PATCH 08/17] Translating partly 50 --- ...eate Your Own Shell in Python - Part II.md | 38 +++++++++---------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/sources/tech/20160706 Create Your Own Shell in Python - Part II.md b/sources/tech/20160706 Create Your Own Shell in Python - Part II.md index 3154839443..2733288475 100644 --- a/sources/tech/20160706 Create Your Own Shell in Python - Part II.md +++ b/sources/tech/20160706 Create Your Own Shell in Python - Part II.md @@ -1,26 +1,25 @@ -Translating by cposture 2016.07.09 -Create Your Own Shell in Python - Part II +使用 Python 创建你自己的 Shell:Part II =========================================== -In [part 1][1], we already created a main shell loop, tokenized command input, and executed a command by fork and exec. In this part, we will solve the remaining problmes. First, `cd test_dir2` does not change our current directory. Second, we still have no way to exit from our shell gracefully. +在[part 1][1],我们已经创建了一个主要的 shell 循环、切分了的命令输入,以及通过 fork 和 exec 执行命令。在这部分,我们将会解决剩下的问题。首先,`cd test_dir2` 命令无法修改我们的当前目录。其次,我们仍无法优雅地从 shell 中退出。 -### Step 4: Built-in Commands +### 步骤 4:内置命令 -The statement “cd test_dir2 does not change our current directory” is true and false in some senses. It’s true in the sense that after executing the command, we are still at the same directory. However, the directory is actullay changed, but, it’s changed in the child process. 
+“cd test_dir2 无法修改我们的当前目录” 这句话是对的,但在某种意义上也是错的。在执行完该命令之后,我们仍然处在同一目录,从这个意义上讲,它是对的。然而,目录实际上已经被修改,只不过它是在子进程中被修改。 -Remember that we fork a child process, then, exec the command which does not happen on a parent process. The result is we just change the current directory of a child process, not the directory of a parent process. +还记得我们 fork 了一个子进程,然后执行命令,执行命令的过程没有发生在父进程上。结果是我们只是改变了子进程的当前目录,而不是父进程的目录。 -Then, the child process exits, and the parent process continues with the same intact directory. +然后子进程退出,且父进程在原封不动的目录下继续运行。 -Therefore, this kind of commands must be built-in with the shell itself. It must be executed in the shell process without forking. +因此,这类与 shell 自己相关的命令必须是内置命令。它必须在 shell 进程中执行而没有分叉(forking)。 #### cd -Let’s start with cd command. +让我们从 cd 命令开始。 -We first create a builtins directory. Each built-in command will be put inside this directory. +我们首先创建一个内置目录。每一个内置命令都会被放进这个目录中。 -``` +```shell yosh_project |-- yosh |-- builtins @@ -30,9 +29,9 @@ yosh_project |-- shell.py ``` -In cd.py, we implement our own cd command by using a system call os.chdir. +在 cd.py,我们通过使用系统调用 os.chdir 实现自己的 cd 命令。 -``` +```python import os from yosh.constants import * @@ -43,9 +42,9 @@ def cd(args): return SHELL_STATUS_RUN ``` -Notice that we return shell running status from a built-in function. Therefore, we move constants into yosh/constants.py to be used across the project. +注意,我们会从内置函数返回 shell 的运行状态。所以,为了能够在项目中继续使用常量,我们将它们移至 yosh/constants.py。 -``` +```shell yosh_project |-- yosh |-- builtins @@ -56,16 +55,16 @@ yosh_project |-- shell.py ``` -In constants.py, we put shell status constants here. +在 constants.py,我们将状态常量放在这里。 -``` +```python SHELL_STATUS_STOP = 0 SHELL_STATUS_RUN = 1 ``` -Now, our built-in cd is ready. Let’s modify our shell.py to handle built-in functions. +现在,我们的内置 cd 已经准备好了。让我们修改 shell.py 来处理这些内置函数。 -``` +```python ... # Import constants from yosh.constants import * @@ -90,6 +89,7 @@ def execute(cmd_tokens): ... 
``` +我们使用一个 python 字典变量 built_in_cmds 作为哈希映射(a hash map),以存储我们的内置函数。在 execute 函数,我们提取命令的名字和参数。如果该命令在我们的哈希映射中,则调用对应的内置函数。 We use a Python dictionary built_in_cmds as a hash map to store our built-in functions. In execute function, we extract command name and arguments. If the command name is in our hash map, we call that built-in function. (Note: built_in_cmds[cmd_name] returns the function reference that can be invoked with arguments immediately.) From 31a898b314bc71e8638a55df2e9f9822d9008d2f Mon Sep 17 00:00:00 2001 From: cposture <cposture@126.com> Date: Sat, 16 Jul 2016 00:02:06 +0800 Subject: [PATCH 09/17] Translting partly 70 --- .../20160706 Create Your Own Shell in Python - Part II.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/sources/tech/20160706 Create Your Own Shell in Python - Part II.md b/sources/tech/20160706 Create Your Own Shell in Python - Part II.md index 2733288475..26f0625368 100644 --- a/sources/tech/20160706 Create Your Own Shell in Python - Part II.md +++ b/sources/tech/20160706 Create Your Own Shell in Python - Part II.md @@ -89,12 +89,11 @@ def execute(cmd_tokens): ... ``` -我们使用一个 python 字典变量 built_in_cmds 作为哈希映射(a hash map),以存储我们的内置函数。在 execute 函数,我们提取命令的名字和参数。如果该命令在我们的哈希映射中,则调用对应的内置函数。 -We use a Python dictionary built_in_cmds as a hash map to store our built-in functions. In execute function, we extract command name and arguments. If the command name is in our hash map, we call that built-in function. +我们使用一个 python 字典变量 built_in_cmds 作为哈希映射(a hash map),以存储我们的内置函数。我们在 execute 函数中提取命令的名字和参数。如果该命令在我们的哈希映射中,则调用对应的内置函数。 -(Note: built_in_cmds[cmd_name] returns the function reference that can be invoked with arguments immediately.) +(提示:built_in_cmds[cmd_name] 返回能直接使用参数调用的函数引用的。) -We are almost ready to use the built-in cd function. The last thing is to add cd function into the built_in_cmds map. +我们差不多准备好使用内置的 cd 函数了。最后一步是将 cd 函数添加到 built_in_cmds 映射中。 ``` ... 
@@ -119,6 +118,7 @@ def main(): shell_loop() ``` +我们定义 register_command 函数以添加一个内置函数到我们内置的命令哈希映射。 We define register_command function for adding a built-in function to our built-in commmand hash map. Then, we define init function and register the built-in cd function there. Notice the line register_command("cd", cd). The first argument is a command name. The second argument is a reference to a function. In order to let cd, in the second argument, refer to the cd function reference in yosh/builtins/cd.py, we have to put the following line in yosh/builtins/__init__.py. From 0eeb6216837084d151202801bc689acfc4f691c8 Mon Sep 17 00:00:00 2001 From: cposture <cposture@126.com> Date: Sat, 16 Jul 2016 01:04:01 +0800 Subject: [PATCH 10/17] Translating partly 75:wq --- ...20160706 Create Your Own Shell in Python - Part II.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/sources/tech/20160706 Create Your Own Shell in Python - Part II.md b/sources/tech/20160706 Create Your Own Shell in Python - Part II.md index 26f0625368..934011e0cf 100644 --- a/sources/tech/20160706 Create Your Own Shell in Python - Part II.md +++ b/sources/tech/20160706 Create Your Own Shell in Python - Part II.md @@ -118,16 +118,19 @@ def main(): shell_loop() ``` -我们定义 register_command 函数以添加一个内置函数到我们内置的命令哈希映射。 -We define register_command function for adding a built-in function to our built-in commmand hash map. Then, we define init function and register the built-in cd function there. +我们定义 register_command 函数以添加一个内置函数到我们内置的命令哈希映射。接着,我们定义 init 函数并且在这里注册内置的 cd 函数。 -Notice the line register_command("cd", cd). The first argument is a command name. The second argument is a reference to a function. In order to let cd, in the second argument, refer to the cd function reference in yosh/builtins/cd.py, we have to put the following line in yosh/builtins/__init__.py. 
+注意这行 register_command("cd", cd) 。第一个参数为命令的名字。第二个参数为一个函数引用。为了能够让第二个参数 cd 引用到 yosh/builtins/cd.py 中的cd 函数引用,我们必须将以下这行代码放在 yosh/builtins/__init__.py 文件中。 ``` from yosh.builtins.cd import * ``` + +因此,在 yosh/shell.py 中,当我们从 yosh.builtins 导入 * 时,我们可以得到已经通过 yosh.builtins +被导入的 cd 函数引用。 Therefore, in yosh/shell.py, when we import * from yosh.builtins, we get cd function reference that is already imported by yosh.builtins. +我们已经准备好了代码。 We’ve done preparing our code. Let’s try by running our shell as a module python -m yosh.shell at the same level as the yosh directory. Now, our cd command should change our shell directory correctly while non-built-in commands still work too. Cool. From 89d9ed97814565e48fc03badce1090df909a1b82 Mon Sep 17 00:00:00 2001 From: cposture <cposture@126.com> Date: Sat, 16 Jul 2016 11:03:10 +0800 Subject: [PATCH 11/17] Translated by cposture --- ...eate Your Own Shell in Python - Part II.md | 63 +++++++++---------- 1 file changed, 30 insertions(+), 33 deletions(-) diff --git a/sources/tech/20160706 Create Your Own Shell in Python - Part II.md b/sources/tech/20160706 Create Your Own Shell in Python - Part II.md index 934011e0cf..0f0cd6a878 100644 --- a/sources/tech/20160706 Create Your Own Shell in Python - Part II.md +++ b/sources/tech/20160706 Create Your Own Shell in Python - Part II.md @@ -1,7 +1,7 @@ 使用 Python 创建你自己的 Shell:Part II =========================================== -在[part 1][1],我们已经创建了一个主要的 shell 循环、切分了的命令输入,以及通过 fork 和 exec 执行命令。在这部分,我们将会解决剩下的问题。首先,`cd test_dir2` 命令无法修改我们的当前目录。其次,我们仍无法优雅地从 shell 中退出。 +在[part 1][1] 中,我们已经创建了一个主要的 shell 循环、切分了的命令输入,以及通过 `fork` 和 `exec` 执行命令。在这部分,我们将会解决剩下的问题。首先,`cd test_dir2` 命令无法修改我们的当前目录。其次,我们仍无法优雅地从 shell 中退出。 ### 步骤 4:内置命令 @@ -9,15 +9,15 @@ 还记得我们 fork 了一个子进程,然后执行命令,执行命令的过程没有发生在父进程上。结果是我们只是改变了子进程的当前目录,而不是父进程的目录。 -然后子进程退出,且父进程在原封不动的目录下继续运行。 +然后子进程退出,而父进程在原封不动的目录下继续运行。 因此,这类与 shell 自己相关的命令必须是内置命令。它必须在 shell 进程中执行而没有分叉(forking)。 #### cd -让我们从 cd 命令开始。 +让我们从 `cd` 命令开始。 -我们首先创建一个内置目录。每一个内置命令都会被放进这个目录中。 
+我们首先创建一个 `builtins` 目录。每一个内置命令都会被放进这个目录中。 ```shell yosh_project @@ -29,7 +29,7 @@ yosh_project |-- shell.py ``` -在 cd.py,我们通过使用系统调用 os.chdir 实现自己的 cd 命令。 +在 `cd.py` 中,我们通过使用系统调用 `os.chdir` 实现自己的 `cd` 命令。 ```python import os @@ -42,7 +42,7 @@ def cd(args): return SHELL_STATUS_RUN ``` -注意,我们会从内置函数返回 shell 的运行状态。所以,为了能够在项目中继续使用常量,我们将它们移至 yosh/constants.py。 +注意,我们会从内置函数返回 shell 的运行状态。所以,为了能够在项目中继续使用常量,我们将它们移至 `yosh/constants.py`。 ```shell yosh_project @@ -55,14 +55,14 @@ yosh_project |-- shell.py ``` -在 constants.py,我们将状态常量放在这里。 +在 `constants.py` 中,我们将状态常量都放在这里。 ```python SHELL_STATUS_STOP = 0 SHELL_STATUS_RUN = 1 ``` -现在,我们的内置 cd 已经准备好了。让我们修改 shell.py 来处理这些内置函数。 +现在,我们的内置 `cd` 已经准备好了。让我们修改 `shell.py` 来处理这些内置函数。 ```python ... @@ -89,11 +89,11 @@ def execute(cmd_tokens): ... ``` -我们使用一个 python 字典变量 built_in_cmds 作为哈希映射(a hash map),以存储我们的内置函数。我们在 execute 函数中提取命令的名字和参数。如果该命令在我们的哈希映射中,则调用对应的内置函数。 +我们使用一个 python 字典变量 `built_in_cmds` 作为哈希映射(hash map),以存储我们的内置函数。我们在 `execute` 函数中提取命令的名字和参数。如果该命令在我们的哈希映射中,则调用对应的内置函数。 -(提示:built_in_cmds[cmd_name] 返回能直接使用参数调用的函数引用的。) +(提示:`built_in_cmds[cmd_name]` 返回能直接使用参数调用的函数引用的。) -我们差不多准备好使用内置的 cd 函数了。最后一步是将 cd 函数添加到 built_in_cmds 映射中。 +我们差不多准备好使用内置的 `cd` 函数了。最后一步是将 `cd` 函数添加到 `built_in_cmds` 映射中。 ``` ... 
@@ -118,32 +118,29 @@ def main(): shell_loop() ``` -我们定义 register_command 函数以添加一个内置函数到我们内置的命令哈希映射。接着,我们定义 init 函数并且在这里注册内置的 cd 函数。 +我们定义了 `register_command` 函数,以添加一个内置函数到我们内置的命令哈希映射。接着,我们定义 `init` 函数并且在这里注册内置的 `cd` 函数。 -注意这行 register_command("cd", cd) 。第一个参数为命令的名字。第二个参数为一个函数引用。为了能够让第二个参数 cd 引用到 yosh/builtins/cd.py 中的cd 函数引用,我们必须将以下这行代码放在 yosh/builtins/__init__.py 文件中。 +注意这行 `register_command("cd", cd)` 。第一个参数为命令的名字。第二个参数为一个函数引用。为了能够让第二个参数 `cd` 引用到 `yosh/builtins/cd.py` 中的 `cd` 函数引用,我们必须将以下这行代码放在 `yosh/builtins/__init__.py` 文件中。 ``` from yosh.builtins.cd import * ``` -因此,在 yosh/shell.py 中,当我们从 yosh.builtins 导入 * 时,我们可以得到已经通过 yosh.builtins -被导入的 cd 函数引用。 -Therefore, in yosh/shell.py, when we import * from yosh.builtins, we get cd function reference that is already imported by yosh.builtins. +因此,在 `yosh/shell.py` 中,当我们从 `yosh.builtins` 导入 `*` 时,我们可以得到已经通过 `yosh.builtins` 导入的 `cd` 函数引用。 -我们已经准备好了代码。 -We’ve done preparing our code. Let’s try by running our shell as a module python -m yosh.shell at the same level as the yosh directory. +我们已经准备好了代码。让我们尝试在 `yosh` 同级目录下以模块形式运行我们的 shell,`python -m yosh.shell`。 -Now, our cd command should change our shell directory correctly while non-built-in commands still work too. Cool. +现在,`cd` 命令可以正确修改我们的 shell 目录了,同时非内置命令仍然可以工作。非常好! #### exit -Here comes the last piece: to exit gracefully. +最后一块终于来了:优雅地退出。 -We need a function that changes the shell status to be SHELL_STATUS_STOP. So, the shell loop will naturally break and the shell program will end and exit. +我们需要一个可以修改 shell 状态为 `SHELL_STATUS_STOP` 的函数。这样,shell 循环可以自然地结束,shell 将到达终点而退出。 -As same as cd, if we fork and exec exit in a child process, the parent process will still remain inact. Therefore, the exit function is needed to be a shell built-in function. +和 `cd` 一样,如果我们在子进程中 fork 和执行 `exit` 函数,其对父进程是不起作用的。因此,`exit` 函数需要成为一个 shell 内置函数。 -Let’s start by creating a new file called exit.py in the builtins folder. 
+让我们从这开始:在 `builtins` 目录下创建一个名为 `exit.py` 的新文件。 ``` yosh_project @@ -157,7 +154,7 @@ yosh_project |-- shell.py ``` -The exit.py defines the exit function that just returns the status to break the main loop. +`exit.py` 定义了一个 `exit` 函数,该函数仅仅返回一个可以退出主循环的状态。 ``` from yosh.constants import * @@ -167,14 +164,14 @@ def exit(args): return SHELL_STATUS_STOP ``` -Then, we import the exit function reference in `yosh/builtins/__init__.py`. +然后,我们导入位于 `yosh/builtins/__init__.py` 文件的 `exit` 函数引用。 ``` from yosh.builtins.cd import * from yosh.builtins.exit import * ``` -Finally, in shell.py, we register the exit command in `init()` function. +最后,我们在 `shell.py` 中的 `init()` 函数注册 `exit` 命令。 ``` @@ -188,17 +185,17 @@ def init(): ... ``` -That’s all! +到此为止! -Try running python -m yosh.shell. Now you can enter exit to quit the program gracefully. +尝试执行 `python -m yosh.shell`。现在你可以输入 `exit` 优雅地退出程序了。 -### Final Thought +### 最后的想法 -I hope you enjoy creating yosh (your own shell) like I do. However, my version of yosh is still in an early stage. I don’t handle several corner cases that can corrupt the shell. There are a lot of built-in commands that I don’t cover. Some non-built-in commands can also be implemented as built-in commands to improve performance (avoid new process creation time). And, a ton of features are not yet implemented (see Common features and Differing features). +我希望你能像我一样享受创建 `yosh` (**y**our **o**wn **sh**ell)的过程。但我的 `yosh` 版本仍处于早期阶段。我没有处理一些会使 shell 崩溃的极端状况。还有很多我没有覆盖的内置命令。为了提高性能,一些非内置命令也可以实现为内置命令(避免新进程创建时间)。同时,大量的功能还没有实现(请看 [公共特性](http://tldp.org/LDP/Bash-Beginners-Guide/html/x7243.html) 和 [不同特性](http://www.tldp.org/LDP/intro-linux/html/x12249.html)) -I’ve provided the source code at github.com/supasate/yosh. Feel free to fork and play around. +我已经在 github.com/supasate/yosh 中提供了源代码。请随意 fork 和尝试。 -Now, it’s your turn to make it real Your Own SHell. +现在该是创建你真正自己拥有的 Shell 的时候了。 Happy Coding! @@ -207,7 +204,7 @@ Happy Coding! 
 via: https://hackercollider.com/articles/2016/07/06/create-your-own-shell-in-python-part-2/

 Author: [Supasate Choochaisri][a]
-Translator: [translator ID](https://github.com/译者ID)
+Translator: [cposture](https://github.com/cposture)
 Proofreader: [proofreader ID](https://github.com/校对者ID)

 This article was originally translated by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)

From fc7e5a6dec6dea4bc018f29beaca8e944f4594c1 Mon Sep 17 00:00:00 2001
From: cposture <cposture@126.com>
Date: Sat, 16 Jul 2016 11:04:29 +0800
Subject: [PATCH 12/17] Translated by cposture

---
 .../tech/20160706 Create Your Own Shell in Python - Part II.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename {sources => translated}/tech/20160706 Create Your Own Shell in Python - Part II.md (100%)

diff --git a/sources/tech/20160706 Create Your Own Shell in Python - Part II.md b/translated/tech/20160706 Create Your Own Shell in Python - Part II.md
similarity index 100%
rename from sources/tech/20160706 Create Your Own Shell in Python - Part II.md
rename to translated/tech/20160706 Create Your Own Shell in Python - Part II.md

From bf4f2549256f9facb25f9cb6d8fd8a2a1039dcfd Mon Sep 17 00:00:00 2001
From: cposture <cposture@126.com>
Date: Sun, 17 Jul 2016 09:03:58 +0800
Subject: [PATCH 13/17] A translation already exists online
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 sources/tech/20160309 Let’s Build A Web Server. Part 1.md | 1 -
 sources/tech/20160406 Let’s Build A Web Server. Part 2.md | 1 -
 2 files changed, 2 deletions(-)

diff --git a/sources/tech/20160309 Let’s Build A Web Server. Part 1.md b/sources/tech/20160309 Let’s Build A Web Server. Part 1.md
index 47f8bfdcc7..4c8048786d 100644
--- a/sources/tech/20160309 Let’s Build A Web Server. Part 1.md
+++ b/sources/tech/20160309 Let’s Build A Web Server. Part 1.md
@@ -1,4 +1,3 @@
-Translating by cposture 2016.07.13
 Let’s Build A Web Server. Part 1.
===================================== diff --git a/sources/tech/20160406 Let’s Build A Web Server. Part 2.md b/sources/tech/20160406 Let’s Build A Web Server. Part 2.md index 5cba11dd64..482352ac9a 100644 --- a/sources/tech/20160406 Let’s Build A Web Server. Part 2.md +++ b/sources/tech/20160406 Let’s Build A Web Server. Part 2.md @@ -1,4 +1,3 @@ -Translating by cposture 2016.07.13 Let’s Build A Web Server. Part 2. =================================== From e0a9969aac1cd53e6b47fef5fa67f0f4ae8e92ac Mon Sep 17 00:00:00 2001 From: cposture <cposture@126.com> Date: Sat, 30 Jul 2016 22:09:51 +0800 Subject: [PATCH 14/17] Translating 20%: --- ...18 An Introduction to Mocking in Python.md | 138 +++++++++--------- 1 file changed, 67 insertions(+), 71 deletions(-) diff --git a/sources/tech/20160618 An Introduction to Mocking in Python.md b/sources/tech/20160618 An Introduction to Mocking in Python.md index 182f596431..9c6c912b60 100644 --- a/sources/tech/20160618 An Introduction to Mocking in Python.md +++ b/sources/tech/20160618 An Introduction to Mocking in Python.md @@ -1,34 +1,36 @@ -Translating by cposture 2016-07-29 -An Introduction to Mocking in Python +Mock 在 Python 中的使用介绍 ===================================== +http://www.oschina.net/translate/an-introduction-to-mocking-in-python?cmp +本文讲述的是 Python 中 Mock 的使用 -This article is about mocking in python, +**如何在避免测试你的耐心的情景下执行单元测试** -**How to Run Unit Tests Without Testing Your Patience** +通常,我们编写的软件会直接与我们称之为肮脏无比的服务交互。用外行人的话说:交互已设计好的服务对我们的应用程序很重要,但是这会带来我们不希望的副作用,也就是那些在我们自己测试的时候不希望的功能。例如:我们正在写一个社交 app,并且想要测试一下我们 "发布到 Facebook" 的新功能,但是不想每次运行测试集的时候真的发布到 Facebook。 -More often than not, the software we write directly interacts with what we would label as “dirty” services. 
In layman’s terms: services that are crucial to our application, but whose interactions have intended but undesired side-effects—that is, undesired in the context of an autonomous test run.For example: perhaps we’re writing a social app and want to test out our new ‘Post to Facebook feature’, but don’t want to actually post to Facebook every time we run our test suite. -The Python unittest library includes a subpackage named unittest.mock—or if you declare it as a dependency, simply mock—which provides extremely powerful and useful means by which to mock and stub out these undesired side-effects. +Python 的单元测试库包含了一个名为 unittest.mock 或者可以称之为依赖的子包,简言之为 mock——其提供了极其强大和有用的方法,通过它们可以模拟和打桩我们不希望的副作用。 + >Source | <http://www.toptal.com/python/an-introduction-to-mocking-in-python> -Note: mock is [newly included][1] in the standard library as of Python 3.3; prior distributions will have to use the Mock library downloadable via [PyPI][2]. +注意:mock [最近收录][1]到了 Python 3.3 的标准库中;先前发布的版本必须通过 [PyPI][2] 下载 Mock 库。 +### ### Fear System Calls -To give you another example, and one that we’ll run with for the rest of the article, consider system calls. It’s not difficult to see that these are prime candidates for mocking: whether you’re writing a script to eject a CD drive, a web server which removes antiquated cache files from /tmp, or a socket server which binds to a TCP port, these calls all feature undesired side-effects in the context of your unit-tests. +再举另一个例子,思考一个我们会在余文讨论的系统调用。不难发现,这些系统调用都是主要的模拟对象:无论你是正在写一个可以弹出 CD 驱动的脚本,还是一个用来删除 /tmp 下过期的缓存文件的 Web 服务,这些调用都是在你的单元测试上下文中不希望的副作用。 ->As a developer, you care more that your library successfully called the system function for ejecting a CD as opposed to experiencing your CD tray open every time a test is run. +> 作为一个开发者,你需要更关心你的库是否成功地调用了一个可以弹出 CD 的系统函数,而不是切身经历 CD 托盘每次在测试执行的时候都打开了。 -As a developer, you care more that your library successfully called the system function for ejecting a CD (with the correct arguments, etc.) 
as opposed to actually experiencing your CD tray open every time a test is run. (Or worse, multiple times, as multiple tests reference the eject code during a single unit-test run!) +作为一个开发者,你需要更关心你的库是否成功地调用了一个可以弹出 CD 的系统函数(使用了正确的参数等等),而不是切身经历 CD 托盘每次在测试执行的时候都打开了。(或者更糟糕的是,很多次,在一个单元测试运行期间多个测试都引用了弹出代码!) -Likewise, keeping your unit-tests efficient and performant means keeping as much “slow code” out of the automated test runs, namely filesystem and network access. +同样,保持你的单元测试的效率和性能意味着需要让如此多的 "缓慢代码" 远离自动测试,比如文件系统和网络访问。 -For our first example, we’ll refactor a standard Python test case from original form to one using mock. We’ll demonstrate how writing a test case with mocks will make our tests smarter, faster, and able to reveal more about how the software works. +对于我们首个例子,我们要从原始形式到使用 mock 地重构一个标准 Python 测试用例。我们会演示如何使用 mock 写一个测试用例使我们的测试更加智能、快速,并且能展示更多关于我们软件的工作原理。 -### A Simple Delete Function +### 一个简单的删除函数 -We all need to delete files from our filesystem from time to time, so let’s write a function in Python which will make it a bit easier for our scripts to do so. +有时,我们都需要从文件系统中删除文件,因此,让我们在 Python 中写一个可以使我们的脚本更加轻易完成此功能的函数。 ``` #!/usr/bin/env python @@ -40,9 +42,9 @@ def rm(filename): os.remove(filename) ``` -Obviously, our rm method at this point in time doesn’t provide much more than the underlying os.remove method, but our codebase will improve, allowing us to add more functionality here. 
+很明显,我们的 rm 方法此时无法提供比相关 os.remove 方法更多的功能,但我们的基础代码会逐步改善,允许我们在这里添加更多的功能。 -Let’s write a traditional test case, i.e., without mocks: +让我们写一个传统的测试用例,即,没有使用 mock: ``` #!/usr/bin/env python @@ -61,7 +63,7 @@ class RmTestCase(unittest.TestCase): def setUp(self): with open(self.tmpfilepath, "wb") as f: f.write("Delete me!") - + def test_rm(self): # remove the file rm(self.tmpfilepath) @@ -69,9 +71,11 @@ class RmTestCase(unittest.TestCase): self.assertFalse(os.path.isfile(self.tmpfilepath), "Failed to remove the file.") ``` -Our test case is pretty simple, but every time it is run, a temporary file is created and then deleted. Additionally, we have no way of testing whether our rm method properly passes the argument down to the os.remove call. We can assume that it does based on the test above, but much is left to be desired. +我们的测试用例相当简单,但是当它每次运行的时候,它都会创建一个临时文件并且随后删除。此外,我们没有办法测试我们的 rm 方法是否正确地将我们的参数向下传递给 os.remove 调用。我们可以基于以上的测试认为它做到了,但还有很多需要改进的地方。 -Refactoring with MocksLet’s refactor our test case using mock: +### 使用 Mock 重构 + +让我们使用 mock 重构我们的测试用例: ``` #!/usr/bin/env python @@ -83,7 +87,7 @@ import mock import unittest class RmTestCase(unittest.TestCase): - + @mock.patch('mymodule.os') def test_rm(self, mock_os): rm("any path") @@ -91,10 +95,11 @@ class RmTestCase(unittest.TestCase): mock_os.remove.assert_called_with("any path") ``` -With these refactors, we have fundamentally changed the way that the test operates. Now, we have an insider, an object we can use to verify the functionality of another. +使用这些重构,我们从根本上改变了该测试用例的运行方式。现在,我们有一个可以用于验证其他功能的内部对象。 -### Potential Pitfalls +### 潜在陷阱 +第一件需要注意的事情就是,我们使用了用于模拟 mock.patch 方法的装饰器位于mymodule.os One of the first things that should stick out is that we’re using the mock.patch method decorator to mock an object located at mymodule.os, and injecting that mock into our test case method. Wouldn’t it make more sense to just mock os itself, rather than the reference to it at mymodule.os? 
Well, Python is somewhat of a sneaky snake when it comes to imports and managing modules. At runtime, the mymodule module has its own os which is imported into its own local scope in the module. Thus, if we mock os, we won’t see the effects of the mock in the mymodule module. @@ -135,23 +140,23 @@ import mock import unittest class RmTestCase(unittest.TestCase): - + @mock.patch('mymodule.os.path') @mock.patch('mymodule.os') def test_rm(self, mock_os, mock_path): # set up the mock mock_path.isfile.return_value = False - + rm("any path") - + # test that the remove call was NOT called. self.assertFalse(mock_os.remove.called, "Failed to not remove the file if not present.") - + # make the file 'exist' mock_path.isfile.return_value = True - + rm("any path") - + mock_os.remove.assert_called_with("any path") ``` @@ -190,26 +195,26 @@ import mock import unittest class RemovalServiceTestCase(unittest.TestCase): - + @mock.patch('mymodule.os.path') @mock.patch('mymodule.os') def test_rm(self, mock_os, mock_path): # instantiate our service reference = RemovalService() - + # set up the mock mock_path.isfile.return_value = False - + reference.rm("any path") - + # test that the remove call was NOT called. 
self.assertFalse(mock_os.remove.called, "Failed to not remove the file if not present.") - + # make the file 'exist' mock_path.isfile.return_value = True - + reference.rm("any path") - + mock_os.remove.assert_called_with("any path") ``` @@ -228,13 +233,13 @@ class RemovalService(object): def rm(self, filename): if os.path.isfile(filename): os.remove(filename) - + class UploadService(object): def __init__(self, removal_service): self.removal_service = removal_service - + def upload_complete(self, filename): self.removal_service.rm(filename) ``` @@ -262,29 +267,29 @@ import mock import unittest class RemovalServiceTestCase(unittest.TestCase): - + @mock.patch('mymodule.os.path') @mock.patch('mymodule.os') def test_rm(self, mock_os, mock_path): # instantiate our service reference = RemovalService() - + # set up the mock mock_path.isfile.return_value = False - + reference.rm("any path") - + # test that the remove call was NOT called. self.assertFalse(mock_os.remove.called, "Failed to not remove the file if not present.") - + # make the file 'exist' mock_path.isfile.return_value = True - + reference.rm("any path") - + mock_os.remove.assert_called_with("any path") - - + + class UploadServiceTestCase(unittest.TestCase): @mock.patch.object(RemovalService, 'rm') @@ -292,13 +297,13 @@ class UploadServiceTestCase(unittest.TestCase): # build our dependencies removal_service = RemovalService() reference = UploadService(removal_service) - + # call upload_complete, which should, in turn, call `rm`: reference.upload_complete("my uploaded file") - + # check that it called the rm method of any RemovalService mock_rm.assert_called_with("my uploaded file") - + # check that it called the rm method of _our_ removal_service removal_service.rm.assert_called_with("my uploaded file") ``` @@ -339,39 +344,39 @@ import mock import unittest class RemovalServiceTestCase(unittest.TestCase): - + @mock.patch('mymodule.os.path') @mock.patch('mymodule.os') def test_rm(self, mock_os, mock_path): # 
instantiate our service reference = RemovalService() - + # set up the mock mock_path.isfile.return_value = False - + reference.rm("any path") - + # test that the remove call was NOT called. self.assertFalse(mock_os.remove.called, "Failed to not remove the file if not present.") - + # make the file 'exist' mock_path.isfile.return_value = True - + reference.rm("any path") - + mock_os.remove.assert_called_with("any path") - - + + class UploadServiceTestCase(unittest.TestCase): def test_upload_complete(self, mock_rm): # build our dependencies mock_removal_service = mock.create_autospec(RemovalService) reference = UploadService(mock_removal_service) - + # call upload_complete, which should, in turn, call `rm`: reference.upload_complete("my uploaded file") - + # test that it called the rm method mock_removal_service.rm.assert_called_with("my uploaded file") ``` @@ -427,7 +432,7 @@ To finish up, let’s write a more applicable real-world example, one which we m import facebook class SimpleFacebook(object): - + def __init__(self, oauth_token): self.graph = facebook.GraphAPI(oauth_token) @@ -445,7 +450,7 @@ import mock import unittest class SimpleFacebookTestCase(unittest.TestCase): - + @mock.patch.object(facebook.GraphAPI, 'put_object', autospec=True) def test_post_message(self, mock_put_object): sf = simple_facebook.SimpleFacebook("fake oauth token") @@ -480,12 +485,3 @@ via: http://slviki.com/index.php/2016/06/18/introduction-to-mocking-in-python/ [6]: http://www.voidspace.org.uk/python/mock/mock.html [7]: http://www.toptal.com/qa/how-to-write-testable-code-and-why-it-matters [8]: http://www.toptal.com/python - - - - - - - - - From 9f91c8b545c4e0d0f3d9080e8ecbcb1fe4b35167 Mon Sep 17 00:00:00 2001 From: cposture <cposture@126.com> Date: Sat, 30 Jul 2016 23:38:30 +0800 Subject: [PATCH 15/17] Translating 21% --- .../tech/20160618 An Introduction to Mocking in Python.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/sources/tech/20160618 An Introduction 
to Mocking in Python.md b/sources/tech/20160618 An Introduction to Mocking in Python.md index 9c6c912b60..64c26b8ec5 100644 --- a/sources/tech/20160618 An Introduction to Mocking in Python.md +++ b/sources/tech/20160618 An Introduction to Mocking in Python.md @@ -99,15 +99,19 @@ class RmTestCase(unittest.TestCase): ### 潜在陷阱 -第一件需要注意的事情就是,我们使用了用于模拟 mock.patch 方法的装饰器位于mymodule.os +第一件需要注意的事情就是,我们使用了位于 mymodule.os 且用于模拟对象的 mock.patch 方法装饰器,并且将该 mock 注入到我们的测试用例方法。相比在 mymodule.os 引用它,那么只是模拟 os 本身,会不会更有意义? One of the first things that should stick out is that we’re using the mock.patch method decorator to mock an object located at mymodule.os, and injecting that mock into our test case method. Wouldn’t it make more sense to just mock os itself, rather than the reference to it at mymodule.os? +当然,当涉及到导入和管理模块,Python 的用法非常灵活。在运行时,mymodule 模块拥有被导入到本模块局部作用域的 os。因此,如果我们模拟 os,我们是看不到模拟在 mymodule 模块中的作用的。 Well, Python is somewhat of a sneaky snake when it comes to imports and managing modules. At runtime, the mymodule module has its own os which is imported into its own local scope in the module. Thus, if we mock os, we won’t see the effects of the mock in the mymodule module. +这句话需要深刻地记住: The mantra to keep repeating is this: +> 模拟测试一个项目,只需要了解它用在哪里,而不是它从哪里来。 > Mock an item where it is used, not where it came from. + If you need to mock the tempfile module for myproject.app.MyElaborateClass, you probably need to apply the mock to myproject.app.tempfile, as each module keeps its own imports. With that pitfall out of the way, let’s keep mocking. 
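
The rule quoted in the hunk above, mock an item where it is used and not where it came from, can be tried in isolation. The following is only a minimal sketch under stated assumptions: `mymodule` mirrors the article's module but is built in memory with `types.ModuleType` so that no extra files are needed, and `othermodule`, `MISSING`, and the three helper functions are hypothetical names introduced purely for illustration. It uses `unittest.mock` from the standard library rather than the separate `mock` package.

```python
import os
import sys
import types
from unittest import mock

# Hypothetical stand-in for the article's mymodule, built in memory so that
# this sketch needs no extra files. It does `import os`, like the article.
mymodule = types.ModuleType("mymodule")
exec(
    "import os\n"
    "def rm(filename):\n"
    "    os.remove(filename)\n",
    mymodule.__dict__,
)
sys.modules["mymodule"] = mymodule

# Hypothetical second module that binds the function itself into its own
# namespace, which is the case where patching the origin stops working.
othermodule = types.ModuleType("othermodule")
exec(
    "from os.path import isfile\n"
    "def exists(path):\n"
    "    return isfile(path)\n",
    othermodule.__dict__,
)
sys.modules["othermodule"] = othermodule

# A path that is assumed not to exist on the machine running the sketch.
MISSING = os.path.join(os.sep, "no", "such", "file", "for", "mock", "demo")


def rm_with_patched_lookup():
    """Patch the name `os` as mymodule sees it; nothing is deleted."""
    with mock.patch("mymodule.os") as mock_os:
        mymodule.rm("any path")
        mock_os.remove.assert_called_with("any path")
        return mymodule.os is mock_os


def exists_patched_at_origin():
    """Patching os.path.isfile does not reach othermodule's own reference."""
    with mock.patch("os.path.isfile", return_value=True):
        return othermodule.exists(MISSING)


def exists_patched_where_used():
    """Patching othermodule.isfile replaces the name that exists() looks up."""
    with mock.patch("othermodule.isfile", return_value=True):
        return othermodule.exists(MISSING)
```

In this sketch, `exists_patched_at_origin()` still consults the real filesystem, because `othermodule` holds its own reference to `isfile`, while `exists_patched_where_used()` sees the mock. Once each `with` block exits, the original attributes are restored, so `mymodule.os` is the real `os` module again.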
From 6d8fcc7fc1ff9f72e367c960e1b4e2105bf00fb2 Mon Sep 17 00:00:00 2001 From: cposture <cposture@126.com> Date: Sun, 31 Jul 2016 13:18:35 +0800 Subject: [PATCH 16/17] Translating 50% --- ...18 An Introduction to Mocking in Python.md | 24 ++++++++++--------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/sources/tech/20160618 An Introduction to Mocking in Python.md b/sources/tech/20160618 An Introduction to Mocking in Python.md index 64c26b8ec5..d28fae272d 100644 --- a/sources/tech/20160618 An Introduction to Mocking in Python.md +++ b/sources/tech/20160618 An Introduction to Mocking in Python.md @@ -99,26 +99,26 @@ class RmTestCase(unittest.TestCase): ### 潜在陷阱 -第一件需要注意的事情就是,我们使用了位于 mymodule.os 且用于模拟对象的 mock.patch 方法装饰器,并且将该 mock 注入到我们的测试用例方法。相比在 mymodule.os 引用它,那么只是模拟 os 本身,会不会更有意义? +第一件需要注意的事情就是,我们使用了位于 mymodule.os 且用于模拟对象的 mock.patch 方法装饰器,并且将该 mock 注入到我们的测试用例方法。相比在 mymodule.os 引用它,那么只是模拟 os 本身,会不会更有意义呢? One of the first things that should stick out is that we’re using the mock.patch method decorator to mock an object located at mymodule.os, and injecting that mock into our test case method. Wouldn’t it make more sense to just mock os itself, rather than the reference to it at mymodule.os? 当然,当涉及到导入和管理模块,Python 的用法非常灵活。在运行时,mymodule 模块拥有被导入到本模块局部作用域的 os。因此,如果我们模拟 os,我们是看不到模拟在 mymodule 模块中的作用的。 -Well, Python is somewhat of a sneaky snake when it comes to imports and managing modules. At runtime, the mymodule module has its own os which is imported into its own local scope in the module. Thus, if we mock os, we won’t see the effects of the mock in the mymodule module. 这句话需要深刻地记住: -The mantra to keep repeating is this: > 模拟测试一个项目,只需要了解它用在哪里,而不是它从哪里来。 > Mock an item where it is used, not where it came from. - +如果你需要为 myproject.app.MyElaborateClass 模拟 tempfile 模块,你可能需要 If you need to mock the tempfile module for myproject.app.MyElaborateClass, you probably need to apply the mock to myproject.app.tempfile, as each module keeps its own imports. 
+先将那个陷阱置身事外,让我们继续模拟。 With that pitfall out of the way, let’s keep mocking. -### Adding Validation to ‘rm’ +### 向 ‘rm’ 中加入验证 + +之前定义的 rm 方法相当的简单。在盲目地删除之前,我们倾向于拿它来验证一个路径是否存在,并验证其是否是一个文件。让我们重构 rm 使其变得更加智能: -The rm method defined earlier is quite oversimplified. We’d like to have it validate that a path exists and is a file before just blindly attempting to remove it. Let’s refactor rm to be a bit smarter: ``` #!/usr/bin/env python @@ -132,7 +132,7 @@ def rm(filename): os.remove(filename) ``` -Great. Now, let’s adjust our test case to keep coverage up. +很好。现在,让我们调整测试用例来保持测试的覆盖程度。 ``` #!/usr/bin/env python @@ -164,13 +164,13 @@ class RmTestCase(unittest.TestCase): mock_os.remove.assert_called_with("any path") ``` -Our testing paradigm has completely changed. We now can verify and validate internal functionality of methods without any side-effects. +我们的测试用例完全改变了。现在我们可以在没有任何副作用下核实并验证方法的内部功能。 -### File-Removal as a Service +### 将文件删除作为服务 -So far, we’ve only been working with supplying mocks for functions, but not for methods on objects or cases where mocking is necessary for sending parameters. Let’s cover object methods first. +到目前为止,我们只是对函数功能提供模拟测试,并没对需要传递参数的对象和实例的方法进行模拟测试。接下来我们将介绍如何对对象的方法进行模拟测试。 -We’ll begin with a refactor of the rm method into a service class. There really isn’t a justifiable need, per se, to encapsulate such a simple function into an object, but it will at the very least help us demonstrate key concepts in mock. Let’s refactor: +首先,我们将rm方法重构成一个服务类。实际上将这样一个简单的函数转换成一个对象,在本质上,这不是一个合理的需求,但它能够帮助我们了解mock的关键概念。让我们开始重构: ``` #!/usr/bin/env python @@ -187,6 +187,7 @@ class RemovalService(object): os.remove(filename) ``` +### 你会注意到我们的测试用例没有太大的变化 ### You’ll notice that not much has changed in our test case: ``` @@ -222,6 +223,7 @@ class RemovalServiceTestCase(unittest.TestCase): mock_os.remove.assert_called_with("any path") ``` +很好,我们知道 RemovalService 会如期工作。接下来让我们创建另一个服务,将其声明为一个依赖 Great, so we now know that the RemovalService works as planned. 
Let’s create another service which declares it as a dependency: ``` From a49a41f4118638a4b0e60ca491691aa577c8adf5 Mon Sep 17 00:00:00 2001 From: cposture <cposture@126.com> Date: Tue, 2 Aug 2016 13:35:33 +0800 Subject: [PATCH 17/17] Translating by cposture --- ...lding a data science portfolio - Machine learning project.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sources/team_test/part 6 - Building a data science portfolio - Machine learning project.md b/sources/team_test/part 6 - Building a data science portfolio - Machine learning project.md index 86ecbe127d..3bec1d0a98 100644 --- a/sources/team_test/part 6 - Building a data science portfolio - Machine learning project.md +++ b/sources/team_test/part 6 - Building a data science portfolio - Machine learning project.md @@ -1,4 +1,4 @@ - +Translating by cposture 2016-08-02 ### Making predictions Now that we have the preliminaries out of the way, we’re ready to make predictions. We’ll create a new file called predict.py that will use the train.csv file we created in the last step. The below code will: