translated

2025-01-25 23:11:02 +08:00 · 2015-02-01 21:54:17 +08:00 · 2015-02-01 21:54:17 +08:00 · 7ed53ce0e4
commit 7ed53ce0e4
parent 635f4e1788
2 changed files with 93 additions and 96 deletions
--- a/sources/talk/20150121
+++ b/sources/talk/20150121
@@ -1,96 +0,0 @@
-KayGuoWhu translating
-If a 32-bit integer overflows, can we use a 40-bit structure instead of a 64-bit long one?
---------
-
-#Q:
-
-If, say, a 32-bit integer is overflowing, instead of upgrading int to long, can we make use of some 40-bit type if we need a range only within 2^40, so that we save 24 (64-40) bits for every integer?
-
-If so, how?
-
-I have to deal with billions and space is a bigger constraint.
-
-#A:
-
-##Yes, but...
-
-It is certainly possible, but it is usually nonsensical (for any program that doesn't use billions of these numbers):
-
-```C
-#include <stdint.h> // don't want to rely on something like long long
-struct bad_idea
-{
-    uint64_t var : 40;
-};
-```
-
-Here, var will indeed have a width of 40 bits at the expense of much less efficient code generated (it turns out that "much" is very much wrong -- the measured overhead is a mere 1-2%, see timings below), and usually to no avail. Unless you have need for another 24-bit value (or an 8 and 16 bit value) which you wish to pack into the same structure, alignment will forfeit anything that you may gain.
-
-In any case, unless you have billions of these, the effective difference in memory consumption will not be noticeable (but the extra code needed to manage the bit field will be noticeable!).
-
-###Note:
-
-The question has in the meantime been updated to reflect that billions of numbers are indeed needed, so this may be a viable thing to do, provided that you take measures not to lose the gains to structure alignment and padding, i.e. either by storing something else in the remaining 24 bits or by storing your 40-bit values in structures of 8 each (or multiples thereof).
-Saving three bytes a billion times is worthwhile, as it will require noticeably fewer memory pages and thus cause fewer cache and TLB misses and, above all, fewer page faults (a single page fault costing tens of millions of instructions).
-
-While the above snippet does not make use of the remaining 24 bits (it merely demonstrates the "use 40 bits" part), something akin to the following will be necessary to really make the approach useful in the sense of saving memory, provided that you indeed have other "useful" data to put in the holes:
-
-```C
-struct using_gaps
-{
-    uint64_t var           : 40;
-    uint64_t useful_uint16 : 16;
-    uint64_t char_or_bool  : 8;  
-};
-```
-
-Structure size and alignment will be equal to those of a 64-bit integer, so nothing is wasted if you make e.g. an array of a billion such structures (even without using compiler-specific extensions). If you don't have use for an 8-bit value, you could also use a 48-bit and a 16-bit value (giving a bigger overflow margin).
-Alternatively you could, at the expense of usability, put 8 40-bit values into a structure (the least common multiple of 40 and 64 being 320 = 8*40). Of course, the code that accesses elements in the array of structures will then become much more complicated (though one could probably implement an operator[] that restores the linear array functionality and hides the structure complexity).
-
-Update:
-Wrote a quick test suite, just to see what overhead the bitfields (and operator overloading with bitfield refs) would have. Posted code (due to length) at gcc.godbolt.org, test output from my Win7-64 machine is:
-
-```TXT
-Running test for array size = 1048576
-what        alloc  seq(w)  seq(r)  rand(w)  rand(r)  free
-----------------------------------------------------------
-uint32_t    0      2       1       35       35       1
-uint64_t    0      3       3       35       35       1
-bad40_t     0      5       3       35       35       1
-packed40_t  0      7       4       48       49       1
-
-
-Running test for array size = 16777216
-what        alloc  seq(w)  seq(r)  rand(w)  rand(r)  free
-----------------------------------------------------------
-uint32_t    0      38      14      560      555      8
-uint64_t    0      81      22      565      554      17
-bad40_t     0      85      25      565      561      16
-packed40_t  0      151     75      765      774      16
-
-
-Running test for array size = 134217728
-what        alloc  seq(w)  seq(r)  rand(w)  rand(r)  free
-----------------------------------------------------------
-uint32_t    0      312     100     4480     4441     65
-uint64_t    0      648     172     4482     4490     130
-bad40_t     0      682     193     4573     4492     130
-packed40_t  0      1164    552     6181     6176     130
-```
-
-What one can see is that the extra overhead of bitfields is negligible, but the operator overloading with a bitfield reference as a convenience is rather drastic (about a 3x increase) when accessing data linearly in a cache-friendly manner. On the other hand, on random access it barely even matters.
-
-These timings suggest that simply using 64-bit integers would be better since they are still faster overall than bitfields (despite touching more memory), but of course they do not take into account the cost of page faults with much bigger datasets. It might look very different once you run out of physical RAM (I didn't test that).
-
------
-
-via:[stackoverflow](http://stackoverflow.com/questions/27705409/if-a-32-bit-integer-overflows-can-we-use-a-40-bit-structure-instead-of-a-64-bit/27705562#27705562)
-
-Author: [Damon][a][Michael Kohne][b]
-Translator: [译者ID](https://github.com/译者ID)
-Proofreader: [校对者ID](https://github.com/校对者ID)
-
-This article was originally translated by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](http://linux.cn/)
-
-[a]:http://stackoverflow.com/users/572743/damon
-[b]:http://stackoverflow.com/users/5801/michael-kohne
--- a/translated/talk/20150121
+++ b/translated/talk/20150121
@@ -0,0 +1,93 @@
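+If a 32-bit integer overflows, can we use a 40-bit structure instead of a 64-bit long one?
+---------
+
+#Q:
+If, say, a 32-bit integer is overflowing, instead of upgrading int to long, can we make use of some 40-bit type if we need a range only within 2^40, so that we save 24 (64-40) bits for every integer?
+
+If so, how?
+
+I have to deal with billions of numbers, and space is the bigger constraint.
+
+#A:
+
+##Yes, but...
+
+It is certainly possible, but it is usually nonsensical (for any program that doesn't use billions of these numbers):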
+
+```C
+#include <stdint.h> // don't want to rely on something like long long
+struct bad_idea
+{
+    uint64_t var : 40;
+};
+```
+
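+Here, var will indeed have a width of 40 bits, at the expense of much less efficient generated code (it turns out that "much" is very much wrong: the measured overhead is a mere 1-2%, see timings below), and usually to no avail. Unless you need another 24-bit value (or an 8-bit and a 16-bit value) which you wish to pack into the same structure, alignment will forfeit anything that you may gain.
+
+In any case, unless you have billions of these, the effective difference in memory consumption will not be noticeable (but the extra code needed to manage the bit field will be noticeable!).
+
+###Note:
+
+The question has in the meantime been updated to reflect that billions of numbers are indeed needed, so this may be a viable thing to do, provided that you take measures not to lose the gains to structure alignment and padding, i.e. either by storing something else in the remaining 24 bits or by storing your 40-bit values in structures of 8 each (or multiples thereof).
+Saving three bytes a billion times is worthwhile, as it will require noticeably fewer memory pages and thus cause fewer cache and TLB misses and, above all, fewer page faults (a single page fault costing tens of millions of instructions).
+
+While the above snippet does not make use of the remaining 24 bits (it merely demonstrates the "use 40 bits" part), something akin to the following will be necessary to really make the approach useful in the sense of saving memory, provided that you indeed have other "useful" data to put in the holes: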
+
+```C
+struct using_gaps
+{
+    uint64_t var           : 40;
+    uint64_t useful_uint16 : 16;
+    uint64_t char_or_bool  : 8;  
+};
+```
+
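+Structure size and alignment will be equal to those of a 64-bit integer, so nothing is wasted if you make e.g. an array of a billion such structures (even without using compiler-specific extensions). If you don't have use for an 8-bit value, you could also use a 48-bit and a 16-bit value (giving a bigger overflow margin).
+Alternatively you could, at the expense of usability, put 8 40-bit values into a structure (the least common multiple of 40 and 64 being 320 = 8*40). Of course, the code that accesses elements in the array of structures will then become much more complicated (though one could probably implement an operator[] that restores the linear array functionality and hides the structure complexity).
+
+Update:
+
+Wrote a quick test suite, just to see what overhead the bitfields (and operator overloading with bitfield refs) would have. Posted the code (due to its length) at gcc.godbolt.org; the test output from my Win7-64 machine is: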
+
+```TXT
+Running test for array size = 1048576
+what        alloc  seq(w)  seq(r)  rand(w)  rand(r)  free
+-----------------------------------------------------------
+uint32_t    0      2       1       35       35       1
+uint64_t    0      3       3       35       35       1
+bad40_t     0      5       3       35       35       1
+packed40_t  0      7       4       48       49       1
+
+Running test for array size = 16777216
+what        alloc  seq(w)  seq(r)  rand(w)  rand(r)  free
+-----------------------------------------------------------
+uint32_t    0      38      14      560      555      8
+uint64_t    0      81      22      565      554      17
+bad40_t     0      85      25      565      561      16
+packed40_t  0      151     75      765      774      16
+
+Running test for array size = 134217728
+what        alloc  seq(w)  seq(r)  rand(w)  rand(r)  free
+-----------------------------------------------------------
+uint32_t    0      312     100     4480     4441     65
+uint64_t    0      648     172     4482     4490     130
+bad40_t     0      682     193     4573     4492     130
+packed40_t  0      1164    552     6181     6176     130
+```
+
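+What one can see is that the extra overhead of bitfields is negligible, but the operator overloading with a bitfield reference as a convenience is rather drastic (about a 3x increase) when accessing data linearly in a cache-friendly manner. On the other hand, on random access it barely even matters.
+
+These timings suggest that simply using 64-bit integers would be better, since they are still faster overall than bitfields (despite touching more memory), but of course they do not take into account the cost of page faults with much bigger datasets. It might look very different once you run out of physical RAM (I didn't test that).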
+
+------
+
+via:[stackoverflow](http://stackoverflow.com/questions/27705409/if-a-32-bit-integer-overflows-can-we-use-a-40-bit-structure-instead-of-a-64-bit/27705562#27705562)
+
+Author: [Damon][a][Michael Kohne][b]
+Translator: [KayGuoWhu](https://github.com/KayGuoWhu)
+Proofreader: [校对者ID](https://github.com/校对者ID)
+
+This article was originally translated by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](http://linux.cn/)
+
+[a]:http://stackoverflow.com/users/572743/damon
+[b]:http://stackoverflow.com/users/5801/michael-kohne