From d805ad0ab0f1a47da4a86192f4b3920d00ef2789 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E5=85=AD=E5=BC=80=E7=AE=B1?= <lkxed@outlook.com>
Date: Fri, 24 Mar 2023 19:45:50 +0800
Subject: [PATCH] =?UTF-8?q?[=E6=89=8B=E5=8A=A8=E9=80=89=E9=A2=98][tech]:?=
 =?UTF-8?q?=2020230118.0=20=E2=AD=90=EF=B8=8F=E2=AD=90=EF=B8=8F=E2=AD=90?=
 =?UTF-8?q?=EF=B8=8F=20Examples=20of=20problems=20with=20integers.md?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 ...�️⭐️⭐️ Examples of problems with integers.md | 472 ++++++++++++++++++
 1 file changed, 472 insertions(+)
 create mode 100644 sources/tech/20230118.0 ⭐️⭐️⭐️ Examples of problems with integers.md

diff --git a/sources/tech/20230118.0 ⭐️⭐️⭐️ Examples of problems with integers.md b/sources/tech/20230118.0 ⭐️⭐️⭐️ Examples of problems with integers.md
new file mode 100644
index 0000000000..6751db4114
--- /dev/null
+++ b/sources/tech/20230118.0 ⭐️⭐️⭐️ Examples of problems with integers.md	
@@ -0,0 +1,472 @@
+[#]: subject: "Examples of problems with integers"
+[#]: via: "https://jvns.ca/blog/2023/01/18/examples-of-problems-with-integers/"
+[#]: author: "Julia Evans https://jvns.ca/"
+[#]: collector: "lkxed"
+[#]: translator: " "
+[#]: reviewer: " "
+[#]: publisher: " "
+[#]: url: " "
+
+Examples of problems with integers
+======
+
+Hello! A few days back we talked about [problems with floating point numbers][1].
+
+This got me thinking – but what about integers? Of course integers have all
+kinds of problems too – anytime you represent a number in a small fixed amount of
+space (like 8/16/32/64 bits), you’re going to run into problems.
+
+So I [asked on Mastodon again][2] for examples of integer problems and got all kinds of great responses again. Here’s a table of contents.
+
+[example 1: the small database primary key][3][example 2: integer overflow/underflow][4][aside: how do computers represent negative integers?][5][example 3: decoding a binary format in Java][6][example 4: misinterpreting an IP address or string as an integer][7][example 5: security problems because of integer overflow][8][example 6: the case of the mystery byte order][9][example 7: modulo of negative numbers][10][example 8: compilers removing integer overflow checks][11][example 9: the && typo][12]
+
+Like last time, I’ve written some example programs to demonstrate these
+problems. I’ve tried to use a variety of languages in the examples (Go,
+Javascript, Java, and C) to show that these problems don’t just show up in
+super low level C programs – integers are everywhere!
+
+Also I’ve probably made some mistakes in here, I learned several things while writing this.
+
+#### example 1: the small database primary key
+
+One of the most classic (and most painful!) integer problems is:
+
+- You create a database table where the primary key is a 32-bit unsigned integer, thinking “4 billion rows should be enough for anyone!”
+- You are massively successful and eventually, your table gets close to 4 billion rows
+- oh no!
+- You need to do a database migration to switch your primary key to be a 64-bit integer instead
+
+If the primary key actually reaches its maximum value I’m not sure exactly what
+happens, I’d imagine you wouldn’t be able to create any new database rows and
+it would be a very bad day for your massively successful service.
+
+#### example 2: integer overflow/underflow
+
+Here’s a Go program:
+
+```
+package main
+
+import "fmt"
+
+func main() {
+	var x uint32 = 5
+	var length uint32 = 0
+	if x < length-1 {
+		fmt.Printf("%d is less than %d\n", x, length-1)
+	}
+}
+```
+
+This slightly mysteriously prints out:
+
+```
+5 is less than 4294967295
+```
+
+That true, but it’s not what you might have expected.
+
+#### what’s going on?
+
+`0 - 1` is equal to the 4 bytes `0xFFFFFFFF`.
+
+There are 2 ways to interpret those 4 bytes:
+
+- As a _signed_ integer (-1)
+- As an _unsigned_ integer (4294967295)
+
+Go here is treating `length - 1` as a **unsigned** integer, because we defined `x` and `length` as uint32s (the “u” is for “unsigned”). So it’s testing if 5 is less than 4294967295, which it is!
+
+#### what do we do about it?
+
+I’m not actually sure if there’s any way to automatically detect integer overflow errors in Go. (though it looks like there’s a [github issue from 2019 with some discussion][13])
+
+Some brief notes about other languages:
+
+- Lots of languages (Python, Java, Ruby) don’t have unsigned integers at all, so this specific problem doesn’t come up
+- In C, you can compile with `clang -fsanitize=unsigned-integer-overflow`. Then if your code has an overflow/underflow like this, the program will crash.
+- Similarly in Rust, if you compile your program in debug mode it’ll crash if there’s an integer overflow. But in release mode it won’t crash, it’ll just happily decide that 0 - 1 = 4294967295.
+
+The reason Rust doesn’t check for overflows if you compile your program in
+release mode (and the reason C and Go don’t check) is that – these checks are
+expensive! Integer arithmetic is a very big part of many computations, and
+making sure that every single addition isn’t overflowing makes it slower.
+
+#### aside: how do computers represent negative integers?
+
+I mentioned in the last section that `0xFFFFFFFF` can mean either `-1` or
+`4294967295`. You might be thinking – what??? Why would `0xFFFFFFFF` mean `-1`?
+
+So let’s talk about how computers represent negative integers for a second.
+
+I’m going to simplify and talk about 8-bit integers instead of 32-bit integers,
+because there are less of them and it works basically the same way.
+
+You can represent 256 different numbers with an 8-bit integer: 0 to 255
+
+```
+00000000 -> 0
+00000001 -> 1
+00000010 -> 2
+...
+11111111 -> 255
+```
+
+But what if you want to represent _negative_ integers? We still only have 8
+bits! So we need to reassign some of these and treat them as negative numbers
+instead.
+
+Here’s the way most modern computers do it:
+
+- Every number that’s 128 or more becomes a negative number instead
+- How to know _which_ negative number it is: take the positive integer you’d expect it to be, and then subtract 256
+
+So 255 becomes -1, 128 becomes -128, and 200 becomes -56.
+
+Here are some maps of bits to numbers:
+
+```
+00000000 -> 0
+00000001 -> 1
+00000010 -> 2
+01111111 -> 127
+10000000 -> -128 (previously 128)
+10000001 -> -127 (previously 129)
+10000010 -> -126 (previously 130)
+...
+11111111 -> -1 (previously 255)
+```
+
+This gives us 256 numbers, from -128 to 127.
+
+And `11111111` (or `0xFF`, or 255) is -1.
+
+For 32 bit integers, it’s the same story, except it’s “every number larger than 2^31 becomes negative” and “subtract 2^32”. And similarly for other integer sizes.
+
+That’s how we end up with `0xFFFFFFFF` meaning -1.
+
+#### there are multiple ways to represent negative integers
+
+The way we just talked about of representing negative integers (“it’s the equivalent positive integer, but you subtract 2^n”) is called
+**two’s complement**, and it’s the most common on modern computers. There are several other ways
+though, the [wikipedia article has a list][14].
+
+#### weird thing: the absolute value of -128 is negative
+
+This [Go program][15] has a pretty simple `abs()` function that computes the absolute value of an integer:
+
+```
+package main
+
+import (
+	"fmt"
+)
+
+func abs(x int8) int8 {
+	if x < 0 {
+		return -x
+	}
+	return x
+}
+
+func main() {
+	fmt.Println(abs(-127))
+	fmt.Println(abs(-128))
+}
+```
+
+This prints out:
+
+```
+127
+-128
+```
+
+This is because the signed 8-bit integers go from -128 to 127 – there **is** no +128!
+Some programs might crash when you try to do this (it’s an overflow), but Go
+doesn’t.
+
+Now that we’ve talked about signed integers a bunch, let’s dig into another example of how they can cause problems.
+
+#### example 3: decoding a binary format in Java
+
+Let’s say you’re parsing a binary format in Java, and you want to get the first
+4 bits of the byte `0x90`. The correct answer is 9.
+
+```
+public class Main {
+    public static void main(String[] args) {
+        byte b = (byte) 0x90;
+        System.out.println(b >> 4);
+    }
+}
+```
+
+This prints out “-7”. That’s not right!
+
+#### what’s going on?
+
+There are two things we need to know about Java to make sense of this:
+
+- Java doesn’t have unsigned integers.
+- Java can’t right shift bytes, it can only shift integers. So anytime you shift a byte, it has to be promoted into an integer.
+
+Let’s break down what those two facts mean for our little calculation `b >> 4`:
+
+- In bits, `0x90` is `10010000`. This starts with a 1, which means that it’s more than 128, which means it’s a negative number
+- Java sees the `>>` and decides to promote `0x90` to an integer, so that it can shift it
+- The way you convert a negative byte to an 32-bit integer is to add a bunch of `1`s at the beginning. So now our 32-bit integer is `0xFFFFFF90` (`F` being 15, or `1111`)
+- Now we right shift (`b >> 4`). By default, Java does a **signed shift**, which means that it adds 0s to the beginning if it’s positive, and 1s to the beginning if it’s negative. (`>>>` is an unsigned  shift in Java)
+- We end up with `0xFFFFFFF9` (having cut off the last 4 bits and added more 1s at the beginning)
+- As a signed integer, that’s -7!
+
+#### what can you do about it?
+
+I don’t the actual idiomatic way to do this in Java is, but the way I’d naively
+approach fixing this is to put in a bit mask before doing the right shift. So
+instead of:
+
+```
+b >> 4
+```
+
+we’d write
+
+```
+(b & 0xFF) >> 4
+```
+
+`b & 0xFF` seems redundant (`b` is already a byte!), but it’s actually not because `b` is being promoted to an integer.
+
+Now instead of `0x90 -> 0xFFFFFF90 -> 0xFFFFFFF9`, we end up calculating `0x90 -> 0xFFFFFF90 -> 0x00000090 -> 0x00000009`, which is the result we wanted: 9.
+
+And when we actually try it, it prints out “9”.
+
+Also, if we were using a language with unsigned integers, the natural way to
+deal with this would be to treat the value as an unsigned integer in the first
+place. But that’s not possible in Java.
+
+#### example 4: misinterpreting an IP address or string as an integer
+
+I don’t know if this is technically a “problem with integers” but it’s funny
+so I’ll mention it: [Rachel by the bay][16] has a bunch of great
+examples of things that are not integers being interpreted as integers. For
+example, “HTTP” is `0x48545450` and `2130706433` is `127.0.0.1`.
+
+She points out that you can actually ping any integer, and it’ll convert that integer into an IP address, for example:
+
+```
+$ ping 2130706433
+PING 2130706433 (127.0.0.1): 56 data bytes
+$ ping 132848123841239999988888888888234234234234234234
+PING 132848123841239999988888888888234234234234234234 (251.164.101.122): 56 data bytes
+```
+
+(I’m not actually sure how ping is parsing that second integer or why ping accepts these giant larger-than-2^64-integers as valid inputs, but it’s a fun weird thing)
+
+#### example 5: security problems because of integer overflow
+
+Another integer overflow example: here’s a [search for CVEs involving integer overflows][17].
+There are a lot! I’m not a security person, but here’s one random example: this [json parsing library bug][18]
+
+My understanding of that json parsing bug is roughly:
+
+- you load a JSON file that’s 3GB or something, or 3,000,000,000
+- due to an integer overflow, the code allocates close to 0 bytes of memory instead of ~3GB amount of memory
+- but the JSON file is still 3GB, so it gets copied into the tiny buffer with almost 0 bytes of memory
+- this overwrites all kinds of other memory that it’s not supposed to
+
+The CVE says “This vulnerability mostly impacts process availability”, which I
+think means “the program crashes”, but sometimes this kind of thing is much
+worse and can result in arbitrary code execution.
+
+My impression is that there are a large variety of different flavours of
+security vulnerabilities caused by integer overflows.
+
+#### example 6: the case of the mystery byte order
+
+One person said that they’re do scientific computing and sometimes they need to
+read files which contain data with an unknown byte order.
+
+Let’s invent a small example of this: say you’re reading a file which contains 4
+bytes - `00`, `00`, `12`, and `81` (in that order), that you happen to know
+represent a 4-byte integer. There are 2 ways to interpret that integer:
+
+- `0x00001281` (which translates to 4737). This order is called “big endian”
+- `0x81120000` (which translates to 2165440512). This order is called “little endian”.
+
+Which one is it? Well, maybe the file contains some metadata that specifies the
+endianness. Or maybe you happen to know what machine it was generated on and
+what byte order that machine uses. Or maybe you just read a bunch of values,
+try both orders, and figure out which makes more sense. Maybe 2165440512 is too
+big to make sense in the context of whatever your data is supposed to mean, or
+maybe `4737` is too small.
+
+A couple more notes on this:
+
+- this isn’t just a problem with integers, floating point numbers have byte
+order too
+- this also comes up when reading data from a network, but in that case the
+byte order isn’t a “mystery”, it’s just going to be big endian. But x86
+machines (and many others) are little endian, so you have to swap the byte
+order of all your numbers.
+
+#### example 7: modulo of negative numbers
+
+This is more of a design decision about how different programming languages design their math libraries, but it’s still a little weird and lots of people mentioned it.
+
+Let’s say you write `-13 % 3` in your program, or `13 % -3`. What’s the result?
+
+It turns out that different programming languages do it differently, for
+example in Python `-13 % 3 = 2` but in Javascript `-13 % 3 = -1`.
+
+There’s a table in [this blog post][19] that
+describes a bunch of different programming languages’ choices.
+
+#### example 8: compilers removing integer overflow checks
+
+We’ve been hearing a lot about integer overflow and why it’s bad. So let’s
+imagine you try to be safe and include some checks in your programs – after
+each addition, you make sure that the calculation didn’t overflow. Like this:
+
+```
+#include <stdio.h>
+
+#define INT_MAX 2147483647
+
+int check_overflow(int n) {
+    n = n + 100;
+    if (n + 100 < 0)
+        return -1;
+    return 0;
+}
+
+int main() {
+    int result = check_overflow(INT_MAX);
+    printf("%d\n", result);
+}
+```
+
+`check_overflow` here should return `-1` (failure), because `INT_MAX + 100` is more than the maximum integer size.
+
+```
+$ gcc  check_overflow.c  -o check_overflow && ./check_overflow
+-1 
+$ gcc -O3 check_overflow.c  -o check_overflow && ./check_overflow
+0
+```
+
+That’s weird – when we compile with `gcc`, we get the answer we expected, but
+with `gcc -O3`, we get a different answer. Why?
+
+#### what’s going on?
+
+My understanding (which might be wrong) is:
+
+- Signed integer overflow in C is **undefined behavior**. I think that’s
+because different C implementations might be using different representations
+of signed integers (maybe they’re using one’s complement instead of two’s
+complement or something)
+- “undefined behaviour” in C means “the compiler is free to do literally whatever it wants after that point”  (see this post [With undefined behaviour, anything is possible][20] by Raph Levine for a lot more)
+- Some compiler optimizations assume that undefined behaviour will never
+happen. They’re free to do this, because – if that undefined behaviour
+_did_ happen, then they’re allowed to do whatever they want, so “run the
+code that I optimized assuming that this would never happen” is fine.
+- So this `if (n + 100 < 0)` check is irrelevant – if that did
+happen, it would be undefined behaviour, so there’s no need to execute the
+contents of that if statement.
+
+So, that’s weird. I’m not going to write a “what can you do about it?” section here because I’m pretty out of my depth already.
+
+I certainly would not have expected that though.
+
+My impression is that “undefined behaviour” is really a C/C++ concept, and
+doesn’t exist in other languages in the same way except in the case of “your
+program called some C code in an incorrect way and that C code did something
+weird because of undefined behaviour”. Which of course happens all the time.
+
+#### example 9: the && typo
+
+This one was mentioned as a very upsetting bug. Let’s say you have two integers
+and you want to check that they’re both nonzero.
+
+In Javascript, you might write:
+
+```
+if a && b {
+    /* some code */
+}
+```
+
+But you could also make a typo and type:
+
+```
+if a & b {
+    /* some code */
+}
+```
+
+This is still perfectly valid code, but it means something completely different
+– it’s a bitwise and instead of a boolean and. Let’s go into a Javascript
+console and look at bitwise vs boolean and for `9` and `4`:
+
+```
+> 9 && 4
+4
+> 9 & 4
+0
+> 4 && 5
+5
+> 4 & 5
+4
+```
+
+It’s easy to imagine this turning into a REALLY annoying bug since it would be
+intermittent – often `x & y` does turn out to be truthy if `x && y` is truthy.
+
+#### what to do about it?
+
+For Javascript, ESLint has a [no-bitwise check][21] check), which
+requires you manually flag “no, I actually know what I’m doing, I want to do
+bitwise and” if you use a bitwise and in your code. I’m sure many other linters
+have a similar check.
+
+#### that’s all for now!
+
+There are definitely more problems with integers than this, but this got pretty
+long again and I’m tired of writing again so I’m going to stop :)
+
+--------------------------------------------------------------------------------
+
+via: https://jvns.ca/blog/2023/01/18/examples-of-problems-with-integers/
+
+作者：[Julia Evans][a]
+选题：[lkxed][b]
+译者：[译者ID](https://github.com/译者ID)
+校对：[校对者ID](https://github.com/校对者ID)
+
+本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译，[Linux中国](https://linux.cn/) 荣誉推出
+
+[a]: https://jvns.ca/
+[b]: https://github.com/lkxed/
+[1]: https://jvns.ca/blog/2023/01/13/examples-of-floating-point-problems/
+[2]: https://social.jvns.ca/@b0rk/109700446576896509
+[3]: https://jvns.ca#example-1-the-small-database-primary-key
+[4]: https://jvns.ca#example-2-integer-overflow-underflow
+[5]: https://jvns.ca#aside-how-do-computers-represent-negative-integers
+[6]: https://jvns.ca#example-3-decoding-a-binary-format-in-java
+[7]: https://jvns.ca#example-4-misinterpreting-an-ip-address-or-string-as-an-integer
+[8]: https://jvns.ca#example-5-security-problems-because-of-integer-overflow
+[9]: https://jvns.ca#example-6-the-case-of-the-mystery-byte-order
+[10]: https://jvns.ca#example-7-modulo-of-negative-numbers
+[11]: https://jvns.ca#example-8-compilers-removing-integer-overflow-checks
+[12]: https://jvns.ca#example-9-the-typo
+[13]: https://github.com/golang/go/issues/30613
+[14]: https://en.wikipedia.org/wiki/Signed_number_representations
+[15]: https://go.dev/play/p/iSFxbFAe75M
+[16]: https://rachelbythebay.com/w/2020/11/26/magic/
+[17]: https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=integer+overflow
+[18]: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-24795
+[19]: https://torstencurdt.com/tech/posts/modulo-of-negative-numbers/
+[20]: https://raphlinus.github.io/programming/rust/2018/08/17/undefined-behavior.html
+[21]: https://eslint.org/docs/latest/rules/no-bitwise
\ No newline at end of file