TranslateProject/sources/tech/20230118.0 ⭐️⭐️⭐️ Examples of problems with integers.md

472 lines
18 KiB
Markdown
Raw Normal View History

[#]: subject: "Examples of problems with integers"
[#]: via: "https://jvns.ca/blog/2023/01/18/examples-of-problems-with-integers/"
[#]: author: "Julia Evans https://jvns.ca/"
[#]: collector: "lkxed"
[#]: translator: " "
[#]: reviewer: " "
[#]: publisher: " "
[#]: url: " "
Examples of problems with integers
======
Hello! A few days back we talked about [problems with floating point numbers][1].
This got me thinking but what about integers? Of course integers have all
kinds of problems too anytime you represent a number in a small fixed amount of
space (like 8/16/32/64 bits), youre going to run into problems.
So I [asked on Mastodon again][2] for examples of integer problems and got all kinds of great responses again. Heres a table of contents.
[example 1: the small database primary key][3][example 2: integer overflow/underflow][4][aside: how do computers represent negative integers?][5][example 3: decoding a binary format in Java][6][example 4: misinterpreting an IP address or string as an integer][7][example 5: security problems because of integer overflow][8][example 6: the case of the mystery byte order][9][example 7: modulo of negative numbers][10][example 8: compilers removing integer overflow checks][11][example 9: the && typo][12]
Like last time, Ive written some example programs to demonstrate these
problems. Ive tried to use a variety of languages in the examples (Go,
Javascript, Java, and C) to show that these problems dont just show up in
super low level C programs integers are everywhere!
Also Ive probably made some mistakes in here, I learned several things while writing this.
#### example 1: the small database primary key
One of the most classic (and most painful!) integer problems is:
- You create a database table where the primary key is a 32-bit unsigned integer, thinking “4 billion rows should be enough for anyone!”
- You are massively successful and eventually, your table gets close to 4 billion rows
- oh no!
- You need to do a database migration to switch your primary key to be a 64-bit integer instead
If the primary key actually reaches its maximum value Im not sure exactly what
happens, Id imagine you wouldnt be able to create any new database rows and
it would be a very bad day for your massively successful service.
#### example 2: integer overflow/underflow
Heres a Go program:
```
package main
import "fmt"
func main() {
var x uint32 = 5
var length uint32 = 0
if x < length-1 {
fmt.Printf("%d is less than %d\n", x, length-1)
}
}
```
This slightly mysteriously prints out:
```
5 is less than 4294967295
```
That true, but its not what you might have expected.
#### whats going on?
`0 - 1` is equal to the 4 bytes `0xFFFFFFFF`.
There are 2 ways to interpret those 4 bytes:
- As a _signed_ integer (-1)
- As an _unsigned_ integer (4294967295)
Go here is treating `length - 1` as a **unsigned** integer, because we defined `x` and `length` as uint32s (the “u” is for “unsigned”). So its testing if 5 is less than 4294967295, which it is!
#### what do we do about it?
Im not actually sure if theres any way to automatically detect integer overflow errors in Go. (though it looks like theres a [github issue from 2019 with some discussion][13])
Some brief notes about other languages:
- Lots of languages (Python, Java, Ruby) dont have unsigned integers at all, so this specific problem doesnt come up
- In C, you can compile with `clang -fsanitize=unsigned-integer-overflow`. Then if your code has an overflow/underflow like this, the program will crash.
- Similarly in Rust, if you compile your program in debug mode itll crash if theres an integer overflow. But in release mode it wont crash, itll just happily decide that 0 - 1 = 4294967295.
The reason Rust doesnt check for overflows if you compile your program in
release mode (and the reason C and Go dont check) is that these checks are
expensive! Integer arithmetic is a very big part of many computations, and
making sure that every single addition isnt overflowing makes it slower.
#### aside: how do computers represent negative integers?
I mentioned in the last section that `0xFFFFFFFF` can mean either `-1` or
`4294967295`. You might be thinking what??? Why would `0xFFFFFFFF` mean `-1`?
So lets talk about how computers represent negative integers for a second.
Im going to simplify and talk about 8-bit integers instead of 32-bit integers,
because there are less of them and it works basically the same way.
You can represent 256 different numbers with an 8-bit integer: 0 to 255
```
00000000 -> 0
00000001 -> 1
00000010 -> 2
...
11111111 -> 255
```
But what if you want to represent _negative_ integers? We still only have 8
bits! So we need to reassign some of these and treat them as negative numbers
instead.
Heres the way most modern computers do it:
- Every number thats 128 or more becomes a negative number instead
- How to know _which_ negative number it is: take the positive integer youd expect it to be, and then subtract 256
So 255 becomes -1, 128 becomes -128, and 200 becomes -56.
Here are some maps of bits to numbers:
```
00000000 -> 0
00000001 -> 1
00000010 -> 2
01111111 -> 127
10000000 -> -128 (previously 128)
10000001 -> -127 (previously 129)
10000010 -> -126 (previously 130)
...
11111111 -> -1 (previously 255)
```
This gives us 256 numbers, from -128 to 127.
And `11111111` (or `0xFF`, or 255) is -1.
For 32 bit integers, its the same story, except its “every number larger than 2^31 becomes negative” and “subtract 2^32”. And similarly for other integer sizes.
Thats how we end up with `0xFFFFFFFF` meaning -1.
#### there are multiple ways to represent negative integers
The way we just talked about of representing negative integers (“its the equivalent positive integer, but you subtract 2^n”) is called
**twos complement**, and its the most common on modern computers. There are several other ways
though, the [wikipedia article has a list][14].
#### weird thing: the absolute value of -128 is negative
This [Go program][15] has a pretty simple `abs()` function that computes the absolute value of an integer:
```
package main
import (
"fmt"
)
func abs(x int8) int8 {
if x < 0 {
return -x
}
return x
}
func main() {
fmt.Println(abs(-127))
fmt.Println(abs(-128))
}
```
This prints out:
```
127
-128
```
This is because the signed 8-bit integers go from -128 to 127 there **is** no +128!
Some programs might crash when you try to do this (its an overflow), but Go
doesnt.
Now that weve talked about signed integers a bunch, lets dig into another example of how they can cause problems.
#### example 3: decoding a binary format in Java
Lets say youre parsing a binary format in Java, and you want to get the first
4 bits of the byte `0x90`. The correct answer is 9.
```
public class Main {
public static void main(String[] args) {
byte b = (byte) 0x90;
System.out.println(b >> 4);
}
}
```
This prints out “-7”. Thats not right!
#### whats going on?
There are two things we need to know about Java to make sense of this:
- Java doesnt have unsigned integers.
- Java cant right shift bytes, it can only shift integers. So anytime you shift a byte, it has to be promoted into an integer.
Lets break down what those two facts mean for our little calculation `b >> 4`:
- In bits, `0x90` is `10010000`. This starts with a 1, which means that its more than 128, which means its a negative number
- Java sees the `>>` and decides to promote `0x90` to an integer, so that it can shift it
- The way you convert a negative byte to an 32-bit integer is to add a bunch of `1`s at the beginning. So now our 32-bit integer is `0xFFFFFF90` (`F` being 15, or `1111`)
- Now we right shift (`b >> 4`). By default, Java does a **signed shift**, which means that it adds 0s to the beginning if its positive, and 1s to the beginning if its negative. (`>>>` is an unsigned shift in Java)
- We end up with `0xFFFFFFF9` (having cut off the last 4 bits and added more 1s at the beginning)
- As a signed integer, thats -7!
#### what can you do about it?
I dont the actual idiomatic way to do this in Java is, but the way Id naively
approach fixing this is to put in a bit mask before doing the right shift. So
instead of:
```
b >> 4
```
wed write
```
(b & 0xFF) >> 4
```
`b & 0xFF` seems redundant (`b` is already a byte!), but its actually not because `b` is being promoted to an integer.
Now instead of `0x90 -> 0xFFFFFF90 -> 0xFFFFFFF9`, we end up calculating `0x90 -> 0xFFFFFF90 -> 0x00000090 -> 0x00000009`, which is the result we wanted: 9.
And when we actually try it, it prints out “9”.
Also, if we were using a language with unsigned integers, the natural way to
deal with this would be to treat the value as an unsigned integer in the first
place. But thats not possible in Java.
#### example 4: misinterpreting an IP address or string as an integer
I dont know if this is technically a “problem with integers” but its funny
so Ill mention it: [Rachel by the bay][16] has a bunch of great
examples of things that are not integers being interpreted as integers. For
example, “HTTP” is `0x48545450` and `2130706433` is `127.0.0.1`.
She points out that you can actually ping any integer, and itll convert that integer into an IP address, for example:
```
$ ping 2130706433
PING 2130706433 (127.0.0.1): 56 data bytes
$ ping 132848123841239999988888888888234234234234234234
PING 132848123841239999988888888888234234234234234234 (251.164.101.122): 56 data bytes
```
(Im not actually sure how ping is parsing that second integer or why ping accepts these giant larger-than-2^64-integers as valid inputs, but its a fun weird thing)
#### example 5: security problems because of integer overflow
Another integer overflow example: heres a [search for CVEs involving integer overflows][17].
There are a lot! Im not a security person, but heres one random example: this [json parsing library bug][18]
My understanding of that json parsing bug is roughly:
- you load a JSON file thats 3GB or something, or 3,000,000,000
- due to an integer overflow, the code allocates close to 0 bytes of memory instead of ~3GB amount of memory
- but the JSON file is still 3GB, so it gets copied into the tiny buffer with almost 0 bytes of memory
- this overwrites all kinds of other memory that its not supposed to
The CVE says “This vulnerability mostly impacts process availability”, which I
think means “the program crashes”, but sometimes this kind of thing is much
worse and can result in arbitrary code execution.
My impression is that there are a large variety of different flavours of
security vulnerabilities caused by integer overflows.
#### example 6: the case of the mystery byte order
One person said that theyre do scientific computing and sometimes they need to
read files which contain data with an unknown byte order.
Lets invent a small example of this: say youre reading a file which contains 4
bytes - `00`, `00`, `12`, and `81` (in that order), that you happen to know
represent a 4-byte integer. There are 2 ways to interpret that integer:
- `0x00001281` (which translates to 4737). This order is called “big endian”
- `0x81120000` (which translates to 2165440512). This order is called “little endian”.
Which one is it? Well, maybe the file contains some metadata that specifies the
endianness. Or maybe you happen to know what machine it was generated on and
what byte order that machine uses. Or maybe you just read a bunch of values,
try both orders, and figure out which makes more sense. Maybe 2165440512 is too
big to make sense in the context of whatever your data is supposed to mean, or
maybe `4737` is too small.
A couple more notes on this:
- this isnt just a problem with integers, floating point numbers have byte
order too
- this also comes up when reading data from a network, but in that case the
byte order isnt a “mystery”, its just going to be big endian. But x86
machines (and many others) are little endian, so you have to swap the byte
order of all your numbers.
#### example 7: modulo of negative numbers
This is more of a design decision about how different programming languages design their math libraries, but its still a little weird and lots of people mentioned it.
Lets say you write `-13 % 3` in your program, or `13 % -3`. Whats the result?
It turns out that different programming languages do it differently, for
example in Python `-13 % 3 = 2` but in Javascript `-13 % 3 = -1`.
Theres a table in [this blog post][19] that
describes a bunch of different programming languages choices.
#### example 8: compilers removing integer overflow checks
Weve been hearing a lot about integer overflow and why its bad. So lets
imagine you try to be safe and include some checks in your programs after
each addition, you make sure that the calculation didnt overflow. Like this:
```
#include <stdio.h>
#define INT_MAX 2147483647
int check_overflow(int n) {
n = n + 100;
if (n + 100 < 0)
return -1;
return 0;
}
int main() {
int result = check_overflow(INT_MAX);
printf("%d\n", result);
}
```
`check_overflow` here should return `-1` (failure), because `INT_MAX + 100` is more than the maximum integer size.
```
$ gcc check_overflow.c -o check_overflow && ./check_overflow
-1
$ gcc -O3 check_overflow.c -o check_overflow && ./check_overflow
0
```
Thats weird when we compile with `gcc`, we get the answer we expected, but
with `gcc -O3`, we get a different answer. Why?
#### whats going on?
My understanding (which might be wrong) is:
- Signed integer overflow in C is **undefined behavior**. I think thats
because different C implementations might be using different representations
of signed integers (maybe theyre using ones complement instead of twos
complement or something)
- “undefined behaviour” in C means “the compiler is free to do literally whatever it wants after that point” (see this post [With undefined behaviour, anything is possible][20] by Raph Levine for a lot more)
- Some compiler optimizations assume that undefined behaviour will never
happen. Theyre free to do this, because if that undefined behaviour
_did_ happen, then theyre allowed to do whatever they want, so “run the
code that I optimized assuming that this would never happen” is fine.
- So this `if (n + 100 < 0)` check is irrelevant if that did
happen, it would be undefined behaviour, so theres no need to execute the
contents of that if statement.
So, thats weird. Im not going to write a “what can you do about it?” section here because Im pretty out of my depth already.
I certainly would not have expected that though.
My impression is that “undefined behaviour” is really a C/C++ concept, and
doesnt exist in other languages in the same way except in the case of “your
program called some C code in an incorrect way and that C code did something
weird because of undefined behaviour”. Which of course happens all the time.
#### example 9: the && typo
This one was mentioned as a very upsetting bug. Lets say you have two integers
and you want to check that theyre both nonzero.
In Javascript, you might write:
```
if a && b {
/* some code */
}
```
But you could also make a typo and type:
```
if a & b {
/* some code */
}
```
This is still perfectly valid code, but it means something completely different
its a bitwise and instead of a boolean and. Lets go into a Javascript
console and look at bitwise vs boolean and for `9` and `4`:
```
> 9 && 4
4
> 9 & 4
0
> 4 && 5
5
> 4 & 5
4
```
Its easy to imagine this turning into a REALLY annoying bug since it would be
intermittent often `x & y` does turn out to be truthy if `x && y` is truthy.
#### what to do about it?
For Javascript, ESLint has a [no-bitwise check][21] check), which
requires you manually flag “no, I actually know what Im doing, I want to do
bitwise and” if you use a bitwise and in your code. Im sure many other linters
have a similar check.
#### thats all for now!
There are definitely more problems with integers than this, but this got pretty
long again and Im tired of writing again so Im going to stop :)
--------------------------------------------------------------------------------
via: https://jvns.ca/blog/2023/01/18/examples-of-problems-with-integers/
作者:[Julia Evans][a]
选题:[lkxed][b]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://jvns.ca/
[b]: https://github.com/lkxed/
[1]: https://jvns.ca/blog/2023/01/13/examples-of-floating-point-problems/
[2]: https://social.jvns.ca/@b0rk/109700446576896509
[3]: https://jvns.ca#example-1-the-small-database-primary-key
[4]: https://jvns.ca#example-2-integer-overflow-underflow
[5]: https://jvns.ca#aside-how-do-computers-represent-negative-integers
[6]: https://jvns.ca#example-3-decoding-a-binary-format-in-java
[7]: https://jvns.ca#example-4-misinterpreting-an-ip-address-or-string-as-an-integer
[8]: https://jvns.ca#example-5-security-problems-because-of-integer-overflow
[9]: https://jvns.ca#example-6-the-case-of-the-mystery-byte-order
[10]: https://jvns.ca#example-7-modulo-of-negative-numbers
[11]: https://jvns.ca#example-8-compilers-removing-integer-overflow-checks
[12]: https://jvns.ca#example-9-the-typo
[13]: https://github.com/golang/go/issues/30613
[14]: https://en.wikipedia.org/wiki/Signed_number_representations
[15]: https://go.dev/play/p/iSFxbFAe75M
[16]: https://rachelbythebay.com/w/2020/11/26/magic/
[17]: https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=integer+overflow
[18]: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-24795
[19]: https://torstencurdt.com/tech/posts/modulo-of-negative-numbers/
[20]: https://raphlinus.github.io/programming/rust/2018/08/17/undefined-behavior.html
[21]: https://eslint.org/docs/latest/rules/no-bitwise