This got me thinking – but what about integers? Of course integers have all kinds of problems too – anytime you represent a number in a small fixed amount of space (like 8/16/32/64 bits), you’re going to run into problems.
Like last time, I’ve written some example programs to demonstrate these problems. I’ve tried to use a variety of languages in the examples (Go, Javascript, Java, and C) to show that these problems don’t just show up in super low level C programs – integers are everywhere!
If the primary key actually reaches its maximum value, I’m not sure exactly what happens. I’d imagine you wouldn’t be able to create any new database rows, and it would be a very bad day for your massively successful service.
That’s true, but it’s not what you might have expected.
#### what’s going on?
In 32-bit arithmetic, `0 - 1` is equal to the 4 bytes `0xFFFFFFFF`.
There are 2 ways to interpret those 4 bytes:
- As a _signed_ integer (-1)
- As an _unsigned_ integer (4294967295)
Go here is treating `length - 1` as an **unsigned** integer, because we defined `x` and `length` as `uint32`s (the “u” is for “unsigned”). So it’s testing whether 5 is less than 4294967295, which it is!
#### what do we do about it?
I’m not actually sure if there’s any way to automatically detect integer overflow errors in Go (though it looks like there’s a [github issue from 2019 with some discussion][13]).
Some brief notes about other languages:
- Lots of languages (Python, Java, Ruby) don’t have unsigned integers at all, so this specific problem doesn’t come up
- In C, you can compile with `clang -fsanitize=unsigned-integer-overflow`. Then if your code has an overflow/underflow like this, the program will report a runtime error (add `-fno-sanitize-recover=unsigned-integer-overflow` if you want it to actually crash).
- Similarly in Rust, if you compile your program in debug mode, it’ll crash if there’s an integer overflow. But in release mode it won’t crash; it’ll just happily decide that 0 - 1 = 4294967295.
The reason Rust doesn’t check for overflows if you compile your program in release mode (and the reason C and Go don’t check) is that – these checks are expensive! Integer arithmetic is a very big part of many computations, and making sure that every single addition isn’t overflowing makes it slower.
I mentioned in the last section that `0xFFFFFFFF` can mean either `-1` or `4294967295`. You might be thinking – what??? Why would `0xFFFFFFFF` mean `-1`?
An unsigned 8-bit integer can represent 256 numbers, from 0 to 255. But what if you want to represent _negative_ integers? We still only have 8 bits! So we need to reassign some of these and treat them as negative numbers instead.
- Every number that’s 128 or more becomes a negative number instead
- How to know _which_ negative number it is: take the positive integer you’d expect it to be, and then subtract 256
So 255 becomes -1, 128 becomes -128, and 200 becomes -56.
Here are some maps of bits to numbers:
```
00000000 -> 0
00000001 -> 1
00000010 -> 2
...
01111111 -> 127
10000000 -> -128 (previously 128)
10000001 -> -127 (previously 129)
10000010 -> -126 (previously 130)
...
11111111 -> -1 (previously 255)
```
This gives us 256 numbers, from -128 to 127.
And `11111111` (or `0xFF`, or 255) is -1.
For 32-bit integers, it’s the same story, except it’s “every number that’s 2^31 or more becomes negative” and “subtract 2^32”. And similarly for other integer sizes.
That’s how we end up with `0xFFFFFFFF` meaning -1.
#### there are multiple ways to represent negative integers
The way of representing negative integers that we just talked about (“it’s the equivalent positive integer, but you subtract 2^n”) is called **two’s complement**, and it’s the most common on modern computers. There are several other ways, though; the [wikipedia article has a list][14].
This is because the signed 8-bit integers go from -128 to 127 – there **is** no +128! Some programs might crash when you try to do this (it’s an overflow), but Go doesn’t.
There are two things we need to know about Java to make sense of this:
- Java doesn’t have unsigned integers.
- Java can’t right shift bytes; it can only shift integers. So any time you shift a byte, it has to be promoted to an integer.
Let’s break down what those two facts mean for our little calculation `b >> 4`:
- In bits, `0x90` is `10010000`. This starts with a 1, which means that it’s 128 or more (it’s 144), which means that as a signed byte it’s a negative number (144 - 256 = -112)
- Java sees the `>>` and decides to promote `0x90` to an integer, so that it can shift it
- The way you convert a negative byte to a 32-bit integer is to add a bunch of `1`s at the beginning (this is called sign extension). So now our 32-bit integer is `0xFFFFFF90` (`F` being 15, or `1111`)
- Now we right shift (`b >> 4`). Java’s `>>` is a **signed shift**, which means that it adds 0s to the beginning if the number is positive, and 1s to the beginning if it’s negative. (`>>>` is the unsigned shift in Java.)
- We end up with `0xFFFFFFF9` (having cut off the last 4 bits and added more 1s at the beginning)
I don’t know what the actual idiomatic way to do this in Java is, but the way I’d naively approach fixing this is to put in a bit mask before doing the right shift. So instead of `b >> 4`, we’d write `(b & 0xFF) >> 4`.
Now instead of `0x90 -> 0xFFFFFF90 -> 0xFFFFFFF9`, we end up calculating `0x90 -> 0xFFFFFF90 -> 0x00000090 -> 0x00000009`, which is the result we wanted: 9.
Also, if we were using a language with unsigned integers, the natural way to deal with this would be to treat the value as an unsigned integer in the first place. But that’s not possible in Java.
I don’t know if this is technically a “problem with integers” but it’s funny so I’ll mention it: [Rachel by the bay][16] has a bunch of great examples of things that are not integers being interpreted as integers. For example, “HTTP” is `0x48545450` and `2130706433` is `127.0.0.1`.
```
PING 132848123841239999988888888888234234234234234234 (251.164.101.122): 56 data bytes
```
(I’m not actually sure how ping is parsing that giant integer, or why ping accepts these larger-than-2^64 integers as valid inputs, but it’s a fun weird thing.)
#### example 5: security problems because of integer overflow
Another integer overflow example: here’s a [search for CVEs involving integer overflows][17]. There are a lot! I’m not a security person, but here’s one random example: this [json parsing library bug][18].
The CVE says “This vulnerability mostly impacts process availability”, which I think means “the program crashes”, but sometimes this kind of thing is much worse and can result in arbitrary code execution.
Let’s invent a small example of this: say you’re reading a file which contains 4 bytes, `00`, `00`, `12`, and `81` (in that order), which you happen to know represent a 4-byte integer. There are 2 ways to interpret that integer:
- `0x00001281` (big endian), which is 4737
- `0x81120000` (little endian), which is 2165440512
Which one is it? Well, maybe the file contains some metadata that specifies the endianness. Or maybe you happen to know what machine it was generated on and what byte order that machine uses. Or maybe you just read a bunch of values, try both orders, and figure out which makes more sense. Maybe 2165440512 is too big to make sense in the context of whatever your data is supposed to mean, or maybe `4737` is too small.
- this isn’t just a problem with integers: floating point numbers have byte order too
- this also comes up when reading data from a network, but in that case the byte order isn’t a “mystery”; it’s just going to be big endian. But x86 machines (and many others) are little endian, so you have to swap the byte order of all your numbers.
This is more of a design decision about how different programming languages design their math libraries, but it’s still a little weird and lots of people mentioned it.
Let’s say you write `-13 % 3` in your program, or `13 % -3`. What’s the result?
We’ve been hearing a lot about integer overflow and why it’s bad. So let’s imagine you try to be safe and include some checks in your programs: after each addition, you make sure that the calculation didn’t overflow, with something like `if (n + 100 < 0)`. In C, the compiler can end up removing a check like that entirely. Here’s why:
- Signed integer overflow in C is **undefined behaviour**. I think that’s because different C implementations might be using different representations of signed integers (maybe they’re using one’s complement instead of two’s complement or something)
- “undefined behaviour” in C means “the compiler is free to do literally whatever it wants after that point” (see this post [With undefined behaviour, anything is possible][20] by Raph Levien for a lot more)
- Some compiler optimizations assume that undefined behaviour will never happen. They’re free to do this, because – if that undefined behaviour _did_ happen, then they’re allowed to do whatever they want, so “run the code that I optimized assuming that this would never happen” is fine.
- So this `if (n + 100 < 0)` check is irrelevant – if that did happen, it would be undefined behaviour, so there’s no need to execute the contents of that if statement.
My impression is that “undefined behaviour” is really a C/C++ concept, and doesn’t exist in other languages in the same way except in the case of “your program called some C code in an incorrect way and that C code did something weird because of undefined behaviour”. Which of course happens all the time.
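This is still perfectly valid code, but it means something completely different: it’s a bitwise and instead of a boolean and. Here’s what bitwise vs boolean and do with `9` and `4` in a Javascript console: `9 && 4` evaluates to `4` (truthy), but `9 & 4` evaluates to `0` (falsy), because 9 is `1001` in binary and 4 is `0100`, so they have no bits in common.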
It’s easy to imagine this turning into a REALLY annoying bug since it would be intermittent – often `x & y` does turn out to be truthy if `x && y` is truthy.
For Javascript, ESLint has a [no-bitwise check][21], which requires you to manually flag “no, I actually know what I’m doing, I want to do bitwise and” if you use a bitwise and in your code. I’m sure many other linters have a similar check.