Merge pull request #29072 from lkxed/master

Fix formatting of the jvns.ca article series
This commit is contained in:
六开箱 2023-04-08 13:33:43 +08:00 committed by GitHub
commit bc1f6e90c3
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
6 changed files with 213 additions and 606 deletions


@ -24,16 +24,20 @@ Ive heard a million times about the dangers of floating point arithmetic, lik
But I find all of this a little abstract on its own, and I really wanted some specific examples of floating point bugs in real-world programs. So I [asked on Mastodon][1] for examples of how floating point has gone wrong for them in real programs, and as always folks delivered! Here are a bunch of examples. I've also written some example programs for some of them to see exactly what happens. Here's a table of contents:
- [how does floating point work?][2]
- [floating point isn't “bad” or random][3]
- [example 1: the odometer that stopped][4]
- [example 2: tweet IDs in Javascript][5]
- [example 3: a variance calculation gone wrong][6]
- [example 4: different languages sometimes do the same floating point calculation differently][7]
- [example 5: the deep space kraken][8]
- [example 6: the inaccurate timestamp][9]
- [example 7: splitting a page into columns][10]
- [example 8: collision checking][11]
None of these 8 examples talk about NaNs or +0/-0 or infinity values or subnormals, but it's not because those things don't cause problems – it's just that I got tired of writing at some point :).
Also I've probably made some mistakes in this post.
@ -45,35 +49,21 @@ Im not going to write a long explanation of how floating point works in this
#### floating point isn't “bad” or random
I don't want you to read this post and conclude that floating point is bad. It's an amazing tool for doing numerical calculations. So many smart people have done so much work to make numerical calculations on computers efficient and accurate! Two points about how all of this isn't floating point's fault:
- Doing numerical computations on a computer inherently involves some approximation and rounding, especially if you want to do it efficiently. You can't always store an arbitrary amount of precision for every single number you're working with.
- Floating point is standardized (IEEE 754), so operations like addition on floating point numbers are deterministic – my understanding is that 0.1 + 0.2 will always give you the exact same result (0.30000000000000004), even across different architectures. It might not be the result you _expected_, but it's actually very predictable.
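To make that determinism point concrete, here's a quick Python check (not from the post) of the 0.1 + 0.2 example – the result is off from 0.3, but off in exactly the same way everywhere:

```python
# 0.1 and 0.2 aren't exactly representable in binary, so their sum
# isn't exactly 0.3 – but it's the same wrong answer every time
result = 0.1 + 0.2
print(result)         # 0.30000000000000004, on any IEEE 754 machine
print(result == 0.3)  # False – surprising, but deterministic
```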
My goal for this post is just to explain what kind of problems can come up with floating point numbers and why they happen – so that you know when to be careful with them, and when they're not appropriate.
Now let's get into the examples.
#### example 1: the odometer that stopped
One person said that they were working on an odometer that was continuously adding small amounts to a 32-bit float to measure distance travelled, and things went very wrong.
To make this concrete, let's say that we're adding numbers to the odometer 1cm at a time. What does it look like after 10,000 kilometers?
Here's a C program that simulates that:
@ -101,10 +91,7 @@ This is VERY bad its not a small error, 262km is a LOT less than 10,000km
#### what went wrong: gaps between floating point numbers get big
The problem in this case is that, for 32-bit floats, 262144.0 + 0.01 = 262144.0. So it's not just that the number is inaccurate, it'll actually never increase at all! If we travelled another 10,000 kilometers, the odometer would still be stuck at 262144 meters (aka 262.144km).
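We can reproduce the stuck addition in Python (a sketch, not the post's C program) by using `struct` to round values to 32-bit floats:

```python
import struct

def to_float32(x):
    """Round a Python float (which is 64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

# the true sum 262144.01 is closer to 262144.0 than to the next
# 32-bit float (262144.03125), so it rounds straight back down
odometer = to_float32(to_float32(262144.0) + 0.01)
print(odometer)  # 262144.0 – the odometer is stuck
```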
Why is this happening? Well, floating point numbers get farther apart as they get bigger. In this example, for 32-bit floats, here are 3 consecutive floating point numbers:
@ -116,13 +103,9 @@ I got those numbers by going to [https://float.exposed/0x48800000][13] and incre
So, there are no 32-bit floating point numbers between 262144.0 and 262144.03125. Why is that a problem?
The problem is that 262144.03125 is about 262144.0 + 0.03. So when we try to add 0.01 to 262144.0, it doesn't make sense to round up to the next number. So the sum just stays at 262144.0.
Also, it's not a coincidence that 262144 is a power of 2 (it's 2^18). The gaps between floating point numbers change after every power of 2, and at 2^18 the gap between 32-bit floats is 0.03125, increasing from 0.016ish.
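You can see the gap doubling at 2^18 by incrementing the bit pattern of a 32-bit float directly (a little Python sketch of the same trick float.exposed uses):

```python
import struct

def next_float32(x):
    """The next representable 32-bit float after a positive x,
    found by adding 1 to its raw bit pattern."""
    (bits,) = struct.unpack("I", struct.pack("f", x))
    return struct.unpack("f", struct.pack("I", bits + 1))[0]

print(next_float32(131072.0) - 131072.0)  # 0.015625: the gap just above 2^17
print(next_float32(262144.0) - 262144.0)  # 0.03125: it doubles at 2^18
```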
#### one way to solve this: use a double
@ -133,41 +116,26 @@ Expected: 10000.000000 km
Got: 9999.999825 km
```
There are still some small inaccuracies here – we're off by about 17 centimeters. Whether this matters or not depends on the context: being slightly off could very well be disastrous if we were doing a precision space maneuver or something, but it's probably fine for an odometer.
Another way to improve this would be to increment the odometer in bigger chunks – instead of adding 1cm at a time, maybe we could update it less frequently, like every 50cm.
If we use a double **and** increment by 50cm instead of 1cm, we get the exact correct answer:
```
Expected: 10000.000000 km
Got: 10000.000000 km
```
A third way to solve this could be to use an **integer**: maybe we decide that the smallest unit we care about is 0.1mm, and then measure everything as integer multiples of 0.1mm. I have never built an odometer so I can't say what the best approach is.
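Here's a small sketch (my own, not from the post) of that integer idea – counting 0.1mm units means every 1cm tick adds exactly 100, and integer addition never loses precision:

```python
# store the odometer as an integer count of 0.1mm units
TENTH_MM_PER_CM = 100
TENTH_MM_PER_KM = 10_000_000

odometer = 0
for _ in range(1_000_000):  # 10km of 1cm ticks
    odometer += TENTH_MM_PER_CM

print(odometer / TENTH_MM_PER_KM)  # 10.0 – exact, unlike the float32 version
```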
#### example 2: tweet IDs in Javascript
Javascript only has floating point numbers – it doesn't have an integer type. The biggest integer you can represent in a 64-bit floating point number is 2^53.
But tweet IDs are big numbers, bigger than 2^53. The Twitter API now returns them as both integers and strings, so that in Javascript you can just use the string ID (like “1612850010110005250”), but if you tried to use the integer version in JS, things would go very wrong.
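You can see the 2^53 cliff from Python too, by forcing the ID through a 64-bit float (a quick sketch, not from the post):

```python
tweet_id = 1612850010110005250  # bigger than 2^53

# above 2^53, 64-bit floats skip integers: 2^53 and 2^53 + 1 collide
print(float(2**53) == float(2**53 + 1))  # True

# so storing the tweet ID in a float silently changes it
print(int(float(tweet_id)) == tweet_id)  # False
```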
You can check this yourself by taking a tweet ID and putting it in the Javascript console, like this:
```
>> 1612850010110005250
@ -176,8 +144,7 @@ Javascript console, like this:
Notice that 1612850010110005200 is NOT the same number as 1612850010110005250!! It's 50 less!
This particular issue doesn't happen in Python (or any other language that I know of), because Python has integers. Here's what happens if we enter the same number in a Python REPL:
```
In [3]: 1612850010110005250
Same number, as you'd expect.
#### example 2.1: the corrupted JSON data
This is a small variant of the “tweet IDs in Javascript” issue, but even if you're _not_ actually writing Javascript code, numbers in JSON are still sometimes treated as if they're floats. This mostly makes sense to me because JSON has “Javascript” in the name, so it seems reasonable to decode the values the way Javascript would.
For example, if we pass some JSON through `jq`, we see the exact same issue: the number 1612850010110005250 gets changed into 1612850010110005200.
```
$ echo '{"id": 1612850010110005250}' | jq '.'
@ -206,19 +168,13 @@ $ echo '{"id": 1612850010110005250}' | jq '.'
But it's not consistent across all JSON libraries – Python's `json` module will decode `1612850010110005250` as the correct integer.
Several people mentioned issues with sending floats in JSON, where either they were trying to send a large integer (like a pointer address) in JSON and it got corrupted, or they were sending smaller floating point values back and forth repeatedly and the value slowly diverged over time.
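We can demonstrate both decoder behaviours in Python (a sketch: `parse_int=float` simulates a float-based decoder like `jq` or Javascript, it's not what `json` does by default):

```python
import json

doc = '{"id": 1612850010110005250}'

# Python's json module keeps the integer exact
print(json.loads(doc)["id"])  # 1612850010110005250

# simulating a float-based decoder corrupts the ID
corrupted = json.loads(doc, parse_int=float)["id"]
print(int(corrupted) == 1612850010110005250)  # False
```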
#### example 3: a variance calculation gone wrong
Let's say you're doing some statistics, and you want to calculate the variance of many numbers. Maybe more numbers than you can easily fit in memory, so you want to do it in a single pass.
There's a simple (but bad!!!) algorithm you can use to calculate the variance in a single pass, from [this blog post][14]. Here's some Python code:
```
def calculate_bad_variance(nums):
@ -246,7 +202,7 @@ Bad variance: 13.840000000000003 <- pretty close!
Now, let's try the same thing with 100,000 large numbers that are very close together (distributed between 100000000 and 100000000.06)
```
In [7]: calculate_bad_variance(np.random.uniform(100000000, 100000000.06, 100000))
Real variance: 0.00029959105209321173
Bad variance: -138.93632 <- OH NO
```
@ -255,50 +211,27 @@ This is extremely bad: not only is the bad variance way off, its NEGATIVE! (t
#### what went wrong: catastrophic cancellation
What's going on here is similar to our odometer problem: the `sum_of_squares` number gets extremely big (about 10^21 or 2^69), and at that point, the gap between consecutive floating point numbers is also very big – it's 2**46. So we just lose all precision in our calculations.
The term for this problem is “catastrophic cancellation” – we're subtracting two very large floating point numbers which are both going to be pretty far from the correct value of the calculation, so the result of the subtraction is also going to be wrong. [The blog post I mentioned before][14] talks about a better algorithm people use to compute variance called Welford's algorithm, which doesn't have the catastrophic cancellation issue.
And of course, the solution for most people is to just use a scientific computing library like Numpy to calculate variance instead of trying to do it yourself :)
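For the curious, here's a small sketch of Welford's algorithm (my own implementation, following the standard recurrence) – it tracks a running mean and a running sum of squared deviations, so it never subtracts two huge accumulated sums:

```python
def welford_variance(nums):
    """Welford's single-pass variance, numerically stable because the
    intermediate quantities stay small."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in nums:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / n  # population variance

# large, close-together numbers like the failing example above
nums = [100000000 + i * 0.01 for i in range(7)]
print(welford_variance(nums))  # small and positive – no catastrophe
```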
#### example 4: different languages sometimes do the same floating point calculation differently
A bunch of people mentioned that different platforms will do the same calculation in different ways. One way this shows up in practice is that maybe you have some frontend code and some backend code that do the exact same floating point calculation. But it's done slightly differently in Javascript and in PHP, so your users end up seeing discrepancies and getting confused.
In principle you might think that different implementations should work the same way because of the IEEE 754 standard for floating point, but here are a couple of caveats that were mentioned:
- math operations in libc (like sin/log) behave differently in different implementations. So code using glibc could give you different results than code using musl
- some x86 instructions can use 80 bit precision for some double operations internally instead of 64 bit precision. [Here's a GitHub issue talking about that][15]
I'm not very sure about these points and I don't have concrete examples I can reproduce.
#### example 5: the deep space kraken
Kerbal Space Program is a space simulation game, and it used to have a bug called the [Deep Space Kraken][16] where, when you moved very fast, your ship would start getting destroyed due to floating point issues. This is similar to the other problems we've talked about involving big floating point numbers (like the variance problem), but I wanted to mention it because:
- it has a funny name
- it seems like a very common bug in video games / astrophysics / simulations in general if you have points that are very far from the origin, your math gets messed up
@ -307,32 +240,24 @@ Another example of this is the [Far Lands][17] in Minecraft.
#### example 6: the inaccurate timestamp
I promise this is the last example of “very large floating point numbers can ruin your day”. But! Just one more! Let's imagine that we try to represent the current Unix epoch in nanoseconds (about 1673580409000000000) as a 64-bit floating point number.
This is no good! 1673580409000000000 is about 2^60 (crucially, bigger than 2^53), and the next 64-bit float after it is 1673580409000000256.
So this would be a great way to end up with inaccuracies in your time math. Of course, time libraries actually represent times as integers, so this isn't usually a problem. (there's always still the [year 2038 problem][18], but that's not related to floats)
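A quick Python check of those numbers (a sketch, not from the post; `math.nextafter` needs Python ≥ 3.9):

```python
import math

ns = 1673580409000000000  # the epoch timestamp above, about 2^60

# at this magnitude consecutive 64-bit floats are 256 apart, so a
# 100ns increment is simply rounded away
print(float(ns + 100) == float(ns))  # True

# the next representable 64-bit float after it
print(math.nextafter(float(ns), math.inf))  # 1673580409000000256
```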
In general, the lesson here is that sometimes it's better to use integers.
#### example 7: splitting a page into columns
Now that we've talked about problems with big floating point numbers, let's do a problem with small floating point numbers.
Let's say you have a page width, and a column width, and you want to figure out:
- how many columns fit on the page
- how much space is left over
You might reasonably try `floor(page_width / column_width)` for the first question and `page_width % column_width` for the second question. Because that would work just fine with integers!
```
In [5]: math.floor(13.716 / 4.572)
@ -344,21 +269,15 @@ Out[6]: 4.571999999999999
This is wrong! The amount of space left is 0!
A better way to calculate the amount of space left might have been `13.716 - 3 * 4.572`, which gives us a very small negative number.
I think the lesson here is to never calculate the same thing in 2 different ways with floats.
This is a very basic example but I can kind of see how this would create all kinds of problems if I was doing page layout with floating point numbers, or doing CAD drawings.
#### example 8: collision checking
Here's a very silly Python program, that starts a variable at 1000 and decrements it until it collides with 0. You can imagine that this is part of a pong game or something, and that `a` is a ball that's supposed to collide with a wall.
```
a = 1000
@ -366,21 +285,15 @@ while a != 0:
a -= 0.001
```
You might expect this program to terminate. But it doesn't! `a` is never 0, instead it goes from 1.673494676862619e-08 to -0.0009999832650532314.
The lesson here is that instead of checking for float equality, usually you want to check if two numbers are different by some very small amount. Or here we could just write `while a > 0`.
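Here's a sketch of that fix – an inequality for the loop, and `math.isclose` with a tolerance for the "did we hit the wall" question:

```python
import math

a = 1000.0
while a > 0:  # an inequality terminates; `a != 0` never would
    a -= 0.001

print(a)  # a tiny negative number – the loop never lands exactly on 0
print(math.isclose(a, 0.0, abs_tol=1e-3))  # True: "close enough to 0"
```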
#### that's all for now
I didn't even get to NaNs (there are so many of them!) or infinity or +0 / -0 or subnormals, but we've already written 2000 words and I'm going to just publish this.
I might write another followup post later – that Mastodon thread has literally 15,000 words of floating point problems in it, there's a lot of material! Or I might not, who knows :)
--------------------------------------------------------------------------------


@ -12,18 +12,22 @@ Examples of problems with integers
Hello! A few days back we talked about [problems with floating point numbers][1].
This got me thinking – but what about integers? Of course integers have all kinds of problems too – anytime you represent a number in a small fixed amount of space (like 8/16/32/64 bits), you're going to run into problems.
So I [asked on Mastodon again][2] for examples of integer problems and got all kinds of great responses again. Here's a table of contents.
- [example 1: the small database primary key][3]
- [example 2: integer overflow/underflow][4]
- [aside: how do computers represent negative integers?][5]
- [example 3: decoding a binary format in Java][6]
- [example 4: misinterpreting an IP address or string as an integer][7]
- [example 5: security problems because of integer overflow][8]
- [example 6: the case of the mystery byte order][9]
- [example 7: modulo of negative numbers][10]
- [example 8: compilers removing integer overflow checks][11]
- [example 9: the && typo][12]
Like last time, I've written some example programs to demonstrate these problems. I've tried to use a variety of languages in the examples (Go, Javascript, Java, and C) to show that these problems don't just show up in super low level C programs – integers are everywhere!
Also I've probably made some mistakes in here, I learned several things while writing this.
@ -36,9 +40,7 @@ One of the most classic (and most painful!) integer problems is:
- oh no!
- You need to do a database migration to switch your primary key to be a 64-bit integer instead
If the primary key actually reaches its maximum value I'm not sure exactly what happens – I'd imagine you wouldn't be able to create any new database rows and it would be a very bad day for your massively successful service.
#### example 2: integer overflow/underflow
@ -87,20 +89,15 @@ Some brief notes about other languages:
- In C, you can compile with `clang -fsanitize=unsigned-integer-overflow`. Then if your code has an overflow/underflow like this, the program will crash.
- Similarly in Rust, if you compile your program in debug mode it'll crash if there's an integer overflow. But in release mode it won't crash, it'll just happily decide that 0 - 1 = 4294967295.
The reason Rust doesn't check for overflows if you compile your program in release mode (and the reason C and Go don't check) is that these checks are expensive! Integer arithmetic is a very big part of many computations, and making sure that every single addition isn't overflowing makes it slower.
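We can model the wraparound behaviour itself in a few lines of Python (a sketch of what the hardware does, since Python's own integers never overflow):

```python
def wrap_u32(x):
    """What an unsigned 32-bit integer does on overflow/underflow:
    reduce modulo 2^32."""
    return x % 2**32

def wrap_i32(x):
    """Same for a signed 32-bit integer, in two's complement."""
    x %= 2**32
    return x - 2**32 if x >= 2**31 else x

print(wrap_u32(0 - 1))  # 4294967295, like release-mode Rust decides
print(wrap_i32(2**31))  # -2147483648: one past the max wraps to the min
```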
#### aside: how do computers represent negative integers?
I mentioned in the last section that `0xFFFFFFFF` can mean either `-1` or `4294967295`. You might be thinking – what??? Why would `0xFFFFFFFF` mean `-1`?
So let's talk about how computers represent negative integers for a second.
I'm going to simplify and talk about 8-bit integers instead of 32-bit integers, because there are fewer of them and it works basically the same way.
You can represent 256 different numbers with an 8-bit integer: 0 to 255
@ -112,9 +109,7 @@ You can represent 256 different numbers with an 8-bit integer: 0 to 255
11111111 -> 255
```
But what if you want to represent _negative_ integers? We still only have 8 bits! So we need to reassign some of these and treat them as negative numbers instead.
Here's the way most modern computers do it:
@ -147,9 +142,7 @@ Thats how we end up with `0xFFFFFFFF` meaning -1.
#### there are multiple ways to represent negative integers
The way we just talked about of representing negative integers (“it's the equivalent positive integer, but you subtract 2^n”) is called **two's complement**, and it's the most common on modern computers. There are several other ways though, the [wikipedia article has a list][14].
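The "subtract 2^n" rule is short enough to write out directly – a little Python sketch for 8 bits:

```python
def to_signed8(bits):
    """Read 8 bits as a two's complement signed integer: bit patterns
    128..255 mean 'the same value, minus 256'."""
    return bits - 256 if bits >= 128 else bits

print(to_signed8(0b11111111))  # -1
print(to_signed8(0b10000000))  # -128
print(to_signed8(0b01111111))  # 127

# the same rule at 32 bits is why 0xFFFFFFFF can mean -1
print((-1) & 0xFFFFFFFF == 0xFFFFFFFF)  # True
```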
#### weird thing: the absolute value of -128 is negative
@ -182,16 +175,13 @@ This prints out:
-128
```
This is because the signed 8-bit integers go from -128 to 127 – there **is** no +128! Some programs might crash when you try to do this (it's an overflow), but Go doesn't.
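Simulating 8-bit two's complement truncation in Python (a sketch, analogous to what the Go program above does) shows why negating -128 lands back on -128:

```python
def wrap_i8(x):
    """Truncate x to a signed 8-bit integer, two's complement style."""
    x %= 256
    return x - 256 if x >= 128 else x

# negating -128 gives +128, which doesn't fit in -128..127, and the
# bit pattern wraps right back around to -128
print(wrap_i8(-(-128)))  # -128
```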
Now that we've talked about signed integers a bunch, let's dig into another example of how they can cause problems.
#### example 3: decoding a binary format in Java
Let's say you're parsing a binary format in Java, and you want to get the first 4 bits of the byte `0x90`. The correct answer is 9.
```
public class Main {
@ -222,9 +212,7 @@ Lets break down what those two facts mean for our little calculation `b >> 4`
#### what can you do about it?
I don't know what the actual idiomatic way to do this in Java is, but the way I'd naively approach fixing this is to put in a bit mask before doing the right shift. So instead of:
```
b >> 4
@ -238,20 +226,15 @@ wed write
`b & 0xFF` seems redundant (`b` is already a byte!), but it's actually not because `b` is being promoted to an integer.
Now instead of `0x90 -> 0xFFFFFF90 -> 0xFFFFFFF9`, we end up calculating `0x90 -> 0xFFFFFF90 -> 0x00000090 -> 0x00000009`, which is the result we wanted: 9.
And when we actually try it, it prints out “9”.
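The same sign-extension story can be reproduced in Python (a sketch, using `struct` to read `0x90` as a signed byte the way Java would):

```python
import struct

# Java bytes are signed, so the bit pattern 0x90 is really -112
(b,) = struct.unpack("b", bytes([0x90]))
print(b)  # -112

# an arithmetic right shift drags the sign bit along: we get -7
# (bit pattern 0xFFFFFFF9), not 9
print(b >> 4)  # -7

# masking to the unsigned value first makes the shift behave
print((b & 0xFF) >> 4)  # 9
```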
Also, if we were using a language with unsigned integers, the natural way to deal with this would be to treat the value as an unsigned integer in the first place. But that's not possible in Java.
#### example 4: misinterpreting an IP address or string as an integer
I dont know if this is technically a “problem with integers” but its funny so Ill mention it: [Rachel by the bay][16] has a bunch of great examples of things that are not integers being interpreted as integers. For example, “HTTP” is `0x48545450` and `2130706433` is `127.0.0.1`.
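Both of those values are easy to check with a couple of lines of Python (my own sketch, using only the standard library):

```python
import ipaddress

# the ASCII bytes of "HTTP" read as one big-endian integer
print(hex(int.from_bytes(b"HTTP", "big")))  # 0x48545450

# and the integer 2130706433 reinterpreted as an IPv4 address
print(ipaddress.ip_address(2130706433))     # 127.0.0.1
```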
She points out that you can actually ping any integer, and itll convert that integer into an IP address, for example:
@ -266,8 +249,7 @@ PING 132848123841239999988888888888234234234234234234 (251.164.101.122): 56 data
#### example 5: security problems because of integer overflow
Another integer overflow example: heres a [search for CVEs involving integer overflows][17]. There are a lot! Im not a security person, but heres one random example: this [json parsing library bug][18]
My understanding of that json parsing bug is roughly:
@ -276,40 +258,25 @@ My understanding of that json parsing bug is roughly:
- but the JSON file is still 3GB, so it gets copied into the tiny buffer with almost 0 bytes of memory
- this overwrites all kinds of other memory that its not supposed to
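The buggy code itself isnt shown here, but the overflow step can be sketched in Python with a hypothetical helper that wraps arithmetic to a signed 32-bit integer the way Cs `int` does:

```python
def as_int32(n):
    # keep the low 32 bits, then reinterpret them as signed two's complement,
    # roughly what C's int arithmetic does on overflow
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n

three_gb = 3 * 1024**3
print(as_int32(three_gb))  # -1073741824: the "3GB" length has gone negative
```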
The CVE says “This vulnerability mostly impacts process availability”, which I think means “the program crashes”, but sometimes this kind of thing is much worse and can result in arbitrary code execution.
My impression is that there are a large variety of different flavours of security vulnerabilities caused by integer overflows.
#### example 6: the case of the mystery byte order
One person said that they do scientific computing and sometimes they need to read files which contain data with an unknown byte order.
Lets invent a small example of this: say youre reading a file which contains 4 bytes - `00`, `00`, `12`, and `81` (in that order), that you happen to know represent a 4-byte integer. There are 2 ways to interpret that integer:
- `0x00001281` (which translates to 4737). This order is called “big endian”
- `0x81120000` (which translates to 2165440512). This order is called “little endian”.
Which one is it? Well, maybe the file contains some metadata that specifies the endianness. Or maybe you happen to know what machine it was generated on and what byte order that machine uses. Or maybe you just read a bunch of values, try both orders, and figure out which makes more sense. Maybe 2165440512 is too big to make sense in the context of whatever your data is supposed to mean, or maybe `4737` is too small.
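In Python you can try both interpretations with the `struct` module: `>I` reads the bytes as a big-endian unsigned 4-byte integer, and `<I` as a little-endian one:

```python
import struct

data = bytes([0x00, 0x00, 0x12, 0x81])
big_endian, = struct.unpack(">I", data)     # 0x00001281
little_endian, = struct.unpack("<I", data)  # 0x81120000
print(big_endian)     # 4737
print(little_endian)  # 2165440512
```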
A couple more notes on this:
- this isnt just a problem with integers, floating point numbers have byte order too
- this also comes up when reading data from a network, but in that case the byte order isnt a “mystery”, its just going to be big endian. But x86 machines (and many others) are little endian, so you have to swap the byte order of all your numbers.
#### example 7: modulo of negative numbers
@ -317,17 +284,13 @@ This is more of a design decision about how different programming languages desi
Lets say you write `-13 % 3` in your program, or `13 % -3`. Whats the result?
It turns out that different programming languages do it differently, for example in Python `-13 % 3 = 2` but in Javascript `-13 % 3 = -1`.
Theres a table in [this blog post][19] that describes a bunch of different programming languages choices.
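You can actually see both conventions without leaving Python: its `%` operator takes the sign of the divisor, while `math.fmod` uses the C-style convention (sign of the dividend), which matches what Javascripts `%` does here:

```python
import math

print(-13 % 3)            # 2: Python's result takes the sign of the divisor
print(math.fmod(-13, 3))  # -1.0: C-style remainder, takes the sign of the dividend
```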
#### example 8: compilers removing integer overflow checks
Weve been hearing a lot about integer overflow and why its bad. So lets imagine you try to be safe and include some checks in your programs: after each addition, you make sure that the calculation didnt overflow. Like this:
```
#include <stdio.h>
@ -356,39 +319,26 @@ $ gcc -O3 check_overflow.c -o check_overflow && ./check_overflow
0
```
Thats weird: when we compile with `gcc`, we get the answer we expected, but with `gcc -O3`, we get a different answer. Why?
#### whats going on?
My understanding (which might be wrong) is:
- Signed integer overflow in C is **undefined behavior**. I think thats because different C implementations might be using different representations of signed integers (maybe theyre using ones complement instead of twos complement or something)
- “undefined behaviour” in C means “the compiler is free to do literally whatever it wants after that point” (see this post [With undefined behaviour, anything is possible][20] by Raph Levien for a lot more)
- Some compiler optimizations assume that undefined behaviour will never happen. Theyre free to do this, because if that undefined behaviour _did_ happen, then theyre allowed to do whatever they want, so “run the code that I optimized assuming that this would never happen” is fine.
- So this `if (n + 100 < 0)` check is irrelevant: if that did happen, it would be undefined behaviour, so theres no need to execute the contents of that if statement.
So, thats weird. Im not going to write a “what can you do about it?” section here because Im pretty out of my depth already.
I certainly would not have expected that though.
My impression is that “undefined behaviour” is really a C/C++ concept, and doesnt exist in other languages in the same way except in the case of “your program called some C code in an incorrect way and that C code did something weird because of undefined behaviour”. Which of course happens all the time.
#### example 9: the && typo
This one was mentioned as a very upsetting bug. Lets say you have two integers and you want to check that theyre both nonzero.
In Javascript, you might write:
@ -406,9 +356,7 @@ if a & b {
}
```
This is still perfectly valid code, but it means something completely different: its a bitwise and instead of a boolean and. Lets go into a Javascript console and look at bitwise vs boolean and for `9` and `4`:
```
> 9 && 4
@ -421,20 +369,15 @@ console and look at bitwise vs boolean and for `9` and `4`:
4
```
Its easy to imagine this turning into a REALLY annoying bug, since it would be intermittent: often `x & y` does turn out to be truthy if `x && y` is truthy.
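Heres a quick sketch of the difference (in Python, whose `and` happens to behave like Javascripts `&&` for these values):

```python
print(9 and 4)  # 4: boolean and, truthy because both operands are nonzero
print(9 & 4)    # 0: bitwise and, 0b1001 & 0b0100 share no bits, so it's falsy!
```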
#### what to do about it?
For Javascript, ESLint has a [no-bitwise check][21], which requires you to manually flag “no, I actually know what Im doing, I want to do bitwise and” if you use a bitwise and in your code. Im sure many other linters have a similar check.
#### thats all for now!
There are definitely more problems with integers than this, but this got pretty long again and Im tired of writing again so Im going to stop :)
--------------------------------------------------------------------------------
@ -10,26 +10,16 @@
Why does 0.1 + 0.2 = 0.30000000000000004?
======
Hello! I was trying to write about floating point yesterday, and I found myself wondering about this calculation, with 64-bit floats:
```
>>> 0.1 + 0.2
0.30000000000000004
```
I realized that I didnt understand exactly how it worked. I mean, I know floating point calculations are inexact, and I know that you cant exactly represent `0.1` in binary, but: theres a floating point number thats closer to 0.3 than `0.30000000000000004`! So why do we get the answer `0.30000000000000004`?
If you dont feel like reading this whole post with a bunch of calculations, the short answer is that `0.1000000000000000055511151231257827021181583404541015625 + 0.200000000000000011102230246251565404236316680908203125` lies exactly between 2 floating point numbers, `0.299999999999999988897769753748434595763683319091796875` (usually printed as `0.3`) and `0.3000000000000000444089209850062616169452667236328125` (usually printed as `0.30000000000000004`). The answer is `0.30000000000000004` (the second one) because its significand is even.
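If you want to reproduce those long decimals yourself, Pythons `decimal` module will print a floats exact binary value in base 10:

```python
from decimal import Decimal

# Decimal(float) converts the float's exact binary value into base 10
print(Decimal(0.1))        # 0.1000000000000000055511151231257827021181583404541015625
print(Decimal(0.2))        # 0.200000000000000011102230246251565404236316680908203125
print(Decimal(0.1 + 0.2))  # 0.3000000000000000444089209850062616169452667236328125
```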
#### how floating point addition works
@ -38,9 +28,7 @@ This is roughly how floating point addition works:
- Add together the numbers (with extra precision)
- Round the result to the nearest floating point number
So lets use these rules to calculate 0.1 + 0.2. I just learned how floating point addition works yesterday so its possible Ive made some mistakes in this post, but I did get the answers I expected at the end.
#### step 1: find out what 0.1 and 0.2 are
@ -53,9 +41,7 @@ First, lets use Python to figure out what the exact values of `0.1` and `0.2`
'0.20000000000000001110223024625156540423631668090820312500000000000000000000000000'
```
These really are the exact values: because floating point numbers are in base 2, you can represent them all exactly in base 10. You just need a lot of digits sometimes :)
#### step 2: add the numbers together
@ -79,8 +65,7 @@ Now, lets look at the floating point numbers around `0.3`. Heres the close
'0.29999999999999998889776975374843459576368331909179687500000000000000000000000000'
```
We can figure out the next floating point number after `0.3` by serializing `0.3` to 8 bytes with `struct.pack`, adding 1, and then using `struct.unpack`:
```
>>> struct.pack("!d", 0.3)
@ -100,17 +85,13 @@ Apparently you can also do this with `math.nextafter`:
0.30000000000000004
```
So the two 64-bit floats around `0.3` are `0.299999999999999988897769753748434595763683319091796875` and
`0.3000000000000000444089209850062616169452667236328125`
#### step 4: find out which one is closest to our result
It turns out that `0.3000000000000000166533453693773481063544750213623046875` is exactly in the middle of
`0.299999999999999988897769753748434595763683319091796875` and `0.3000000000000000444089209850062616169452667236328125`.
You can see that with this calculation:
@ -123,10 +104,7 @@ So neither of them is closest.
#### how does it know which one to round to?
In the binary representation of a floating point number, theres a number called the “significand”. In cases like this (where the result is exactly in between 2 successive floating point numbers), itll round to the one with the even significand.
In this case thats `0.300000000000000044408920985006261616945266723632812500`
@ -135,20 +113,13 @@ We actually saw the significand of this number a bit earlier:
- 0.30000000000000004 is `struct.unpack('!d', b'?\xd3333334')`
- 0.3 is `struct.unpack('!d', b'?\xd3333333')`
The last digit of the big endian hex representation of `0.30000000000000004` is `4`, so thats the one with the even significand (because the significand is at the end).
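You can confirm that last-digit difference directly by looking at the hex encoding of each float:

```python
import struct

print(struct.pack("!d", 0.3).hex())                  # 3fd3333333333333, ends in 3 (odd)
print(struct.pack("!d", 0.30000000000000004).hex())  # 3fd3333333333334, ends in 4 (even)
```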
#### lets also work out the whole calculation in binary
Above we did the calculation in decimal, because thats a little more intuitive to read. But of course computers dont do these calculations in decimal; theyre done in a base 2 representation. So I wanted to get an idea of how that worked too.
I dont think this binary calculation part of the post is particularly clear, but it was helpful for me to write out. There are really a lot of numbers and it might be terrible to read.
#### how 64-bit float numbers work: exponent and significand
@ -181,11 +152,9 @@ def get_significand(f):
return x ^ (exponent << 52)
```
Im ignoring the sign bit (the first bit) because we only need these functions to work on two numbers (0.1 and 0.2) and those two numbers are both positive.
First, lets get the exponent and significand of 0.1. We need to subtract 1023 to get the actual exponent because thats how floating point works.
```
>>> get_exponent(0.1) - 1023
@ -203,9 +172,7 @@ Heres that calculation in Python:
0.1
```
(you might legitimately be worried about floating point accuracy issues with this calculation, but in this case Im pretty sure its fine, because these numbers by definition dont have accuracy issues: the floating point numbers starting at `2**-4` go up in steps of `1/2**(52 + 4)`)
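For what its worth, Pythons `math.ulp` (available since 3.9) confirms that step size:

```python
import math

# the gap between 0.1 and the next representable 64-bit float
print(math.ulp(0.1))  # 1.3877787807814457e-17
print(2 ** -56)       # the same number
```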
We can do the same thing for `0.2`:
@ -309,10 +276,7 @@ Thats the answer we expected:
#### this probably isnt exactly how it works in hardware
The way Ive described the operations here isnt literally exactly what happens when you do floating point addition (its not “solving for X”, for example); Im sure there are a lot of efficient tricks. But I think its about the same idea.
#### printing out floating point numbers is pretty weird
@ -325,48 +289,31 @@ We said earlier that the floating point number 0.3 isnt equal to 0.3. Its
So when you print out that number, why does it display `0.3`?
The computer isnt actually printing out the exact value of the number, instead its printing out the _shortest_ decimal number `d` which has the property that our floating point number `f` is the closest floating point number to `d`.
It turns out that doing this efficiently isnt trivial at all, and there are a bunch of academic papers about it, like [Printing Floating-Point Numbers Quickly and Accurately][1] or [How to print floating point numbers accurately][2].
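In Python, `repr` is what gives you that shortest round-tripping string, and formatting with more digits reveals the value underneath:

```python
x = 0.1
print(repr(x))            # '0.1': the shortest decimal that round-trips back to x
print(format(x, ".25f"))  # asking for more digits exposes the stored value
assert float(repr(x)) == x
```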
#### would it be more intuitive if computers printed out the exact value of a float?
Rounding to a nice clean decimal value is nice, but in a way I feel like it might be more intuitive if computers just printed out the exact value of a floating point number; it might make it seem a lot less surprising when you get weird results.
To me, 0.1000000000000000055511151231257827021181583404541015625 + 0.200000000000000011102230246251565404236316680908203125 = 0.3000000000000000444089209850062616169452667236328125 feels less surprising than 0.1 + 0.2 = 0.30000000000000004.
Probably this is a bad idea, it would definitely use a lot of screen space.
#### a quick note on PHP
Someone in the comments somewhere pointed out that `<?php echo (0.1 + 0.2 );?>` prints out `0.3`. Does that mean that floating point math is different in PHP?
I think the answer is no: if I run
`<?php echo ((0.1 + 0.2) - 0.3);?>` on [this page][3], I get the exact same answer as in Python: 5.5511151231258E-17. So it seems like the underlying floating point math is the same.
I think the reason that `0.1 + 0.2` prints out `0.3` in PHP is that PHPs algorithm for displaying floating point numbers is less precise than Pythons: itll display `0.3` even if that number isnt the closest floating point number to 0.3.
#### thats all!
I kind of doubt that anyone had the patience to follow all of that arithmetic, but it was helpful for me to write down, so Im publishing this post anyway. Hopefully some of this makes sense.
--------------------------------------------------------------------------------
@ -383,4 +330,4 @@ via: https://jvns.ca/blog/2023/02/08/why-does-0-1-plus-0-2-equal-0-3000000000000
[b]: https://github.com/lkxed/
[1]: https://legacy.cs.indiana.edu/~dyb/pubs/FP-Printing-PLDI96.pdf
[2]: https://lists.nongnu.org/archive/html/gcl-devel/2012-10/pdfkieTlklRzN.pdf
[3]: https://replit.com/languages/php_cli
@ -10,33 +10,19 @@
Some notes on using nix
======
Recently I started using a Mac for the first time. The biggest downside Ive noticed so far is that the package management is much worse than on Linux. At some point I got frustrated with homebrew because I felt like it was spending too much time upgrading when I installed new packages, and so I thought maybe Ill try the [nix][1] package manager!
nix has a reputation for being confusing (it has its whole own programming language!), so Ive been trying to figure out how to use nix in a way thats as simple as possible and does not involve managing any configuration files or learning a new programming language. Heres what Ive figured out so far! Well talk about how to:
- install packages with nix
- build a custom nix package for a C++ program called [paperjam][2]
- install a 5-year-old version of [hugo][3] with nix
As usual Ive probably gotten some stuff wrong in this post since Im still pretty new to nix. Im also still not sure how much I like nix; its very confusing! But its helped me compile some software that I was struggling to compile otherwise, and in general it seems to install things faster than homebrew.
#### whats interesting about nix?
People often describe nix as “declarative package management”. I dont care that much about declarative package management, so here are two things that I appreciate about nix:
- It provides binary packages (hosted at [https://cache.nixos.org/][4]) that you can quickly download and install
- For packages which dont have binary packages, it makes it easier to compile them
@ -44,12 +30,8 @@ that I appreciate about nix:
I think that the reason nix is good at compiling software is that:
- you can have multiple versions of the same library or program installed at a time (you could have 2 different versions of libc for instance). For example I have two versions of node on my computer right now, one at `/nix/store/4ykq0lpvmskdlhrvz1j3kwslgc6c7pnv-nodejs-16.17.1` and one at `/nix/store/5y4bd2r99zhdbir95w5pf51bwfg37bwa-nodejs-18.9.1`.
- when nix builds a package, it builds it in isolation, using only the specific versions of its dependencies that you explicitly declared. So theres no risk that the package secretly depends on another package on your system that you dont know about. No more fighting with `LD_LIBRARY_PATH`!
- a lot of people have put a lot of work into writing down all of the dependencies of packages
Ill give a couple of examples later in this post of two times nix made it easier for me to compile software.
@ -72,15 +54,11 @@ nix-env -iA nixpkgs.fish
This seems to just download some binaries from [https://cache.nixos.org][8] pretty simple.
Some people use nix to install their Node and Python and Ruby packages, but I havent been doing that; I just use `npm install` and `pip install` the same way I always have.
#### some nix features Im not using
There are a bunch of nix features/tools that Im not using, but that Ill mention. I originally thought that you _had_ to use these features to use nix, because most of the nix tutorials Ive read talk about them. But you dont have to use them.
- NixOS (a Linux distribution)
- [nix-shell][9]
@ -88,8 +66,7 @@ because most of the nix tutorials Ive read talk about them. But you dont h
- [home-manager][11]
- [devenv.sh][12]
I wont go into these because I havent really used them and there are lots of explanations out there.
#### where are nix packages defined?
@ -107,16 +84,14 @@ I found a way to search nix packages from the command line that I liked better:
#### everything is installed with symlinks
One of nixs major design choices is that there isnt one single `bin` with all your packages; instead, you use symlinks. There are a lot of layers of symlinks. A few examples of symlinks:
- `~/.nix-profile` on my machine is (indirectly) a symlink to `/nix/var/nix/profiles/per-user/bork/profile-111-link/`
- `~/.nix-profile/bin/fish` is a symlink to `/nix/store/afkwn6k8p8g97jiqgx9nd26503s35mgi-fish-3.5.1/bin/fish`
When I install something, it creates a new `profile-112-link` directory with new symlinks and updates my `~/.nix-profile` to point to that directory.
I think this means that if I install a new version of `fish` and I dont like it, I can easily go back just by running `nix-env --rollback`; itll move me to my previous profile directory.
#### uninstalling packages doesnt delete them
@ -161,28 +136,19 @@ I havent really upgraded anything yet. I think that if something goes wrong w
nix-env --rollback
```
Someone linked me to [this post from Ian Henry][15] that talks about some confusing problems with `nix-env --upgrade`; maybe it doesnt work the way youd expect? I guess Ill be wary around upgrades.
#### next goal: make a custom package of paperjam
After a few months of installing existing packages, I wanted to make a custom package with nix for a program called [paperjam][2] that wasnt already packaged.
I was actually struggling to compile `paperjam` at all even without nix, because the version of `libiconv` I had on my system was wrong. I thought it might be easier to compile it with nix even though I didnt know how to make nix packages yet. And it actually was!
But figuring out how to get there was VERY confusing, so here are some notes about how I did it.
#### how to build an example package
Before I started working on my `paperjam` package, I wanted to build an example existing package just to make sure I understood the process for building a package. I was really struggling to figure out how to do this, but I asked in Discord and someone explained to me how I could get a working package from [https://github.com/NixOS/nixpkgs/][13] and build it. So here are those instructions:
**step 1:** Download some arbitrary package from [nixpkgs][13] on github, for example the `dash` package:
```
wget https://raw.githubusercontent.com/NixOS/nixpkgs/47993510dcb7713a29591517cb6ce682cc40f0ca/pkgs/shells/dash/default.nix -O dash.nix
```
**step 2**: Replace the first statement (`{ lib , stdenv , buildPackages , autoreconfHook , pkg-config , fetchurl , fetchpatch , libedit , runCommand , dash }:`) with `with import <nixpkgs> {};`. I don't know why you have to do this, but it works.
**step 3**: Run `nix-build dash.nix`
That's all! Once I'd done that, I felt like I could modify the `dash` package.
`paperjam` has one dependency (`libpaper`) that also isn't packaged yet, so I needed to build `libpaper` first.
Here's `libpaper.nix`. I basically just wrote this by copying and pasting from other packages in the [nixpkgs][13] repository. My guess is what's happening here is that nix has some default rules for compiling C packages (like “run `make install`”), so the `make install` happens by default and I don't need to configure it explicitly.
```
with import <nixpkgs> {};
...
```

Next, I needed to compile `paperjam`.
I set the hashes by first leaving the hash empty, then running `nix-build` to get an error message complaining about a mismatched hash. Then I copied the correct hash out of the error message.
I figured out how to set `installFlags` just by running `rg PREFIX` in the nixpkgs repository: I figured that needing to set a `PREFIX` was pretty common and someone had probably done it before, and I was right. So I just copied and pasted that line from another package.
And then everything worked and I had `paperjam` installed! Hooray!
#### next goal: install a 5-year-old version of hugo
Right now I build this blog using Hugo 0.40, from 2018. I don't need any new features so I haven't felt a need to upgrade. On Linux this is easy: Hugo's releases are a static binary, so I can just download the 5-year-old binary from the [releases page][17] and run it. Easy!
But on this Mac I ran into some complications. Mac hardware has changed in the last 5 years, so the Mac Hugo binary I downloaded crashed. And when I tried to build it from source with `go build`, that didn't work either because Go build norms have changed in the last 5 years as well.
I was working around this by running Hugo in a Linux docker container, but I didn't love that: it was kind of slow and it felt silly. It shouldn't be that hard to compile one Go program!
Nix to the rescue! Here's what I did to install the old version of Hugo with nix.
#### installing Hugo 0.40 with nix
I wanted to install Hugo 0.40 and put it in my PATH as `hugo-0.40`. Here's how I did it. I did this in a kind of weird way, but it worked ([Searching and installing old versions of Nix packages][18] describes a probably more normal method).
**step 1**: Search through the nixpkgs repo to find Hugo 0.40
I installed it into my `~/.nix-profile/bin` by running `nix-env -i -f hugo.nix`.
And it all works! I put the final `.nix` file into my own personal [nixpkgs repo][20] so that I can use it again later if I want.
#### reproducible builds aren't magic, they're really hard
I think it's worth noting here that this `hugo.nix` file isn't magic: the reason I can easily compile Hugo 0.40 today is that many people worked for a long time to make it possible to package that version of Hugo in a reproducible way.
#### that's all!
Installing `paperjam` and this 5-year-old version of Hugo were both surprisingly painless and actually much easier than compiling them without nix, because nix made it much easier for me to compile the `paperjam` package with the right version of `libiconv`, and because someone 5 years ago had already gone to the trouble of listing out the exact dependencies for Hugo.
I don't have any plans to get much more complicated with nix (and it's still very possible I'll get frustrated with it and go back to homebrew!), but we'll see what happens! I've found it much easier to start in a simple way and then start using more features if I feel the need, instead of adopting a whole bunch of complicated stuff all at once.
I probably won't use nix on Linux: I've always been happy enough with `apt` (on Debian-based distros) and `pacman` (on Arch-based distros), and they're much less confusing. But on a Mac it seems like it might be worth it. We'll see! It's very possible in 3 months I'll get frustrated with nix and just go back to homebrew.
--------------------------------------------------------------------------------
How do Nix builds work?
======
Hello! For some reason after the last [nix post][1] I got nerdsniped by trying to understand how Nix builds work under the hood, so here's a quick exploration I did today. There are probably some mistakes in here.
I started by [complaining on Mastodon][2]:
#### the goal: compile a C program, without using Nix's standard machinery
Our goal is to compile a C program called `paperjam`. This is a real C program that wasn't in the Nix repository already. I already figured out how to compile it in [this post][1] by copying and pasting a bunch of stuff I didn't understand, but this time I wanted to do it in a more principled way where I actually understand more of the steps.
The plan is to start with an almost empty build script, and then resolve errors until we have a working build.
#### first: what's a derivation?
I said that we weren't going to talk about too many Nix abstractions (and we won't!), but understanding what a derivation is really helped me.
Everything I read about Nix talks about derivations all the time, but I was really struggling to figure out what a derivation _is_. It turns out that `derivation` is a function in the Nix language. But not just any function! The whole point of the Nix language seems to be to call this function. The [official documentation for the `derivation` function][5] is actually extremely clear. Here's what I took away:
`derivation` takes a bunch of keys and values as input. There are 3 required keys:
- `system`: the system you're building on (like `x86_64-darwin`)
- `name`: the name of the package you're building
- `builder`: a program (usually a bash script) that runs the build
Every other key is an arbitrary string that gets passed as an environment variable to the `builder` shell script.
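As a minimal sketch of what such a call might look like (the values here are my own illustration, not taken from this post):

```
with import <nixpkgs> {};

derivation {
  name = "paperjam";
  system = "x86_64-darwin";
  builder = "${pkgs.bash}/bin/bash";
  args = [ ./build_paperjam.sh ];
  # any extra key, like this one, just becomes an environment
  # variable that build_paperjam.sh can read as $qpdf:
  qpdf = pkgs.qpdf;
}
```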
#### derivations automatically build all their inputs
If I include `pkgs.qpdf` as an input, Nix will:
- put the resulting output directory somewhere like `/nix/store/4garxzr1rpdfahf374i9p9fbxnx56519-qpdf-11.1.0`
- expand `pkgs.qpdf` into that output directory (as a string), so that I can reference it in my build script
The derivation function does some other things (described in the [documentation][5]), but “it builds all of its inputs” is all we really need to know for now.
#### step 1: write a derivation file
Let's write a very simple build script and call the `derivation` function. These don't work yet, but I found it pretty fun to go through all the errors, fix them one at a time, and learn a little more about how Nix works by fixing them.
Here's the build script (`build_paperjam.sh`). This just unpacks the tarball and runs `make install`.
#### problem 1: tar: command not found
Nix needs you to declare all the dependencies for your builds. It forces this by removing your `PATH` environment variable so that you have no binaries in your PATH at all.
This is pretty easy to fix: we just need to edit our `PATH`.
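For example, one way to do that (a sketch; the exact package names are my assumption, not from the post) is to pass `PATH` as just another derivation key, built out of the store paths of the tools the build needs:

```
# an extra key in the call to `derivation`, so the builder sees a usable PATH:
PATH = "${pkgs.coreutils}/bin:${pkgs.gnutar}/bin:${pkgs.gzip}/bin:${pkgs.gnumake}/bin";
```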
The next error was:

```
> #include <qpdf/QPDF.hh>
```
Makes sense: everything is isolated, so it can't access my system header files. Figuring out how to handle this was a little more confusing though.
It turns out that the way Nix handles header files is that it has a shell script wrapper around `clang`. So when you run `clang++`, you're actually running a shell script.
On my system, the `clang++` wrapper script was at `/nix/store/d929v59l9a3iakvjccqpfqckqa0vflyc-clang-wrapper-11.1.0/bin/clang++`. I searched that file for `LDFLAGS` and found that it uses 2 environment variables:
I started by adding `-L ${pkgs.libiconv}/lib` to my `NIX_LDFLAGS` environment variable, but that didn't fix it. Then I spent a while going around in circles and being confused.
I eventually figured out how to fix this by taking a working version of the `paperjam` build that I'd made before and editing my `clang++` wrapper file to print out all of its environment variables. The `LDFLAGS` environment variable in the working version was different from mine: it had `-liconv` in it.
So I added `-liconv` to `NIX_LDFLAGS` as well and that fixed it.
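Put together, the linker-related key ended up looking something like this (my reconstruction from the two steps above, not copied from the post):

```
NIX_LDFLAGS = "-L ${pkgs.libiconv}/lib -liconv";
```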
#### why doesn't the original Makefile have -liconv?
I was a bit puzzled by this `-liconv` thing though: the original Makefile links in `libqpdf` and `libpaper` by passing `-lqpdf -lpaper`. So why doesn't it link in iconv, if it requires the iconv library?
I think the reason for this is that the original Makefile assumed that you were running on Linux and using glibc, and glibc includes these iconv functions by default. But I guess Mac OS libc doesn't include iconv, so we need to explicitly set the linker flag `-liconv` to add the iconv library.
#### problem 6: missing codesign_allocate
Time for the next error:

```
libc++abi: terminating with uncaught exception of type std::runtime_error: Failed to spawn codesign_allocate: No such file or directory
```
I guess this is some kind of Mac code signing thing. I used `find /nix/store -name codesign_allocate` to find `codesign_allocate` on my system. It's at `/nix/store/a17dwfwqj5ry734zfv3k1f5n37s4wxns-cctools-binutils-darwin-973.0.1/bin/codesign_allocate`.
But this doesn't tell us what the package is called: we need to be able to refer to it as `${pkgs.XXXXXXX}`, and `${pkgs.cctools-binutils-darwin}` doesn't work.
#### let's look at our compiled derivation!
Now that we understand this configuration a little better, let's talk about what `nix-build` is doing a little more.
Behind the scenes, `nix-build paperjam.nix` actually runs `nix-instantiate` and `nix-store --realize`:
```
$ nix-instantiate paperjam.nix
$ nix-store --realize /nix/store/xp8kibpll55s0bm40wlpip51y7wnpfs0-paperjam-fake.drv
```
I think what this means is that `paperjam.nix` gets compiled to some intermediate representation (also called a derivation?), and then the Nix runtime takes over and is in charge of actually running the build scripts. We can look at this `.drv` intermediate representation with `nix show-derivation`:
```
{
  ...
}
```
This feels surprisingly easy to understand: you can see that there are a bunch of environment variables, our bash script, and the paths to our inputs.
#### the compilation helpers we're not using: stdenv
Normally when you build a package with Nix, you don't do all of this stuff yourself. Instead, you use a helper called `stdenv`, which seems to have two parts:
- a function called `stdenv.mkDerivation` which takes some arguments and generates a bunch of environment variables (it seems to be [documented here][6])
- a 1600-line bash build script ([setup.sh][7]) that consumes those environment variables. This is like our `build-paperjam.sh`, but much more generalized.
and probably lots more useful things I don't know about yet.
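For comparison, a typical `stdenv.mkDerivation` call looks roughly like this (a sketch with assumed values, not a real package from nixpkgs):

```
with import <nixpkgs> {};

stdenv.mkDerivation {
  pname = "paperjam";
  version = "1.2";
  src = ./paperjam-1.2.tar.gz;         # assumed local tarball
  buildInputs = [ qpdf libpaper ];     # dependencies, as nixpkgs attributes
  installFlags = [ "PREFIX=$(out)" ];  # extra flags for `make install`
}
```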
#### let's look at the derivation for jq
Let's look at one more compiled derivation, for `jq`. This is quite long but there are some interesting things in here. I wanted to look at this because I wanted to see what a more typical derivation generated by `stdenv.mkDerivation` looked like.
```
$ nix show-derivation /nix/store/q9cw5rp0ibpl6h4i2qaq0vdjn4pyms3p-jq-1.6.drv
{
  ...
}
```
I thought it was interesting that some of the environment variables in here are actually bash scripts themselves: for example, the `postInstallCheck` environment variable is a bash script. Those bash script environment variables are `eval`ed in the main bash script (you can [see that happening in setup.sh here][8]).
The `postInstallCheck` environment variable in this particular derivation starts like this:
#### let's recap the process!
I feel like I understand Nix a bit better after going through this. I still don't feel very motivated to learn the Nix language, but now I have some idea of what Nix programs are actually doing under the hood! My understanding is:
- First, `.nix` files get compiled into a `.drv` file, which is mostly a bunch of inputs and outputs and environment variables. This is where the Nix language stops being relevant.
- Then all the environment variables get passed to a build script, which is in charge of doing the actual build
Some possible reasons for 8-bit bytes
======
I've been working on a zine about how computers represent things in binary, and one question I've gotten a few times is: why does the x86 architecture use 8-bit bytes? Why not some other size?
With any question like this, I think there are two options:
- 8 bits is objectively the Best Option for some reason, even if history had played out differently we would still use 8-bit bytes
- some mix of 1 & 2
I'm not super into computer history (I like to use computers a lot more than I like reading about them), but I am always curious if there's an essential reason for why a computer thing is the way it is today, or whether it's mostly a historical accident. So we're going to talk about some computer history.
As an example of a historical accident: DNS has a `class` field which has 5 possible values (“internet”, “chaos”, “hesiod”, “none”, and “any”). To me that's a clear example of a historical accident: I can't imagine that we'd define the class field the same way if we could redesign DNS today without worrying about backwards compatibility. I'm not sure if we'd use a class field at all!
There aren't any definitive answers in this post, but I asked [on Mastodon][1] and here are some potential reasons I found for the 8-bit byte. I think the answer is some combination of these reasons.
#### what's the difference between a byte and a word?
First, this post talks about “bytes” and “words” a lot. What's the difference between a byte and a word? My understanding is:
- the **byte size** is the smallest unit you can address. For example in a program on my machine `0x20aa87c68` might be the address of one byte, then `0x20aa87c69` is the address of the next byte.
- The **word size** is some multiple of the byte size. I've been confused about this for years, and the Wikipedia definition is incredibly vague (“a word is the natural unit of data used by a particular processor design”). I originally thought that the word size was the same as your register size (64 bits on x86-64). But according to section 4.1 (“Fundamental Data Types”) of the [Intel architecture manual][2], on x86 a word is 16 bits even though the registers are 64 bits. So I'm confused: is a word on x86 16 bits or 64 bits? Can it mean both, depending on the context? What's the deal?
Now let's talk about some possible reasons that we use 8-bit bytes!
Here's a [video interview with Fred Brooks (who managed the project)][4] talking about it:
> My most important technical decision in my IBM career was to go with the 8-bit byte for the 360.
> And on the basis of I believe character processing was going to become important as opposed to decimal digits.
It makes sense that an 8-bit byte would be better for text processing: 2^6 is 64, so 6 bits wouldn't be enough for lowercase letters, uppercase letters, and symbols.
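A quick back-of-the-envelope check (mine, not from the interview) of why 6 bits run out so fast once you want both cases:

```shell
echo $(( 26 + 26 + 10 ))   # lowercase + uppercase + digits alone
echo $(( 2 ** 6 ))         # everything a 6-bit byte can distinguish
echo $(( 2 ** 7 ))         # 7 bits: room for ASCII's 128 codes
```

That's 62 of the 64 possible 6-bit values used up before you add any punctuation at all.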
To go with the 8-bit byte, System/360 also introduced the [EBCDIC][5] encoding, which is an 8-bit character encoding.
It looks like the next important machine in 8-bit-byte history was the [Intel 8008][6], which was built to be used in a computer terminal (the Datapoint 2200). Terminals need to be able to represent letters as well as terminal control codes, so it makes sense for them to use an 8-bit byte. [This Datapoint 2200 manual from the Computer History Museum][7] says on page 7 that the Datapoint 2200 supported ASCII (7 bit) and EBCDIC (8 bit).
#### why was the 6-bit byte better for scientific computing?
I was curious about this comment that the 6-bit byte would be better for scientific computing:
> you to lose some of the information more rapidly than you would with binary
> shifting
I don't understand this comment at all: why does the exponent have to be 8 bits if you use a 32-bit word size? Why couldn't you use 9 bits or 10 bits if you wanted? But it's all I could find in a quick search.
#### why did mainframes use 36 bits?
Also related to the 6-bit byte: a lot of mainframes used a 36-bit word size. Why? Someone pointed out that there's a great explanation in the Wikipedia article on [36-bit computing][9]:
> Prior to the introduction of computers, the state of the art in precision
> scientific and engineering calculation was the ten-digit, electrically powered,
So this 36-bit thing seems to be based on the fact that log_2(20000000000) is 34.2. Huh.
My guess is that the reason for this is that in the 50s, computers were extremely expensive. So if you wanted your computer to support ten decimal digits, you'd design it so that it had exactly enough bits to do that, and no more.
Today computers are way faster and cheaper, so if you want to represent ten decimal digits for some reason you can just use 64 bits: wasting a little bit of space is usually no big deal.
Someone else mentioned that some of these machines with 36-bit word sizes let you choose a byte size: you could use 5 or 6 or 7 or 8-bit bytes, depending on the context.
#### reason 2: to work well with binary-coded decimal
In the 60s, there was a popular integer encoding called binary-coded decimal (or [BCD][10] for short) that encoded every decimal digit in 4 bits.
For example, if you wanted to encode the number 1234, in BCD that would be something like:
```
0001 0010 0011 0100
```
So if you want to be able to easily work with binary-coded decimal, your byte size should be a multiple of 4 bits, like 8 bits!
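To make that concrete, here's a little bash loop (my illustration, not from the post) that builds the BCD encoding of 1234 one 4-bit nibble at a time:

```shell
n=1234
out=""
for (( i = 0; i < ${#n}; i++ )); do
  d=${n:i:1}                        # one decimal digit
  nibble=""
  for (( j = 3; j >= 0; j-- )); do
    nibble+=$(( (d >> j) & 1 ))     # bit j of that digit
  done
  out+="$nibble "
done
echo "${out% }"                     # prints: 0001 0010 0011 0100
```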
#### why was BCD popular?
This integer representation seemed really weird to me: why not just use binary, which is a much more efficient way to store integers? Efficiency was really important in early computers!
My best guess about why is that early computers didn't have displays the same way we do now, so the contents of a byte were mapped directly to on/off lights.
Here's a [picture from Wikipedia of an IBM 650 with some lights on its display][11] ([CC BY-SA 3.0][12]):
![][13]
So if you want people to be able to relatively easily read off a decimal number from its binary representation, this makes a lot more sense. I think BCD is obsolete today because we have displays, and our computers can convert binary numbers to decimal and display them for us.
Also, I wonder if BCD is where the term “nibble” for 4 bits comes from: in the context of BCD, you end up referring to half bytes a lot (because every digit is 4 bits). So it makes sense to have a word for “4 bits”, and people called 4 bits a nibble. Today “nibble” feels to me like an archaic term, though I’ve definitely never used it except as a fun fact (it’s such a fun word!). The Wikipedia article on [nibbles][14] supports this theory:
> The nibble is used to describe the amount of memory used to store a digit of
> a number stored in packed decimal format (BCD) within an IBM mainframe.
Another reason someone mentioned for BCD was **financial calculations**. Today if you want to store a dollar amount, youll typically just use an integer amount of cents, and then divide by 100 if you want the dollar part. This is no big deal, division is fast. But apparently in the 70s dividing an integer represented in binary by 100 was very slow, so it was worth it to redesign how you represent your integers to avoid having to divide by 100.
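For comparison, here's the modern integer-cents approach as a tiny Python sketch (mine): the division by 100 that was apparently so slow in the 70s is now a single cheap operation.

```python
# store dollar amounts as an integer number of cents
total_cents = 1999 + 250  # $19.99 + $2.50

# dividing by 100 splits the total back into dollars and cents;
# on modern CPUs this is no big deal
dollars, cents = divmod(total_cents, 100)
print(f"${dollars}.{cents:02d}")  # $22.49
```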
Okay, enough about BCD.
#### reason 3: 8 is a power of 2?
A bunch of people said its important for a CPUs byte size to be a power of 2. I cant figure out whether this is true or not though, and I wasnt satisfied with the explanation that “computers use binary so powers of 2 are good”. That seems very plausible but I wanted to dig deeper. And historically there have definitely been lots of machines that used byte sizes that werent powers of 2, for example (from [this retro computing stack exchange thread][15]):
- Cyber 180 mainframes used 6-bit bytes
- the Univac 1100 / 2200 series used a 36-bit word size
Reasons that made more sense to me:
- it makes it easier to design **clock dividers** that can measure “8 bits were sent on this wire” and that work based on halving: you can put 3 halving clock dividers in series. [Graham Sutherland][16] told me about this and made this really cool [simulator of clock dividers][17] showing what these clock dividers look like. That site (Falstad) also has a bunch of other example circuits and it seems like a really cool way to make circuit simulators.
- if you have an instruction that zeroes out a specific bit in a byte, then if your byte size is 8 (2^3), you can use just 3 bits of your instruction to indicate which bit. x86 doesnt seem to do this, but the [Z80s bit testing instructions][18] do.
- someone mentioned that some processors use [Carry-lookahead adders][19], and they work in groups of 4 bits. From some quick Googling it seems like there are a wide variety of adder circuits out there though.
- **bitmaps**: Your computers memory is organized into pages (usually of size 2^n). It needs to keep track of whether every page is free or not. Operating systems use a bitmap to do this, where each bit corresponds to a page and is 0 or 1 depending on whether the page is free. If you had a 9-bit byte, you would need to divide by 9 to find the page youre looking for in the bitmap. Dividing by 9 is slower than dividing by 8, because dividing by powers of 2 is always the fastest thing.
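Here's a rough Python sketch of that bitmap point (the function and constant names are mine). The key thing is that with an 8-bit byte, “divide by the byte size” becomes a shift and a mask, because 8 = 2^3:

```python
def is_page_free(bitmap: bytearray, page: int) -> bool:
    # with 8-bit bytes, the byte index is page >> 3 and the bit index
    # is page & 7 -- a cheap shift and mask, because 8 is a power of 2.
    # with a 9-bit byte you'd need an actual division by 9 instead.
    byte_index = page >> 3
    bit_index = page & 7
    return (bitmap[byte_index] >> bit_index) & 1 == 0

bitmap = bytearray([0b00000100, 0b00000000])  # page 2 is marked used
print(is_page_free(bitmap, 2))   # False
print(is_page_free(bitmap, 0))   # True
print(is_page_free(bitmap, 10))  # True
```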
I probably mangled some of those explanations pretty badly: Im pretty far out of my comfort zone here. Lets move on.
#### reason 4: small byte sizes are good
You might be wondering: if 8-bit bytes were better than 4-bit bytes, why not keep increasing the byte size? We could have 16-bit bytes!
A couple of reasons to keep byte sizes small:
- It’s a waste of space: a byte is the minimum unit you can address, and if your computer is storing a lot of ASCII text (which only needs 7 bits), it would be a pretty big waste to dedicate 12 or 16 bits to each character when you could use 8 bits instead.
- As bytes get bigger, your CPU needs to get more complex. For example you need one bus line per bit. So I guess simpler is better.
My understanding of CPU architecture is extremely shaky so Ill leave it at that. The “its a waste of space” reason feels pretty compelling to me though.
#### reason 5: compatibility
The Intel 8008 (from 1972) was the precursor to the 8080 (from 1974), which was the precursor to the 8086 (from 1976), the first x86 processor. It seems like the 8080 and the 8086 were really popular and that’s where we get our modern x86 computers.
I think there’s an “if it ain’t broke don’t fix it” thing going on here: I assume that 8-bit bytes were working well, so Intel saw no need to change the design. If you keep the same 8-bit byte, then you can reuse more of your instruction set.
Also around the 80s we start getting network protocols like TCP which use 8-bit bytes (usually called “octets”), and if youre going to be implementing network protocols, you probably want to be using an 8-bit byte.
#### thats all!
It seems to me like the main reasons for the 8-bit byte are:
- 8 is a better number than 7 (because its a power of 2)
- once you have popular 8-bit computers that are working well, you want to keep the same design for compatibility
Someone pointed out that [page 65 of this book from 1962][20] talking about IBMs reasons to choose an 8-bit byte basically says the same thing:
> 1. Its full capacity of 256 characters was considered to be sufficient for the great majority of applications.
> 2. Within the limits of this capacity, a single character is represented by a single byte, so that the length of any particular record is not dependent on the coincidence of characters in that record.
> 3. 8-bit bytes are reasonably economical of storage space
> 4. For purely numerical work, a decimal digit can be represented by only 4 bits, and two such 4-bit bytes can be packed in an 8-bit byte. Although such packing of numerical data is not essential, it is a common practice in order to increase speed and storage efficiency. Strictly speaking, 4-bit bytes belong to a different code, but the simplicity of the 4-and-8-bit scheme, as compared with a combination 4-and-6-bit scheme, for example, leads to simpler machine design and cleaner addressing logic.
> 5. Byte sizes of 4 and 8 bits, being powers of 2, permit the computer designer to take advantage of powerful features of binary addressing and indexing to the bit level (see Chaps. 4 and 5 ) .
>
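Point 4 of that quote, packing two 4-bit decimal digits into one 8-bit byte, is what’s usually called packed decimal. A minimal Python sketch of it (function names are mine):

```python
def pack_bcd(d_high: int, d_low: int) -> int:
    # pack two decimal digits, 4 bits each, into one 8-bit byte
    assert 0 <= d_high <= 9 and 0 <= d_low <= 9
    return (d_high << 4) | d_low

def unpack_bcd(byte: int) -> tuple:
    # split an 8-bit byte back into its two 4-bit decimal digits
    return byte >> 4, byte & 0x0F

print(f"{pack_bcd(4, 2):08b}")   # 01000010
print(unpack_bcd(0b01000010))    # (4, 2)
```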
Overall this makes me feel like an 8-bit byte is a pretty natural choice if youre designing a binary computer in an English-speaking country.
--------------------------------------------------------------------------------