I’ve heard a million times about the dangers of floating point arithmetic. But I find all of this a little abstract on its own, and I really wanted some specific examples of floating point bugs in real-world programs.

So I [asked on Mastodon][1] for examples of how floating point has gone wrong for them in real programs, and as always folks delivered! Here are a bunch of examples. I’ve also written some example programs for some of them to see exactly what happens. Here’s a table of contents:

- [how does floating point work?][2]
- [floating point isn’t “bad” or random][3]
- [example 1: the odometer that stopped][4]
- [example 2: tweet IDs in Javascript][5]
- [example 3: a variance calculation gone wrong][6]
- [example 4: different languages sometimes do the same floating point calculation differently][7]
- [example 5: the deep space kraken][8]
- [example 6: the inaccurate timestamp][9]
- [example 7: splitting a page into columns][10]
- [example 8: collision checking][11]

None of these 8 examples talk about NaNs or +0/-0 or infinity values or subnormals, but it’s not because those things don’t cause problems – it’s just that I got tired of writing at some point :).

Also I’ve probably made some mistakes in this post.

#### how does floating point work?

I’m not going to write a long explanation of how floating point works in this post.

#### floating point isn’t “bad” or random

I don’t want you to read this post and conclude that floating point is bad. It’s an amazing tool for doing numerical calculations. So many smart people have done so much work to make numerical calculations on computers efficient and accurate! Two points about how all of this isn’t floating point’s fault:

- Doing numerical computations on a computer inherently involves some approximation and rounding, especially if you want to do it efficiently. You can’t always store an arbitrary amount of precision for every single number you’re working with.
- Floating point is standardized (IEEE 754), so operations like addition on floating point numbers are deterministic – my understanding is that 0.1 + 0.2 will always give you the exact same result (0.30000000000000004), even across different architectures. It might not be the result you _expected_, but it’s actually very predictable.

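For example, you can check that in any Python REPL – my understanding is that any IEEE 754 double-precision implementation should print these exact digits:

```
>>> 0.1 + 0.2
0.30000000000000004
```
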
My goal for this post is just to explain what kind of problems can come up with floating point numbers and why they happen so that you know when to be careful with them, and when they’re not appropriate.

Now let’s get into the examples.

#### example 1: the odometer that stopped

One person said that they were working on an odometer that was continuously adding small amounts to a 32-bit float to measure distance travelled, and things went very wrong.

To make this concrete, let’s say that we’re adding numbers to the odometer 1cm at a time. What does it look like after 10,000 kilometers?

Here’s a C program that simulates that:
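
The program itself is elided in this copy of the post, but a minimal sketch along these lines reproduces the problem (the variable names and output format are my assumptions):

```
#include <stdio.h>

int main(void) {
    float meters = 0.0f;
    // 10,000 km of travel, added 1cm (0.01m) at a time
    for (long i = 0; i < 1000000000; i++) {
        meters += 0.01f;
    }
    printf("Expected: %f km\n", 10000.0);
    printf("Got:      %f km\n", meters / 1000.0);
    return 0;
}
```
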
This is VERY bad – it’s not a small error, 262km is a LOT less than 10,000km!

#### what went wrong: gaps between floating point numbers get big

The problem in this case is that, for 32-bit floats, 262144.0 + 0.01 = 262144.0. So it’s not just that the number is inaccurate, it’ll actually never increase at all! If we travelled another 10,000 kilometers, the odometer would still be stuck at 262144 meters (aka 262.144km).

Why is this happening? Well, floating point numbers get farther apart as they get bigger. In this example, for 32-bit floats, here are 3 consecutive floating point numbers:
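
The list itself is elided in this copy, but we can reconstruct it from the gap sizes discussed below:

```
262143.984375
262144.0
262144.03125
```
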
I got those numbers by going to [https://float.exposed/0x48800000][13] and incrementing/decrementing the significand.

So, there are no 32-bit floating point numbers between 262144.0 and 262144.03125. Why is that a problem?

The problem is that 262144.03125 is about 262144.0 + 0.03. So when we try to add 0.01 to 262144.0, it doesn’t make sense to round up to the next number. So the sum just stays at 262144.0.

Also, it’s not a coincidence that 262144 is a power of 2 (it’s 2^18). The gaps between floating point numbers change after every power of 2, and at 2^18 the gap between 32-bit floats is 0.03125, increasing from 0.016ish.

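You can check both of these claims directly. Here’s a small sketch using numpy’s `float32` type (assuming numpy is available – this snippet is mine, not from the original post):

```
import numpy as np

x = np.float32(262144.0)                    # 2^18
print(np.nextafter(x, np.float32(np.inf)))  # 262144.03 – the next float32 is 0.03125 away
print(x + np.float32(0.01) == x)            # True: adding 1cm doesn't move the odometer
```
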
#### one way to solve this: use a double

If we use a 64-bit double instead of a 32-bit float, we get a much more accurate answer:

```
Expected: 10000.000000 km
Got: 9999.999825 km
```

There are still some small inaccuracies here – we’re off about 17 centimeters. Whether this matters or not depends on the context: being slightly off could very well be disastrous if we were doing a precision space maneuver or something, but it’s probably fine for an odometer.

Another way to improve this would be to increment the odometer in bigger chunks – instead of adding 1cm at a time, maybe we could update it less frequently, like every 50cm.

If we use a double **and** increment by 50cm instead of 1cm, we get the exact correct answer:

```
Expected: 10000.000000 km
Got: 10000.000000 km
```

A third way to solve this could be to use an **integer**: maybe we decide that the smallest unit we care about is 0.1mm, and then measure everything as integer multiples of 0.1mm. I have never built an odometer so I can’t say what the best approach is.

#### example 2: tweet IDs in Javascript

Javascript only has floating point numbers – it doesn’t have an integer type. The biggest integer you can represent in a 64-bit floating point number is 2^53.

But tweet IDs are big numbers, bigger than 2^53. The Twitter API now returns them as both integers and strings, so that in Javascript you can just use the string ID (like “1612850010110005250”), but if you tried to use the integer version in JS, things would go very wrong.

You can check this yourself by taking a tweet ID and putting it in the Javascript console, like this:

```
>> 1612850010110005250
1612850010110005200
```

Notice that 1612850010110005200 is NOT the same number as 1612850010110005250!! It’s 50 less!

This particular issue doesn’t happen in Python (or any other language that I know of), because Python has integers. Here’s what happens if we enter the same number in a Python REPL:

```
In [3]: 1612850010110005250
Out[3]: 1612850010110005250
```

Same number, as you’d expect.

#### example 2.1: the corrupted JSON data

This is a small variant of the “tweet IDs in Javascript” issue, but even if you’re _not_ actually writing Javascript code, numbers in JSON are still sometimes treated as if they’re floats. This mostly makes sense to me because JSON has “Javascript” in the name, so it seems reasonable to decode the values the way Javascript would.

For example, if we pass some JSON through `jq`, we see the exact same issue: the number 1612850010110005250 gets changed into 1612850010110005200.

```
$ echo '{"id": 1612850010110005250}' | jq '.'
{
  "id": 1612850010110005200
}
```

But it’s not consistent across all JSON libraries – Python’s `json` module will decode `1612850010110005250` as the correct integer.

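Here’s a quick check of that (my own snippet, not from the original post):

```
import json

print(json.loads('{"id": 1612850010110005250}'))
# {'id': 1612850010110005250} – Python parses JSON integers as exact ints
```
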
Several people mentioned issues with sending floats in JSON: either they were trying to send a large integer (like a pointer address) in JSON and it got corrupted, or they were sending smaller floating point values back and forth repeatedly and the value slowly diverged over time.

#### example 3: a variance calculation gone wrong

Let’s say you’re doing some statistics, and you want to calculate the variance of many numbers. Maybe more numbers than you can easily fit in memory, so you want to do it in a single pass.

There’s a simple (but bad!!!) algorithm you can use to calculate the variance in a single pass, from [this blog post][14]. Here’s some Python code:

```
import numpy as np

def calculate_bad_variance(nums):
    # single pass over the data, using the formula variance = E[x^2] - E[x]^2
    # (the body of this function is a reconstruction – the original is elided here)
    sum_of_squares = 0
    sum_of_nums = 0
    N = len(nums)
    for num in nums:
        sum_of_squares += num ** 2
        sum_of_nums += num
    mean = sum_of_nums / N
    print(f"Real variance: {np.var(nums)}")
    print(f"Bad variance: {(sum_of_squares - N * mean ** 2) / N}")
```

On small random numbers it does pretty well – the bad variance came out to 13.840000000000003, pretty close to the real variance. Now, let’s try the same thing with 100,000 large numbers that are very close together (distributed between 100000000 and 100000000.06):

```
In [7]: calculate_bad_variance(np.random.uniform(100000000, 100000000.06, 100000))
Real variance: 0.00029959105209321173
Bad variance: -138.93632 <- OH NO
```

This is extremely bad: not only is the bad variance way off, it’s NEGATIVE! (the variance is never negative!)

#### what went wrong: catastrophic cancellation

What’s going on here is similar to our odometer problem: the `sum_of_squares` number gets extremely big (about 10^21 or 2^69), and at that point, the gap between consecutive floating point numbers is also very big – it’s 2^46. So we just lose all precision in our calculations.

The term for this problem is “catastrophic cancellation” – we’re subtracting two very large floating point numbers which are both going to be pretty far from the correct value of the calculation, so the result of the subtraction is also going to be wrong. [The blog post I mentioned before][14] talks about a better algorithm people use to compute variance called Welford’s algorithm, which doesn’t have the catastrophic cancellation issue.

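Here’s a sketch of what Welford’s algorithm looks like (my own version, not code from that post):

```
def welford_variance(nums):
    # one pass, but no giant intermediate sums to cancel catastrophically
    count, mean, m2 = 0, 0.0, 0.0
    for x in nums:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return m2 / count  # population variance
```

Run on the same large-and-close-together numbers, this should stay close to the real variance – and it can never go negative, since `m2` only accumulates non-negative terms.
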
And of course, the solution for most people is to just use a scientific computing library like Numpy to calculate variance instead of trying to do it yourself :)

#### example 4: different languages sometimes do the same floating point calculation differently

A bunch of people mentioned that different platforms will do the same calculation in different ways. One way this shows up in practice is – maybe you have some frontend code and some backend code that do the exact same floating point calculation. But it’s done slightly differently in Javascript and in PHP, so your users end up seeing discrepancies and getting confused.

In principle you might think that different implementations should work the same way because of the IEEE 754 standard for floating point, but here are a couple of caveats that were mentioned:

- math operations in libc (like sin/log) behave differently in different implementations. So code using glibc could give you different results than code using musl
- some x86 instructions can use 80 bit precision for some double operations internally instead of 64 bit precision. [Here’s a GitHub issue talking about that][15]

I’m not very sure about these points and I don’t have concrete examples I can reproduce.

#### example 5: the deep space kraken

Kerbal Space Program is a space simulation game, and it used to have a bug called the [Deep Space Kraken][16] where when you moved very fast, your ship would start getting destroyed due to floating point issues. This is similar to the other problems we’ve talked about involving big floating point numbers (like the variance problem), but I wanted to mention it because:

- it has a funny name
- it seems like a very common bug in video games / astrophysics / simulations in general – if you have points that are very far from the origin, your math gets messed up

Another example of this is the [Far Lands][17] in Minecraft.

#### example 6: the inaccurate timestamp

I promise this is the last example of “very large floating point numbers can ruin your day”. But! Just one more! Let’s imagine that we try to represent the current Unix epoch in nanoseconds (about 1673580409000000000) as a 64-bit floating point number.

This is no good! 1673580409000000000 is about 2^60 (crucially, bigger than 2^53), and the next 64-bit float after it is 1673580409000000256.
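
You can see both facts with a quick snippet (`math.nextafter` needs Python 3.9+; this demo is mine, not from the post):

```
import math

t = 1673580409000000000                         # ~2^60 nanoseconds
print(int(math.nextafter(float(t), math.inf)))  # 1673580409000000256
print(float(t) + 100 == float(t))               # True: adding 100ns changes nothing
```
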
So this would be a great way to end up with inaccuracies in your time math. Of course, time libraries actually represent times as integers, so this isn’t usually a problem. (there’s always still the [year 2038 problem][18], but that’s not related to floats)

In general, the lesson here is that sometimes it’s better to use integers.

#### example 7: splitting a page into columns

Now that we’ve talked about problems with big floating point numbers, let’s do a problem with small floating point numbers.

Let’s say you have a page width, and a column width, and you want to figure out:

- how many columns fit on the page
- how much space is left over

You might reasonably try `floor(page_width / column_width)` for the first question and `page_width % column_width` for the second question. Because that would work just fine with integers!

```
In [5]: math.floor(13.716 / 4.572)
Out[5]: 3

In [6]: 13.716 % 4.572
Out[6]: 4.571999999999999
```

This is wrong! The amount of space left is 0!

A better way to calculate the amount of space left might have been `13.716 - 3 * 4.572`, which gives us a very small negative number.

I think the lesson here is to never calculate the same thing in 2 different ways with floats.

This is a very basic example but I can kind of see how this would create all kinds of problems if I was doing page layout with floating point numbers, or doing CAD drawings.

#### example 8: collision checking

Here’s a very silly Python program, that starts a variable at 1000 and decrements it until it collides with 0. You can imagine that this is part of a pong game or something, and that `a` is a ball that’s supposed to collide with a wall.

```
a = 1000
while a != 0:
    a -= 0.001
```

You might expect this program to terminate. But it doesn’t! `a` is never 0, instead it goes from 1.673494676862619e-08 to -0.0009999832650532314.

The lesson here is that instead of checking for float equality, usually you want to check if two numbers are different by some very small amount. Or here we could just write `while a > 0`.

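In Python, for example, `math.isclose` does the “different by some very small amount” check for you (my own example):

```
import math

print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True – equal within a small relative tolerance
```
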
#### that’s all for now

I didn’t even get to NaNs (there are so many of them!) or infinity or +0 / -0 or subnormals, but we’ve already written 2000 words and I’m going to just publish this.

I might write another followup post later – that Mastodon thread has literally 15,000 words of floating point problems in it, there’s a lot of material! Or I might not, who knows :)

--------------------------------------------------------------------------------

Examples of problems with integers
======

Hello! A few days back we talked about [problems with floating point numbers][1].

This got me thinking – but what about integers? Of course integers have all kinds of problems too – anytime you represent a number in a small fixed amount of space (like 8/16/32/64 bits), you’re going to run into problems.

So I [asked on Mastodon again][2] for examples of integer problems and got all kinds of great responses again. Here’s a table of contents.

- [example 1: the small database primary key][3]
- [example 2: integer overflow/underflow][4]
- [aside: how do computers represent negative integers?][5]
- [example 3: decoding a binary format in Java][6]
- [example 4: misinterpreting an IP address or string as an integer][7]
- [example 5: security problems because of integer overflow][8]
- [example 6: the case of the mystery byte order][9]
- [example 7: modulo of negative numbers][10]
- [example 8: compilers removing integer overflow checks][11]
- [example 9: the && typo][12]

Like last time, I’ve written some example programs to demonstrate these problems. I’ve tried to use a variety of languages in the examples (Go, Javascript, Java, and C) to show that these problems don’t just show up in super low level C programs – integers are everywhere!

Also I’ve probably made some mistakes in here, I learned several things while writing this.

#### example 1: the small database primary key

One of the most classic (and most painful!) integer problems is:

- oh no!
- You need to do a database migration to switch your primary key to be a 64-bit integer instead

If the primary key actually reaches its maximum value I’m not sure exactly what happens, I’d imagine you wouldn’t be able to create any new database rows and it would be a very bad day for your massively successful service.

#### example 2: integer overflow/underflow

Some brief notes about other languages:

- In C, you can compile with `clang -fsanitize=unsigned-integer-overflow`. Then if your code has an overflow/underflow like this, the program will crash.
- Similarly in Rust, if you compile your program in debug mode it’ll crash if there’s an integer overflow. But in release mode it won’t crash, it’ll just happily decide that 0 - 1 = 4294967295.

The reason Rust doesn’t check for overflows if you compile your program in release mode (and the reason C and Go don’t check) is that – these checks are expensive! Integer arithmetic is a very big part of many computations, and making sure that every single addition isn’t overflowing makes it slower.

#### aside: how do computers represent negative integers?

I mentioned in the last section that `0xFFFFFFFF` can mean either `-1` or `4294967295`. You might be thinking – what??? Why would `0xFFFFFFFF` mean `-1`?

So let’s talk about how computers represent negative integers for a second.

I’m going to simplify and talk about 8-bit integers instead of 32-bit integers, because there are less of them and it works basically the same way.

You can represent 256 different numbers with an 8-bit integer: 0 to 255

```
00000000 -> 0
00000001 -> 1
...
11111111 -> 255
```

But what if you want to represent _negative_ integers? We still only have 8 bits! So we need to reassign some of these and treat them as negative numbers instead.

Here’s the way most modern computers do it:

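The full table is elided in this copy, but the idea (sketched here) is that bit patterns with a leading 0 keep their usual values, and patterns with a leading 1 get shifted down by 256:

```
00000000 -> 0
01111111 -> 127
10000000 -> -128
11111111 -> -1
```
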
That’s how we end up with `0xFFFFFFFF` meaning -1.

#### there are multiple ways to represent negative integers

The way we just talked about for representing negative integers (“it’s the equivalent positive integer, but you subtract 2^n”) is called **two’s complement**, and it’s the most common on modern computers. There are several other ways though, the [wikipedia article has a list][14].

#### weird thing: the absolute value of -128 is negative

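The program itself is elided in this copy; here’s a minimal Go sketch of the same behaviour (my reconstruction, not the original code):

```
package main

import "fmt"

func main() {
	var x int8 = -128
	fmt.Println(-x) // negating -128 can't give +128: it doesn't fit in an int8
}
```
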
This prints out:

```
-128
```

This is because the signed 8-bit integers go from -128 to 127 – there **is** no +128! Some programs might crash when you try to do this (it’s an overflow), but Go doesn’t.

Now that we’ve talked about signed integers a bunch, let’s dig into another example of how they can cause problems.

#### example 3: decoding a binary format in Java

Let’s say you’re parsing a binary format in Java, and you want to get the first 4 bits of the byte `0x90`. The correct answer is 9.

```
public class Main {
    public static void main(String[] args) {
        // we want the first 4 bits of 0x90, which are 1001, aka 9
        // (the body of this program is a reconstruction – the original is elided)
        byte b = (byte) 0x90;
        System.out.println(b >> 4);
    }
}
```

This prints out `-7`, not 9! That’s because of two facts about Java: bytes are signed, and `>>` is an arithmetic shift that happens after `b` has been promoted to a signed integer. Let’s break down what those two facts mean for our little calculation `b >> 4`: `b` gets sign-extended from `0x90` to the integer `0xFFFFFF90`, and the shift then drags the sign bit along, giving `0xFFFFFFF9`, which is -7.

#### what can you do about it?

I don’t know what the actual idiomatic way to do this in Java is, but the way I’d naively approach fixing this is to put in a bit mask before doing the right shift. So instead of:

```
b >> 4
```

we’d write

```
(b & 0xFF) >> 4
```

`b & 0xFF` seems redundant (`b` is already a byte!), but it’s actually not because `b` is being promoted to an integer.

Now instead of `0x90 -> 0xFFFFFF90 -> 0xFFFFFFF9`, we end up calculating `0x90 -> 0xFFFFFF90 -> 0x00000090 -> 0x00000009`, which is the result we wanted: 9.

And when we actually try it, it prints out “9”.

Also, if we were using a language with unsigned integers, the natural way to deal with this would be to treat the value as an unsigned integer in the first place. But that’s not possible in Java.

#### example 4: misinterpreting an IP address or string as an integer

I don’t know if this is technically a “problem with integers” but it’s funny so I’ll mention it: [Rachel by the bay][16] has a bunch of great examples of things that are not integers being interpreted as integers. For example, “HTTP” is `0x48545450` and `2130706433` is `127.0.0.1`.

She points out that you can actually ping any integer, and it’ll convert that integer into an IP address, for example:
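
Only one output line of that example survives in this copy; the `$ ping` command above it is my reconstruction:

```
$ ping 132848123841239999988888888888234234234234234234
PING 132848123841239999988888888888234234234234234234 (251.164.101.122): 56 data bytes
```
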
#### example 5: security problems because of integer overflow

Another integer overflow example: here’s a [search for CVEs involving integer overflows][17]. There are a lot! I’m not a security person, but here’s one random example: this [json parsing library bug][18].

My understanding of that json parsing bug is roughly:

- but the JSON file is still 3GB, so it gets copied into the tiny buffer with almost 0 bytes of memory
- this overwrites all kinds of other memory that it’s not supposed to

The CVE says “This vulnerability mostly impacts process availability”, which I think means “the program crashes”, but sometimes this kind of thing is much worse and can result in arbitrary code execution.

My impression is that there are a large variety of different flavours of security vulnerabilities caused by integer overflows.

#### example 6: the case of the mystery byte order

One person said that they do scientific computing and sometimes they need to read files which contain data with an unknown byte order.

Let’s invent a small example of this: say you’re reading a file which contains 4 bytes - `00`, `00`, `12`, and `81` (in that order), that you happen to know represent a 4-byte integer. There are 2 ways to interpret that integer:

- `0x00001281` (which translates to 4737). This order is called “big endian”
- `0x81120000` (which translates to 2165440512). This order is called “little endian”.

Which one is it? Well, maybe the file contains some metadata that specifies the endianness. Or maybe you happen to know what machine it was generated on and what byte order that machine uses. Or maybe you just read a bunch of values, try both orders, and figure out which makes more sense. Maybe 2165440512 is too big to make sense in the context of whatever your data is supposed to mean, or maybe `4737` is too small.

A couple more notes on this:

- this isn’t just a problem with integers, floating point numbers have byte order too
- this also comes up when reading data from a network, but in that case the byte order isn’t a “mystery”, it’s just going to be big endian. But x86 machines (and many others) are little endian, so you have to swap the byte order of all your numbers.

#### example 7: modulo of negative numbers

This is more of a design decision about how different programming languages design their modulo operation.

Let’s say you write `-13 % 3` in your program, or `13 % -3`. What’s the result?

It turns out that different programming languages do it differently, for example in Python `-13 % 3 = 2` but in Javascript `-13 % 3 = -1`.

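A quick check of the Python side (my own snippet):

```
print(-13 % 3)  # 2 – the result takes the sign of the divisor
print(13 % -3)  # -2
```
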
There’s a table in [this blog post][19] that describes a bunch of different programming languages’ choices.

#### example 8: compilers removing integer overflow checks

We’ve been hearing a lot about integer overflow and why it’s bad. So let’s imagine you try to be safe and include some checks in your programs – after each addition, you make sure that the calculation didn’t overflow. Like this:

```
#include <stdio.h>

int main(void) {
    // most of this program is a reconstruction – the original is elided here.
    // n + 100 overflows a signed int, which is undefined behaviour in C
    int n = 2147483647;
    printf("%d\n", n + 100 < 0);
    return 0;
}
```

```
$ gcc check_overflow.c -o check_overflow && ./check_overflow
1
$ gcc -O3 check_overflow.c -o check_overflow && ./check_overflow
0
```

That’s weird – when we compile with `gcc`, we get the answer we expected, but with `gcc -O3`, we get a different answer. Why?

#### what’s going on?

My understanding (which might be wrong) is:

- Signed integer overflow in C is **undefined behavior**. I think that’s because different C implementations might be using different representations of signed integers (maybe they’re using one’s complement instead of two’s complement or something)
- “undefined behaviour” in C means “the compiler is free to do literally whatever it wants after that point” (see this post [With undefined behaviour, anything is possible][20] by Raph Levien for a lot more)
- Some compiler optimizations assume that undefined behaviour will never happen. They’re free to do this, because – if that undefined behaviour _did_ happen, then they’re allowed to do whatever they want, so “run the code that I optimized assuming that this would never happen” is fine.
- So this `if (n + 100 < 0)` check is irrelevant – if that did happen, it would be undefined behaviour, so there’s no need to execute the contents of that if statement.

So, that’s weird. I’m not going to write a “what can you do about it?” section here because I’m pretty out of my depth already.

I certainly would not have expected that though.

My impression is that “undefined behaviour” is really a C/C++ concept, and doesn’t exist in other languages in the same way except in the case of “your program called some C code in an incorrect way and that C code did something weird because of undefined behaviour”. Which of course happens all the time.

#### example 9: the && typo

This one was mentioned as a very upsetting bug. Let’s say you have two integers and you want to check that they’re both nonzero.

In Javascript, you might write:

```
if (a && b) {
  // do something
}
```

But it’s easy to typo one `&` instead of two:

```
if (a & b) {
  // do something
}
```

This is still perfectly valid code, but it means something completely different – it’s a bitwise and instead of a boolean and. Let’s go into a Javascript console and look at bitwise vs boolean and for `9` and `4`:

```
> 9 && 4
4
> 9 & 4
0
```

It’s easy to imagine this turning into a REALLY annoying bug since it would be intermittent – often `x & y` does turn out to be truthy if `x && y` is truthy.

#### what to do about it?

For Javascript, ESLint has a [no-bitwise][21] check, which requires you to manually flag “no, I actually know what I’m doing, I want to do bitwise and” if you use a bitwise and in your code. I’m sure many other linters have a similar check.

#### that’s all for now!

There are definitely more problems with integers than this, but this got pretty long again and I’m tired of writing again so I’m going to stop :)

--------------------------------------------------------------------------------

Why does 0.1 + 0.2 = 0.30000000000000004?
======

Hello! I was trying to write about floating point yesterday, and I found myself wondering about this calculation, with 64-bit floats:

```
>>> 0.1 + 0.2
0.30000000000000004
```

I realized that I didn’t understand exactly how it worked. I mean, I know floating point calculations are inexact, and I know that you can’t exactly represent `0.1` in binary, but: there’s a floating point number that’s closer to 0.3 than `0.30000000000000004`! So why do we get the answer `0.30000000000000004`?

If you don’t feel like reading this whole post with a bunch of calculations, the short answer is that `0.1000000000000000055511151231257827021181583404541015625 + 0.200000000000000011102230246251565404236316680908203125` lies exactly between 2 floating point numbers, `0.299999999999999988897769753748434595763683319091796875` (usually printed as `0.3`) and `0.3000000000000000444089209850062616169452667236328125` (usually printed as `0.30000000000000004`). The answer is `0.30000000000000004` (the second one) because its significand is even.

#### how floating point addition works

This is roughly how floating point addition works:

- Add together the numbers (with extra precision)
- Round the result to the nearest floating point number

So let’s use these rules to calculate 0.1 + 0.2. I just learned how floating point addition works yesterday so it’s possible I’ve made some mistakes in this post, but I did get the answers I expected at the end.

#### step 1: find out what 0.1 and 0.2 are

First, let’s use Python to figure out what the exact values of `0.1` and `0.2` are (this snippet is partly reconstructed – only the `0.2` output line survives in this copy):

```
>>> f"{0.1:.80f}"
'0.10000000000000000555111512312578270211815834045410156250000000000000000000000000'
>>> f"{0.2:.80f}"
'0.20000000000000001110223024625156540423631668090820312500000000000000000000000000'
```

These really are the exact values: because floating point numbers are in base 2, you can represent them all exactly in base 10. You just need a lot of digits sometimes :)

#### step 2: add the numbers together

If we add the exact decimal values above, we get `0.3000000000000000166533453693773481063544750213623046875`. The next step is to round that sum to the nearest 64-bit float.

Now, let’s look at the floating point numbers around `0.3`. Here’s the closest float to `0.3` (it’s usually just displayed as `0.3`):

```
>>> f"{0.3:.80f}"
'0.29999999999999998889776975374843459576368331909179687500000000000000000000000000'
```

We can figure out the next floating point number after `0.3` by serializing `0.3` to 8 bytes with `struct.pack`, adding 1, and then using `struct.unpack`:

```
>>> struct.pack("!d", 0.3)
b'?\xd3333333'
>>> struct.unpack("!d", struct.pack("!Q", struct.unpack("!Q", struct.pack("!d", 0.3))[0] + 1))
(0.30000000000000004,)
```

Apparently you can also do this with `math.nextafter`:

```
>>> math.nextafter(0.3, math.inf)
0.30000000000000004
```

So the two 64-bit floats around `0.3` are `0.299999999999999988897769753748434595763683319091796875` and `0.3000000000000000444089209850062616169452667236328125`.

#### step 4: find out which one is closest to our result

It turns out that `0.3000000000000000166533453693773481063544750213623046875` is exactly in the middle of `0.299999999999999988897769753748434595763683319091796875` and `0.3000000000000000444089209850062616169452667236328125`.

You can see that with this calculation:
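
The original snippet is elided in this copy; this sketch with Python’s `decimal` module shows the same thing:

```
from decimal import Decimal, getcontext

getcontext().prec = 60  # enough precision to hold the exact values

low = Decimal("0.299999999999999988897769753748434595763683319091796875")
high = Decimal("0.3000000000000000444089209850062616169452667236328125")
print((low + high) / 2)
# 0.3000000000000000166533453693773481063544750213623046875
```
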
So neither of them is closest.

#### how does it know which one to round to?

In the binary representation of a floating point number, there’s a number called the “significand”. In cases like this (where the result is exactly in between 2 successive floating point numbers), it’ll round to the one with the even significand.

In this case that’s `0.300000000000000044408920985006261616945266723632812500`

We actually saw the significand of this number a bit earlier:

- 0.30000000000000004 is `struct.unpack('!d', b'?\xd3333334')`
- 0.3 is `struct.unpack('!d', b'?\xd3333333')`

The last digit of the big endian hex representation of `0.30000000000000004` is `4`, so that’s the one with the even significand (because the significand is at the end).

#### let’s also work out the whole calculation in binary

Above we did the calculation in decimal, because that’s a little more intuitive to read. But of course computers don’t do these calculations in decimal – they’re done in a base 2 representation. So I wanted to get an idea of how that worked too.

I don’t think this binary calculation part of the post is particularly clear but it was helpful for me to write out. There are really a lot of numbers and it might be terrible to read.

#### how 64-bit floats work: exponent and significand

```
import struct

def get_exponent(f):
    # the 11 bits after the sign bit (still biased by 1023)
    x = struct.unpack("!Q", struct.pack("!d", f))[0]
    return x >> 52

def get_significand(f):
    # the low 52 bits, with the exponent bits cleared
    # (the middle of this block is reconstructed around the surviving lines)
    x = struct.unpack("!Q", struct.pack("!d", f))[0]
    exponent = x >> 52
    return x ^ (exponent << 52)
```

I’m ignoring the sign bit (the first bit) because we only need these functions to work on two numbers (0.1 and 0.2) and those two numbers are both positive.

First, let’s get the exponent and significand of 0.1. We need to subtract 1023 to get the actual exponent because that’s how floating point works.

```
>>> get_exponent(0.1) - 1023
-4
```

So the exponent is -4, which means 0.1 is (1 + significand / 2**52) * 2**-4. Here’s that calculation in Python:

```
>>> (1 + get_significand(0.1) / 2**52) * 2**-4
0.1
```

(you might legitimately be worried about floating point accuracy issues with this calculation, but in this case I’m pretty sure it’s fine because these numbers by definition don’t have accuracy issues – the floating point numbers starting at `2**-4` go up in steps of `1/2**(52 + 4)`)

We can do the same thing for `0.2`:
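
The snippet for `0.2` is elided in this copy; following the same pattern it would be:

```
>>> get_exponent(0.2) - 1023
-3
```
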
|
||||
|
||||
@ -309,10 +276,7 @@ That’s the answer we expected:
|
||||
|
||||
#### this probably isn’t exactly how it works in hardware
|
||||
|
||||
The way I’ve described the operations here isn’t literally exactly
|
||||
what happens when you do floating point addition (it’s not “solving for X” for
|
||||
example), I’m sure there are a lot of efficient tricks. But I think it’s about
|
||||
the same idea.
|
||||
The way I’ve described the operations here isn’t literally exactly what happens when you do floating point addition (it’s not “solving for X” for example), I’m sure there are a lot of efficient tricks. But I think it’s about the same idea.
|
||||
|
||||
#### printing out floating point numbers is pretty weird
|
||||
|
||||
@ -325,48 +289,31 @@ We said earlier that the floating point number 0.3 isn’t equal to 0.3. It’s
|
||||
|
||||
So when you print out that number, why does it display `0.3`?
|
||||
|
||||
The computer isn’t actually printing out the exact value of the number, instead
|
||||
it’s printing out the _shortest_ decimal number `d` which has the property that
|
||||
our floating point number `f` is the closest floating point number to `d`.
|
||||
The computer isn’t actually printing out the exact value of the number, instead it’s printing out the _shortest_ decimal number `d` which has the property that our floating point number `f` is the closest floating point number to `d`.
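
You can see both views of the same float in Python: `repr` gives the shortest round-tripping decimal, and `decimal.Decimal` shows the exact stored value:

```
>>> from decimal import Decimal
>>> 0.1 + 0.2                # the shortest decimal that round-trips
0.30000000000000004
>>> Decimal(0.1 + 0.2)       # the exact value of the float
Decimal('0.3000000000000000444089209850062616169452667236328125')
>>> float('0.30000000000000004') == 0.1 + 0.2
True
```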

It turns out that doing this efficiently isn’t trivial at all, and there are a bunch of academic papers about it like [Printing Floating-Point Numbers Quickly and Accurately][1] or [How to print floating point numbers accurately][2].

#### would it be more intuitive if computers printed out the exact value of a float?

Rounding to a nice clean decimal value is nice, but in a way I feel like it might be more intuitive if computers just printed out the exact value of a floating point number – it might make it seem a lot less surprising when you get weird results.

To me, 0.1000000000000000055511151231257827021181583404541015625 + 0.200000000000000011102230246251565404236316680908203125 = 0.3000000000000000444089209850062616169452667236328125 feels less surprising than 0.1 + 0.2 = 0.30000000000000004.

Probably this is a bad idea – it would definitely use a lot of screen space.

#### a quick note on PHP

Someone in the comments somewhere pointed out that `<?php echo (0.1 + 0.2);?>` prints out `0.3`. Does that mean that floating point math is different in PHP?

I think the answer is no – if I run:

`<?php echo ((0.1 + 0.2) - 0.3);?>` on [this page][3], I get the exact same answer as in Python: 5.5511151231258E-17. So it seems like the underlying floating point math is the same.

I think the reason that `0.1 + 0.2` prints out `0.3` in PHP is that PHP’s algorithm for displaying floating point numbers is less precise than Python’s – it’ll display `0.3` even if that number isn’t the closest floating point number to 0.3.
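
I believe PHP’s default `precision` setting is 14 significant digits, and you can mimic that in Python’s string formatting to get the same effect:

```
>>> f"{0.1 + 0.2:.14g}"   # round to 14 significant digits, like PHP
'0.3'
>>> repr(0.1 + 0.2)       # keep enough digits to round-trip
'0.30000000000000004'
```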

#### that’s all!

I kind of doubt that anyone had the patience to follow all of that arithmetic, but it was helpful for me to write down, so I’m publishing this post anyway. Hopefully some of this makes sense.

--------------------------------------------------------------------------------

@ -383,4 +330,4 @@ via: https://jvns.ca/blog/2023/02/08/why-does-0-1-plus-0-2-equal-0-3000000000000

[b]: https://github.com/lkxed/
[1]: https://legacy.cs.indiana.edu/~dyb/pubs/FP-Printing-PLDI96.pdf
[2]: https://lists.nongnu.org/archive/html/gcl-devel/2012-10/pdfkieTlklRzN.pdf
[3]: https://replit.com/languages/php_cli

@ -10,33 +10,19 @@

Some notes on using nix
======

Recently I started using a Mac for the first time. The biggest downside I’ve noticed so far is that the package management is much worse than on Linux. At some point I got frustrated with homebrew because I felt like it was spending too much time upgrading when I installed new packages, and so I thought – maybe I’ll try the [nix][1] package manager!

nix has a reputation for being confusing (it has its whole own programming language!), so I’ve been trying to figure out how to use nix in a way that’s as simple as possible and does not involve managing any configuration files or learning a new programming language. Here’s what I’ve figured out so far! We’ll talk about how to:

- install packages with nix
- build a custom nix package for a C++ program called [paperjam][2]
- install a 5-year-old version of [hugo][3] with nix

As usual I’ve probably gotten some stuff wrong in this post since I’m still pretty new to nix. I’m also still not sure how much I like nix – it’s very confusing! But it’s helped me compile some software that I was struggling to compile otherwise, and in general it seems to install things faster than homebrew.

#### what’s interesting about nix?

People often describe nix as “declarative package management”. I don’t care that much about declarative package management, so here are two things that I appreciate about nix:

- It provides binary packages (hosted at [https://cache.nixos.org/][4]) that you can quickly download and install
- For packages which don’t have binary packages, it makes it easier to compile them

@ -44,12 +30,8 @@ that I appreciate about nix:

I think that the reason nix is good at compiling software is that:

- you can have multiple versions of the same library or program installed at a time (you could have 2 different versions of libc for instance). For example I have two versions of node on my computer right now, one at `/nix/store/4ykq0lpvmskdlhrvz1j3kwslgc6c7pnv-nodejs-16.17.1` and one at `/nix/store/5y4bd2r99zhdbir95w5pf51bwfg37bwa-nodejs-18.9.1`.
- when nix builds a package, it builds it in isolation, using only the specific versions of its dependencies that you explicitly declared. So there’s no risk that the package secretly depends on another package on your system that you don’t know about. No more fighting with `LD_LIBRARY_PATH`!
- a lot of people have put a lot of work into writing down all of the dependencies of packages

I’ll give a couple of examples later in this post of two times nix made it easier for me to compile software.

@ -72,15 +54,11 @@ nix-env -iA nixpkgs.fish

This seems to just download some binaries from [https://cache.nixos.org][8] – pretty simple.

Some people use nix to install their Node and Python and Ruby packages, but I haven’t been doing that – I just use `npm install` and `pip install` the same way I always have.

#### some nix features I’m not using

There are a bunch of nix features/tools that I’m not using, but that I’ll mention. I originally thought that you _had_ to use these features to use nix, because most of the nix tutorials I’ve read talk about them. But you don’t have to use them.

- NixOS (a Linux distribution)
- [nix-shell][9]

@ -88,8 +66,7 @@ because most of the nix tutorials I’ve read talk about them. But you don’t h

- [home-manager][11]
- [devenv.sh][12]

I won’t go into these because I haven’t really used them and there are lots of explanations out there.

#### where are nix packages defined?

@ -107,16 +84,14 @@ I found a way to search nix packages from the command line that I liked better:

#### everything is installed with symlinks

One of nix’s major design choices is that there isn’t one single `bin` with all your packages – instead you use symlinks. There are a lot of layers of symlinks. A few examples of symlinks:

- `~/.nix-profile` on my machine is (indirectly) a symlink to `/nix/var/nix/profiles/per-user/bork/profile-111-link/`
- `~/.nix-profile/bin/fish` is a symlink to `/nix/store/afkwn6k8p8g97jiqgx9nd26503s35mgi-fish-3.5.1/bin/fish`

When I install something, it creates a new `profile-112-link` directory with new symlinks and updates my `~/.nix-profile` to point to that directory.

I think this means that if I install a new version of `fish` and I don’t like it, I can easily go back just by running `nix-env --rollback` – it’ll move me to my previous profile directory.
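
Here’s a hypothetical session showing what that looks like, using the profile paths from above (the output here is illustrative, not copied from a real machine):

```
$ readlink -f ~/.nix-profile   # follow the symlink chain to the current profile
/nix/var/nix/profiles/per-user/bork/profile-112-link
$ nix-env --rollback           # point ~/.nix-profile back at profile-111-link
```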

#### uninstalling packages doesn’t delete them

@ -161,28 +136,19 @@ I haven’t really upgraded anything yet. I think that if something goes wrong w
nix-env --rollback
```

Someone linked me to [this post from Ian Henry][15] that talks about some confusing problems with `nix-env --upgrade` – maybe it doesn’t work the way you’d expect? I guess I’ll be wary around upgrades.

#### next goal: make a custom package of paperjam

After a few months of installing existing packages, I wanted to make a custom package with nix for a program called [paperjam][2] that wasn’t already packaged.

I was actually struggling to compile `paperjam` at all even without nix because the version of `libiconv` I had on my system was wrong. I thought it might be easier to compile it with nix even though I didn’t know how to make nix packages yet. And it actually was!

But figuring out how to get there was VERY confusing, so here are some notes about how I did it.

#### how to build an example package

Before I started working on my `paperjam` package, I wanted to build an example existing package just to make sure I understood the process for building a package. I was really struggling to figure out how to do this, but I asked in Discord and someone explained to me how I could get a working package from [https://github.com/NixOS/nixpkgs/][13] and build it. So here are those instructions:

**step 1**: Download some arbitrary package from [nixpkgs][13] on github, for example the `dash` package:

@ -190,8 +156,7 @@ are those instructions:
wget https://raw.githubusercontent.com/NixOS/nixpkgs/47993510dcb7713a29591517cb6ce682cc40f0ca/pkgs/shells/dash/default.nix -O dash.nix
```

**step 2**: Replace the first statement (`{ lib , stdenv , buildPackages , autoreconfHook , pkg-config , fetchurl , fetchpatch , libedit , runCommand , dash }:`) with `with import <nixpkgs> {};`. I don’t know why you have to do this, but it works.

**step 3**: Run `nix-build dash.nix`

@ -207,11 +172,7 @@ That’s all! Once I’d done that, I felt like I could modify the `dash` packag

`paperjam` has one dependency (`libpaper`) that also isn’t packaged yet, so I needed to build `libpaper` first.

Here’s `libpaper.nix`. I basically just wrote this by copying and pasting from other packages in the [nixpkgs][13] repository. My guess is what’s happening here is that nix has some default rules for compiling C packages (like “run `make install`”), so the `make install` happens by default and I don’t need to configure it explicitly.

```
with import <nixpkgs> {};
@ -249,10 +210,7 @@ Next, I needed to compile `paperjam`. Here’s a link to the [nix package I wrot

I set the hashes by first leaving the hash empty, then running `nix-build` to get an error message complaining about a mismatched hash. Then I copied the correct hash out of the error message.
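
In case that trick is unclear, here’s roughly what it looks like in the `.nix` file (a sketch with a made-up URL, not the actual `paperjam` package):

```
src = fetchurl {
  url = "https://example.com/paperjam-1.2.tar.gz";  # made-up URL
  # leave this empty, run nix-build, then copy the real hash
  # out of the "hash mismatch" error message:
  sha256 = "";
};
```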

I figured out how to set `installFlags` just by running `rg PREFIX` in the nixpkgs repository – I figured that needing to set a `PREFIX` was pretty common and someone had probably done it before, and I was right. So I just copied and pasted that line from another package.

Then I ran:

@ -265,29 +223,17 @@ and then everything worked and I had `paperjam` installed! Hooray!

#### next goal: install a 5-year-old version of hugo

Right now I build this blog using Hugo 0.40, from 2018. I don’t need any new features so I haven’t felt a need to upgrade. On Linux this is easy: Hugo’s releases are a static binary, so I can just download the 5-year-old binary from the [releases page][17] and run it. Easy!

But on this Mac I ran into some complications. Mac hardware has changed in the last 5 years, so the Mac Hugo binary I downloaded crashed. And when I tried to build it from source with `go build`, that didn’t work either because Go build norms have changed in the last 5 years as well.

I was working around this by running Hugo in a Linux docker container, but I didn’t love that: it was kind of slow and it felt silly. It shouldn’t be that hard to compile one Go program!

Nix to the rescue! Here’s what I did to install the old version of Hugo with nix.

#### installing Hugo 0.40 with nix

I wanted to install Hugo 0.40 and put it in my PATH as `hugo-0.40`. Here’s how I did it. I did this in a kind of weird way, but it worked ([Searching and installing old versions of Nix packages][18] describes a probably more normal method).

**step 1**: Search through the nixpkgs repo to find Hugo 0.40

@ -318,33 +264,19 @@ I figured out how to run this by running `rg 'mv '` in the nixpkgs repository an

I installed into my `~/.nix-profile/bin` by running `nix-env -i -f hugo.nix`.

And it all works! I put the final `.nix` file into my own personal [nixpkgs repo][20] so that I can use it again later if I want.

#### reproducible builds aren’t magic, they’re really hard

I think it’s worth noting here that this `hugo.nix` file isn’t magic – the reason I can easily compile Hugo 0.40 today is that many people worked for a long time to make it possible to package that version of Hugo in a reproducible way.

#### that’s all!

Installing `paperjam` and this 5-year-old version of Hugo were both surprisingly painless and actually much easier than compiling them without nix, because nix made it much easier for me to compile the `paperjam` package with the right version of `libiconv`, and because someone 5 years ago had already gone to the trouble of listing out the exact dependencies for Hugo.

I don’t have any plans to get much more complicated with nix (and it’s still very possible I’ll get frustrated with it and go back to homebrew!), but we’ll see what happens! I’ve found it much easier to start in a simple way and then start using more features if I feel the need instead of adopting a whole bunch of complicated stuff all at once.

I probably won’t use nix on Linux – I’ve always been happy enough with `apt` (on Debian-based distros) and `pacman` (on Arch-based distros), and they’re much less confusing. But on a Mac it seems like it might be worth it. We’ll see! It’s very possible in 3 months I’ll get frustrated with nix and just go back to homebrew.

--------------------------------------------------------------------------------

@ -10,8 +10,7 @@

How do Nix builds work?
======

Hello! For some reason after the last [nix post][1] I got nerdsniped by trying to understand how Nix builds work under the hood, so here’s a quick exploration I did today. There are probably some mistakes in here.

I started by [complaining on Mastodon][2]:

@ -31,24 +30,18 @@ complicated C program.

#### the goal: compile a C program, without using Nix’s standard machinery

Our goal is to compile a C program called `paperjam`. This is a real C program that wasn’t in the Nix repository already. I already figured out how to compile it in [this post][1] by copying and pasting a bunch of stuff I didn’t understand, but this time I wanted to do it in a more principled way where I actually understand more of the steps.

We’re going to avoid using most of Nix’s helpers for compiling C programs.

The plan is to start with an almost empty build script, and then resolve errors until we have a working build.

#### first: what’s a derivation?

I said that we weren’t going to talk about too many Nix abstractions (and we won’t!), but understanding what a derivation is really helped me.

Everything I read about Nix talks about derivations all the time, but I was really struggling to figure out what a derivation _is_. It turns out that `derivation` is a function in the Nix language. But not just any function! The whole point of the Nix language seems to be to call this function. The [official documentation for the `derivation` function][5] is actually extremely clear. Here’s what I took away:

`derivation` takes a bunch of keys and values as input. There are 3 required keys:

@ -56,8 +49,7 @@ to call this function. The [official documentation for the `derivation` function
- `name`: the name of the package you’re building
- `builder`: a program (usually a bash script) that runs the build

Every other key is an arbitrary string that gets passed as an environment variable to the `builder` shell script.
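
To make that concrete, here’s a minimal sketch of a `derivation` call that I put together from the documentation (it isn’t from the original post; the third required key, `system`, is elided in the diff above):

```
derivation {
  name = "hello-derivation";
  system = "x86_64-linux";   # or whatever platform you're on
  builder = "/bin/sh";       # usually a bash script
  args = [ "-c" "echo hello > $out" ];
  someExtraKey = "this becomes the environment variable someExtraKey in the builder";
}
```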

#### derivations automatically build all their inputs

@ -69,15 +61,12 @@ Nix will:
- put the resulting output directory somewhere like `/nix/store/4garxzr1rpdfahf374i9p9fbxnx56519-qpdf-11.1.0`
- expand `pkgs.qpdf` into that output directory (as a string), so that I can reference it in my build script

The derivation function does some other things (described in the [documentation][5]), but “it builds all of its inputs” is all we really need to know for now.

#### step 1: write a derivation file

Let’s write a very simple build script and call the `derivation` function. These don’t work yet, but I found it pretty fun to go through all the errors, fix them one at a time, and learn a little more about how Nix works by fixing them.

Here’s the build script (`build_paperjam.sh`). This just unpacks the tarball and runs `make install`.

@ -115,9 +104,7 @@ The main things here are:

#### problem 1: tar: command not found

Nix needs you to declare all the dependencies for your builds. It forces this by removing your `PATH` environment variable so that you have no binaries in your PATH at all.

This is pretty easy to fix: we just need to edit our `PATH`.
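
For example, since every extra derivation key becomes an environment variable in the build script, the fix could look something like this (a sketch under the assumption that `coreutils`, `gnutar`, and `gzip` were passed in as derivation attributes; the post’s actual script may differ):

```
# at the top of build_paperjam.sh: build a PATH out of the
# dependencies that were passed in as environment variables
export PATH="$coreutils/bin:$gnutar/bin:$gzip/bin"
```
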
@ -150,11 +137,9 @@ The next error was:
> #include <qpdf/QPDF.hh>
```

Makes sense: everything is isolated, so it can’t access my system header files. Figuring out how to handle this was a little more confusing though.

It turns out that the way Nix handles header files is that it has a shell script wrapper around `clang`. So when you run `clang++`, you’re actually running a shell script.

On my system, the `clang++` wrapper script was at `/nix/store/d929v59l9a3iakvjccqpfqckqa0vflyc-clang-wrapper-11.1.0/bin/clang++`. I searched that file for `LDFLAGS` and found that it uses 2 environment variables:

@ -194,22 +179,15 @@ Here’s the next error:

I started by adding `-L ${pkgs.libiconv}/lib` to my `NIX_LDFLAGS` environment variable, but that didn’t fix it. Then I spent a while going around in circles and being confused.

I eventually figured out how to fix this by taking a working version of the `paperjam` build that I’d made before and editing my `clang++` wrapper file to print out all of its environment variables. The `LDFLAGS` environment variable in the working version was different from mine: it had `-liconv` in it.

So I added `-liconv` to `NIX_LDFLAGS` as well and that fixed it.
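
So the relevant line in `paperjam.nix` ends up looking something like this (reconstructed from the description above):

```
NIX_LDFLAGS = "-L ${pkgs.libiconv}/lib -liconv";
```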

#### why doesn’t the original Makefile have -liconv?

I was a bit puzzled by this `-liconv` thing though: the original Makefile links in `libqpdf` and `libpaper` by passing `-lqpdf -lpaper`. So why doesn’t it link in iconv, if it requires the iconv library?

I think the reason for this is that the original Makefile assumed that you were running on Linux and using glibc, and glibc includes these iconv functions by default. But I guess Mac OS libc doesn’t include iconv, so we need to explicitly set the linker flag `-liconv` to add the iconv library.

#### problem 6: missing codesign_allocate

@ -219,8 +197,7 @@ Time for the next error:
libc++abi: terminating with uncaught exception of type std::runtime_error: Failed to spawn codesign_allocate: No such file or directory
```

I guess this is some kind of Mac code signing thing. I used `find /nix/store -name codesign_allocate` to find `codesign_allocate` on my system. It’s at `/nix/store/a17dwfwqj5ry734zfv3k1f5n37s4wxns-cctools-binutils-darwin-973.0.1/bin/codesign_allocate`.

But this doesn’t tell us what the package is called – we need to be able to refer to it as `${pkgs.XXXXXXX}` and `${pkgs.cctools-binutils-darwin}` doesn’t work.

@ -289,8 +266,7 @@ make install PREFIX="$out"

#### let’s look at our compiled derivation!

Now that we understand this configuration a little better, let’s talk about what `nix-build` is doing a little more.

Behind the scenes, `nix-build paperjam.nix` actually runs `nix-instantiate` and `nix-store --realize`:

@ -300,11 +276,7 @@ $ nix-instantiate paperjam.nix
$ nix-store --realize /nix/store/xp8kibpll55s0bm40wlpip51y7wnpfs0-paperjam-fake.drv
```

I think what this means is that `paperjam.nix` gets compiled to some intermediate representation (also called a derivation?), and then the Nix runtime takes over and is in charge of actually running the build scripts. We can look at this `.drv` intermediate representation with `nix show-derivation`:

```
{
@ -345,13 +317,11 @@ We can look at this `.drv` intermediate representation with `nix show-derivation
}
```

This feels surprisingly easy to understand – you can see that there are a bunch of environment variables, our bash script, and the paths to our inputs.

#### the compilation helpers we’re not using: stdenv

Normally when you build a package with Nix, you don’t do all of this stuff yourself. Instead, you use a helper called `stdenv`, which seems to have two parts:

- a function called `stdenv.mkDerivation` which takes some arguments and generates a bunch of environment variables (it seems to be [documented here][6])
- a 1600-line bash build script ([setup.sh][7]) that consumes those environment variables. This is like our `build-paperjam.sh`, but much more generalized.

@ -370,8 +340,7 @@ and probably lots more useful things I don’t know about yet

#### let’s look at the derivation for jq

Let’s look at one more compiled derivation, for `jq`. This is quite long but there are some interesting things in here. I wanted to look at this because I wanted to see what a more typical derivation generated by `stdenv.mkDerivation` looked like.

```
$ nix show-derivation /nix/store/q9cw5rp0ibpl6h4i2qaq0vdjn4pyms3p-jq-1.6.drv
@ -451,8 +420,7 @@ $ nix show-derivation /nix/store/q9cw5rp0ibpl6h4i2qaq0vdjn4pyms3p-jq-1.6.drv
}
```

I thought it was interesting that some of the environment variables in here are actually bash scripts themselves – for example the `postInstallCheck` environment variable is a bash script. Those bash script environment variables are `eval`ed in the main bash script (you can [see that happening in setup.sh here][8]).

The `postInstallCheck` environment variable in this particular derivation starts like this:

@ -469,11 +437,7 @@ All of my compiler experiments used about 3GB of disk space, but `nix-collect-ga

#### let’s recap the process!

I feel like I understand Nix a bit better after going through this. I still don’t feel very motivated to learn the Nix language, but now I have some idea of what Nix programs are actually doing under the hood! My understanding is:

- First, `.nix` files get compiled into a `.drv` file, which is mostly a bunch of inputs and outputs and environment variables. This is where the Nix language stops being relevant.
- Then all the environment variables get passed to a build script, which is in charge of doing the actual build

@ -10,9 +10,7 @@

Some possible reasons for 8-bit bytes
======

I’ve been working on a zine about how computers represent things in binary, and one question I’ve gotten a few times is – why does the x86 architecture use 8-bit bytes? Why not some other size?

With any question like this, I think there are two options:

@ -20,34 +18,18 @@ With any question like this, I think there are two options:
- 8 bits is objectively the Best Option for some reason, even if history had played out differently we would still use 8-bit bytes
- some mix of 1 & 2

I’m not super into computer history (I like to use computers a lot more than I like reading about them), but I am always curious if there’s an essential reason for why a computer thing is the way it is today, or whether it’s mostly a historical accident. So we’re going to talk about some computer history.

As an example of a historical accident: DNS has a `class` field which has 5 possible values (“internet”, “chaos”, “hesiod”, “none”, and “any”). To me that’s a clear example of a historical accident – I can’t imagine that we’d define the class field the same way if we could redesign DNS today without worrying about backwards compatibility. I’m not sure if we’d use a class field at all!

There aren’t any definitive answers in this post, but I asked [on Mastodon][1] and here are some potential reasons I found for the 8-bit byte. I think the answer is some combination of these reasons.

#### what’s the difference between a byte and a word?

First, this post talks about “bytes” and “words” a lot. What’s the difference between a byte and a word? My understanding is:

- the **byte size** is the smallest unit you can address. For example in a program on my machine `0x20aa87c68` might be the address of one byte, then `0x20aa87c69` is the address of the next byte.
- The **word size** is some multiple of the byte size. I’ve been confused about this for years, and the Wikipedia definition is incredibly vague (“a word is the natural unit of data used by a particular processor design”). I originally thought that the word size was the same as your register size (64 bits on x86-64). But according to section 4.1 (“Fundamental Data Types”) of the [Intel architecture manual][2], on x86 a word is 16 bits even though the registers are 64 bits. So I’m confused – is a word on x86 16 bits or 64 bits? Can it mean both, depending on the context? What’s the deal?

Now let’s talk about some possible reasons that we use 8-bit bytes!

@ -65,18 +47,11 @@ Here’s a [video interview with Fred Brooks (who managed the project)][4] talki
> My most important technical decision in my IBM career was to go with the 8-bit byte for the 360.
> And on the basis of I believe character processing was going to become important as opposed to decimal digits.

It makes sense that an 8-bit byte would be better for text processing: 2^6 is 64, so 6 bits wouldn’t be enough for lowercase letters, uppercase letters, and symbols.

To go with the 8-bit byte, System/360 also introduced the [EBCDIC][5] encoding, which is an 8-bit character encoding.

It looks like the next important machine in 8-bit-byte history was the [Intel 8008][6], which was built to be used in a computer terminal (the Datapoint 2200). Terminals need to be able to represent letters as well as terminal control codes, so it makes sense for them to use an 8-bit byte. [This Datapoint 2200 manual from the Computer History Museum][7] says on page 7 that the Datapoint 2200 supported ASCII (7 bit) and EBCDIC (8 bit).

#### why was the 6-bit byte better for scientific computing?

@ -90,14 +65,11 @@ I was curious about this comment that the 6-bit byte would be better for scienti
> you to lose some of the information more rapidly than you would with binary
> shifting

I don’t understand this comment at all – why does the exponent have to be 8 bits if you use a 32-bit word size? Why couldn’t you use 9 bits or 10 bits if you wanted? But it’s all I could find in a quick search.

#### why did mainframes use 36 bits?

Also related to the 6-bit byte: a lot of mainframes used a 36-bit word size. Why? Someone pointed out that there’s a great explanation in the Wikipedia article on [36-bit computing][9]:

> Prior to the introduction of computers, the state of the art in precision
> scientific and engineering calculation was the ten-digit, electrically powered,

@ -111,23 +83,16 @@ that there’s a great explanation in the Wikipedia article on [36-bit computing

So this 36 bit thing seems to be based on the fact that log_2(20000000000) is 34.2. Huh.
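
(you can check that arithmetic quickly in Python:)

```
>>> import math
>>> round(math.log2(20_000_000_000), 1)
34.2
```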

My guess is that the reason for this is that in the 50s, computers were extremely expensive. So if you wanted your computer to support ten decimal digits, you’d design it so that it had exactly enough bits to do that, and no more.

Today computers are way faster and cheaper, so if you want to represent ten decimal digits for some reason you can just use 64 bits – wasting a little bit of space is usually no big deal.

Someone else mentioned that some of these machines with 36-bit word sizes let you choose a byte size – you could use 5 or 6 or 7 or 8-bit bytes, depending on the context.

#### reason 2: to work well with binary-coded decimal

In the 60s, there was a popular integer encoding called binary-coded decimal (or [BCD][10] for short) that encoded every decimal digit in 4 bits.

For example, if you wanted to encode the number 1234, in BCD that would be something like:

@ -135,49 +100,32 @@ For example, if you wanted to encode the number 1234, in BCD that would be somet
0001 0010 0011 0100
```

So if you want to be able to easily work with binary-coded decimal, your byte size should be a multiple of 4 bits, like 8 bits!
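
BCD is easy to play with in a few lines of Python (a quick illustration, not something from the original post):

```
def to_bcd(n):
    # encode each decimal digit of n in 4 bits
    return " ".join(format(int(digit), "04b") for digit in str(n))

print(to_bcd(1234))  # prints: 0001 0010 0011 0100
```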

#### why was BCD popular?

This integer representation seemed really weird to me – why not just use binary, which is a much more efficient way to store integers? Efficiency was really important in early computers!

My best guess about why is that early computers didn’t have displays the same way we do now, so the contents of a byte were mapped directly to on/off lights.

Here’s a [picture from Wikipedia of an IBM 650 with some lights on its display][11] ([CC BY-SA 3.0][12]):

![][13]

So if you want people to be able to relatively easily read off a decimal number from its binary representation, this makes a lot more sense. I think today BCD is obsolete because we have displays and our computers can convert numbers represented in binary to decimal for us and display them.

Also, I wonder if BCD is where the term “nibble” for 4 bits comes from – in the context of BCD, you end up referring to half bytes a lot (because every digit is 4 bits). So it makes sense to have a word for “4 bits”, and people called 4 bits a nibble. Today “nibble” feels to me like an archaic term though – I’ve definitely never used it except as a fun fact (it’s such a fun word!). The Wikipedia article on [nibbles][14] supports this theory:

> The nibble is used to describe the amount of memory used to store a digit of
> a number stored in packed decimal format (BCD) within an IBM mainframe.

Another reason someone mentioned for BCD was **financial calculations**. Today if you want to store a dollar amount, you’ll typically just use an integer amount of cents, and then divide by 100 if you want the dollar part. This is no big deal – division is fast. But apparently in the 70s dividing an integer represented in binary by 100 was very slow, so it was worth it to redesign how you represent your integers to avoid having to divide by 100.

Okay, enough about BCD.

#### reason 3: 8 is a power of 2?

A bunch of people said it’s important for a CPU’s byte size to be a power of 2. I can’t figure out whether this is true or not though, and I wasn’t satisfied with the explanation that “computers use binary so powers of 2 are good”. That seems very plausible but I wanted to dig deeper. And historically there have definitely been lots of machines that used byte sizes that weren’t powers of 2, for example (from [this retro computing stack exchange thread][15]):

- Cyber 180 mainframes used 6-bit bytes
- the Univac 1100 / 2200 series used a 36-bit word size

@ -190,57 +138,31 @@ Some reasons I heard for why powers of 2 are good that I haven’t understood ye

Reasons that made more sense to me:

- it makes it easier to design **clock dividers** that can measure “8 bits were sent on this wire” that work based on halving – you can put 3 halving clock dividers in series. [Graham Sutherland][16] told me about this and made this really cool [simulator of clock dividers][17] showing what these clock dividers look like. That site (Falstad) also has a bunch of other example circuits and it seems like a really cool way to make circuit simulators.
- if you have an instruction that zeroes out a specific bit in a byte, then if your byte size is 8 (2^3), you can use just 3 bits of your instruction to indicate which bit. x86 doesn’t seem to do this, but the [Z80’s bit testing instructions][18] do.
- someone mentioned that some processors use [Carry-lookahead adders][19], and they work in groups of 4 bits. From some quick Googling it seems like there are a wide variety of adder circuits out there though.
- **bitmaps**: Your computer’s memory is organized into pages (usually of size 2^n). It needs to keep track of whether every page is free or not. Operating systems use a bitmap to do this, where each bit corresponds to a page and is 0 or 1 depending on whether the page is free. If you had a 9-bit byte, you would need to divide by 9 to find the page you’re looking for in the bitmap. Dividing by 9 is slower than dividing by 8, because dividing by powers of 2 is always the fastest thing (see the sketch after this list).
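
Here’s a small sketch of that last point (hypothetical code, just to show the shape of the arithmetic):

```
page = 123456
# with 8-bit bytes, finding a page's slot in the bitmap is a shift and a mask:
byte_index, bit_index = page >> 3, page & 7
# with 9-bit bytes you'd need an actual division and modulo:
byte_index, bit_index = page // 9, page % 9
```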
I probably mangled some of those explanations pretty badly: I’m pretty far out of my comfort zone here. Let’s move on.
#### reason 4: small byte sizes are good
You might be wondering – well, if 8-bit bytes were better than 4-bit bytes, why not keep increasing the byte size? We could have 16-bit bytes!
A couple of reasons to keep byte sizes small:
- It’s a waste of space – a byte is the minimum unit you can address, and if your computer is storing a lot of ASCII text (which only needs 7 bits), it would be a pretty big waste to dedicate 12 or 16 bits to each character when you could use 8 bits instead (there’s some quick arithmetic on this after the list).
- As bytes get bigger, your CPU needs to get more complex. For example you need one bus line per bit. So I guess simpler is better.
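
To put a rough number on the “waste of space” point (my own back-of-the-envelope math): a 10-million-character ASCII document is about 10 MB with 8-bit bytes, but the same text would take 15 MB with 12-bit bytes and 20 MB with 16-bit bytes – same information, up to twice the storage.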
My understanding of CPU architecture is extremely shaky so I’ll leave it at that. The “it’s a waste of space” reason feels pretty compelling to me though.
#### reason 5: compatibility
The Intel 8008 (from 1972) was the precursor to the 8080 (from 1974), which was the precursor to the 8086 (from 1978) – the first x86 processor. It seems like the 8080 and the 8086 were really popular and that’s where we get our modern x86 computers.
I think there’s an “if it ain’t broke don’t fix it” thing going on here – I assume that 8-bit bytes were working well, so Intel saw no need to change the design. If you keep the same 8-bit byte, then you can reuse more of your instruction set.
Also around the 80s we start getting network protocols like TCP which use 8-bit bytes (usually called “octets”), and if you’re going to be implementing network protocols, you probably want to be using an 8-bit byte.
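
As a tiny illustration (a made-up example of mine, not from any real networking stack): TCP defines every header field in terms of octets, so parsing one means combining 8-bit units. Here’s pulling the 16-bit destination port out of the first 4 octets of a TCP header in C:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* first 4 octets of a made-up TCP header: source port 80, destination port 443 */
    uint8_t header[] = {0x00, 0x50, 0x01, 0xBB};
    /* multi-byte fields are big-endian, built one octet at a time */
    int dst_port = (header[2] << 8) | header[3];
    printf("destination port: %d\n", dst_port); /* prints 443 */
    return 0;
}
```

If your machine’s byte were 9 bits, you’d constantly be extracting 8-bit fields that don’t line up with your addressable units.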
#### that’s all!
@ -253,29 +175,15 @@ It seems to me like the main reasons for the 8-bit byte are:
- 8 is a better number than 7 (because it’s a power of 2)
- once you have popular 8-bit computers that are working well, you want to keep the same design for compatibility
Someone pointed out that [page 65 of this book from 1962][20] talking about IBM’s reasons to choose an 8-bit byte basically says the same thing:
> 1. Its full capacity of 256 characters was considered to be sufficient for the great majority of applications.
> 2. Within the limits of this capacity, a single character is represented by a single byte, so that the length of any particular record is not dependent on the coincidence of characters in that record.
> 3. 8-bit bytes are reasonably economical of storage space
> 4. For purely numerical work, a decimal digit can be represented by only 4 bits, and two such 4-bit bytes can be packed in an 8-bit byte. Although such packing of numerical data is not essential, it is a common practice in order to increase speed and storage efficiency. Strictly speaking, 4-bit bytes belong to a different code, but the simplicity of the 4-and-8-bit scheme, as compared with a combination 4-and-6-bit scheme, for example, leads to simpler machine design and cleaner addressing logic.
> 5. Byte sizes of 4 and 8 bits, being powers of 2, permit the computer designer to take advantage of powerful features of binary addressing and indexing to the bit level (see Chaps. 4 and 5).
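
Point 4 is the packed-decimal (BCD) trick, which is easy to sketch in C – the digits here are just for illustration:

```c
#include <stdint.h>
#include <stdio.h>

/* pack two decimal digits (0-9) into one 8-bit byte, one per 4-bit nibble */
uint8_t pack_digits(int hi, int lo) {
    return (uint8_t)((hi << 4) | lo);
}

int main(void) {
    uint8_t b = pack_digits(4, 2);
    /* prints: packed: 0x42 -> digits 4 and 2 */
    printf("packed: 0x%02X -> digits %d and %d\n", b, b >> 4, b & 0x0F);
    return 0;
}
```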
Overall this makes me feel like an 8-bit byte is a pretty natural choice if you’re designing a binary computer in an English-speaking country.
--------------------------------------------------------------------------------