fix formats for 20230113.0

This commit is contained in:
Edward Liu 2023-04-08 12:16:52 +08:00
parent ed74d71b7c
commit 54f6a28e7a

I’ve heard a million times about the dangers of floating point arithmetic, like…
But I find all of this a little abstract on its own, and I really wanted some specific examples of floating point bugs in real-world programs. So I [asked on Mastodon][1] for examples of how floating point has gone wrong for them in real programs, and as always folks delivered! Here are a bunch of examples. I’ve also written some example programs for some of them to see exactly what happens. Here’s a table of contents:
- [how does floating point work?][2]
- [floating point isn’t “bad” or random][3]
- [example 1: the odometer that stopped][4]
- [example 2: tweet IDs in Javascript][5]
- [example 3: a variance calculation gone wrong][6]
- [example 4: different languages sometimes do the same floating point calculation differently][7]
- [example 5: the deep space kraken][8]
- [example 6: the inaccurate timestamp][9]
- [example 7: splitting a page into columns][10]
- [example 8: collision checking][11]
None of these 8 examples talk about NaNs or +0/-0 or infinity values or subnormals, but it’s not because those things don’t cause problems, it’s just that I got tired of writing at some point :).
Also I’ve probably made some mistakes in this post.
#### how does floating point work?
I’m not going to write a long explanation of how floating point works in this post…
#### floating point isn’t “bad” or random
I don’t want you to read this post and conclude that floating point is bad. It’s an amazing tool for doing numerical calculations. So many smart people have done so much work to make numerical calculations on computers efficient and accurate! Two points about how all of this isn’t floating point’s fault:
- Doing numerical computations on a computer inherently involves some approximation and rounding, especially if you want to do it efficiently. You can’t always store an arbitrary amount of precision for every single number you’re working with.
- Floating point is standardized (IEEE 754), so operations like addition on floating point numbers are deterministic: my understanding is that 0.1 + 0.2 will always give you the exact same result (0.30000000000000004), even across different architectures. It might not be the result you _expected_, but it’s actually very predictable.
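To make that determinism concrete, here’s a quick check (my snippet, not from the original post) in Python, which uses 64-bit floats:

```python
# 0.1 + 0.2 is famously "wrong", but it is wrong in a completely
# reproducible way on any IEEE 754 double-precision implementation
print(0.1 + 0.2)         # 0.30000000000000004, every time, on every machine
print(0.1 + 0.2 == 0.3)  # False, and reliably False
```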
My goal for this post is just to explain what kind of problems can come up with floating point numbers and why they happen, so that you know when to be careful with them, and when they’re not appropriate.
Now let’s get into the examples.
#### example 1: the odometer that stopped
One person said that they were working on an odometer that was continuously adding small amounts to a 32-bit float to measure distance travelled, and things went very wrong.
To make this concrete, let’s say that we’re adding numbers to the odometer 1cm at a time. What does it look like after 10,000 kilometers?
Here’s a C program that simulates that:
This is VERY bad: it’s not a small error, 262km is a LOT less than 10,000km!
#### what went wrong: gaps between floating point numbers get big
The problem in this case is that, for 32-bit floats, 262144.0 + 0.01 = 262144.0. So it’s not just that the number is inaccurate, it’ll actually never increase at all! If we travelled another 10,000 kilometers, the odometer would still be stuck at 262144 meters (aka 262.144km).
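You can reproduce that stuck addition in Python (my snippet, not from the post) by rounding through a 32-bit float with the standard `struct` module:

```python
import struct

def to_f32(x):
    # round a Python float (a 64-bit double) to the nearest 32-bit float
    return struct.unpack('f', struct.pack('f', x))[0]

odometer = to_f32(262144.0)
print(to_f32(odometer + 0.01))  # 262144.0: adding another centimeter does nothing
```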
Why is this happening? Well, floating point numbers get farther apart as they get bigger. In this example, for 32-bit floats, here are 3 consecutive floating point numbers:
262144.0, 262144.03125, 262144.0625
I got those numbers by going to [https://float.exposed/0x48800000][13] and incrementing the mantissa.
So, there are no 32-bit floating point numbers between 262144.0 and 262144.03125. Why is that a problem?
The problem is that 262144.03125 is about 262144.0 + 0.03. So when we try to add 0.01 to 262144.0, it doesn’t make sense to round up to the next number. So the sum just stays at 262144.0.
Also, it’s not a coincidence that 262144 is a power of 2 (it’s 2^18). The gaps between floating point numbers change after every power of 2, and at 2^18 the gap between 32-bit floats is 0.03125, increasing from 0.016ish.
#### one way to solve this: use a double
```
Expected: 10000.000000 km
Got: 9999.999825 km
```
There are still some small inaccuracies here: we’re off by about 17 centimeters. Whether this matters or not depends on the context: being slightly off could very well be disastrous if we were doing a precision space maneuver or something, but it’s probably fine for an odometer.
Another way to improve this would be to increment the odometer in bigger chunks: instead of adding 1cm at a time, maybe we could update it less frequently, like every 50cm.
If we use a double **and** increment by 50cm instead of 1cm, we get the exact correct answer:
```
Expected: 10000.000000 km
Got: 10000.000000 km
```
A third way to solve this could be to use an **integer**: maybe we decide that the smallest unit we care about is 0.1mm, and then measure everything as integer multiples of 0.1mm. I have never built an odometer so I can’t say what the best approach is.
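To sketch the integer idea (the units and numbers here are my own invention, not from the post): if the odometer stores an integer count of 0.1mm units, additions never lose precision, no matter how big the total gets.

```python
# hypothetical design: store the odometer as an integer count of 0.1mm units
ticks = 0
for _ in range(1_000_000):  # a million 1cm increments, 10km in total
    ticks += 100            # 1cm = 100 units of 0.1mm
print(ticks / 10_000_000)   # 10.0 km, exactly: integer addition never drifts
```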
#### example 2: tweet IDs in Javascript
Javascript only has floating point numbers: it doesn’t have an integer type. The biggest integer you can represent in a 64-bit floating point number is 2^53.
But tweet IDs are big numbers, bigger than 2^53. The Twitter API now returns them as both integers and strings, so that in Javascript you can just use the string ID (like “1612850010110005250”), but if you tried to use the integer version in JS, things would go very wrong.
You can check this yourself by taking a tweet ID and putting it in the Javascript console, like this:
```
>> 1612850010110005250
1612850010110005200
```
Notice that 1612850010110005200 is NOT the same number as 1612850010110005250!! It’s 50 less!
This particular issue doesn’t happen in Python (or any other language that I know of), because Python has integers. Here’s what happens if we enter the same number in a Python REPL:
```
In [3]: 1612850010110005250
Out[3]: 1612850010110005250
```
Same number, as you’d expect.
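If you want to see exactly where doubles stop being able to represent every integer, you can poke at the 2^53 boundary yourself (my example, not from the post):

```python
# 2**53 is the largest integer n such that every integer up to n
# fits exactly in a 64-bit float
print(float(2**53) == float(2**53 + 1))  # True: 2**53 + 1 has no exact double
tweet_id = 1612850010110005250
print(int(float(tweet_id)) == tweet_id)  # False: the round trip mangles the ID
```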
#### example 2.1: the corrupted JSON data
This is a small variant of the “tweet IDs in Javascript” issue, but even if you’re _not_ actually writing Javascript code, numbers in JSON are still sometimes treated as if they’re floats. This mostly makes sense to me because JSON has “Javascript” in the name, so it seems reasonable to decode the values the way Javascript would.
For example, if we pass some JSON through `jq`, we see the exact same issue: the number 1612850010110005250 gets changed into 1612850010110005200.
```
$ echo '{"id": 1612850010110005250}' | jq '.'
{
  "id": 1612850010110005200
}
```
But it’s not consistent across all JSON libraries: Python’s `json` module will decode `1612850010110005250` as the correct integer.
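For instance (my snippet, assuming a recent Python 3), the ID survives a round trip through `json.loads` intact:

```python
import json

# Python's json module parses integer literals as arbitrary-precision ints
doc = json.loads('{"id": 1612850010110005250}')
print(doc["id"])                         # 1612850010110005250
print(doc["id"] == 1612850010110005250)  # True: nothing was lost
```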
Several people mentioned issues with sending floats in JSON: either they were trying to send a large integer (like a pointer address) in JSON and it got corrupted, or they were sending smaller floating point values back and forth repeatedly and the value slowly diverged over time.
#### example 3: a variance calculation gone wrong
Let’s say you’re doing some statistics, and you want to calculate the variance of many numbers. Maybe more numbers than you can easily fit in memory, so you want to do it in a single pass.
There’s a simple (but bad!!!) algorithm you can use to calculate the variance in a single pass, from [this blog post][14]. Here’s some Python code:
```
def calculate_bad_variance(nums):
    # naive one-pass formula: E[x^2] - E[x]^2
    sum_of_squares = 0
    sum_of_nums = 0
    for num in nums:
        sum_of_squares += num ** 2
        sum_of_nums += num
    mean = sum_of_nums / len(nums)
    return sum_of_squares / len(nums) - mean ** 2
```
This is extremely bad: not only is the bad variance way off, it’s NEGATIVE!
#### what went wrong: catastrophic cancellation
What’s going on here is similar to our odometer number problem: the `sum_of_squares` number gets extremely big (about 10^21 or 2^69), and at that point, the gap between consecutive floating point numbers is also very big: it’s 2^46. So we just lose all precision in our calculations.
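Here’s a small made-up illustration of the same effect (much smaller numbers than the 10^21 above, and a numbers set of my own choosing): with a big mean and a tiny spread, `sum_of_squares / n` and `mean ** 2` become two huge, nearly equal numbers, and their difference is garbage.

```python
import statistics

# big mean, tiny true variance (the variance of 0.1, 0.2, 0.3 is ~0.00667)
nums = [100_000_000 + x for x in (0.1, 0.2, 0.3)]

sum_of_squares = sum(n * n for n in nums)
mean = sum(nums) / len(nums)
bad = sum_of_squares / len(nums) - mean ** 2   # two huge, nearly equal numbers

good = statistics.pvariance(nums)  # a numerically careful library routine
print(bad, good)  # 'bad' completely loses the tiny true variance
```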
The term for this problem is “catastrophic cancellation”: we’re subtracting two very large floating point numbers which are both going to be pretty far from the correct value of the calculation, so the result of the subtraction is also going to be wrong. [The blog post I mentioned before][14] talks about a better algorithm people use to compute variance called Welford’s algorithm, which doesn’t have the catastrophic cancellation issue.
And of course, the solution for most people is to just use a scientific computing library like Numpy to calculate variance instead of trying to do it yourself :)
#### example 4: different languages sometimes do the same floating point calculation differently
A bunch of people mentioned that different platforms will do the same calculation in different ways. One way this shows up in practice: maybe you have some frontend code and some backend code that do the exact same floating point calculation. But it’s done slightly differently in Javascript and in PHP, so your users end up seeing discrepancies and getting confused.
In principle you might think that different implementations should work the same way because of the IEEE 754 standard for floating point, but here are a couple of caveats that were mentioned:
- math operations in libc (like sin/log) behave differently in different implementations. So code using glibc could give you different results than code using musl
- some x86 instructions can use 80 bit precision for some double operations internally instead of 64 bit precision. [Here’s a GitHub issue talking about that][15]
I’m not very sure about these points and I don’t have concrete examples I can reproduce.
#### example 5: the deep space kraken
Kerbal Space Program is a space simulation game, and it used to have a bug called the [Deep Space Kraken][16] where when you moved very fast, your ship would start getting destroyed due to floating point issues. This is similar to the other problems we’ve talked about involving big floating point numbers (like the variance problem), but I wanted to mention it because:
- it has a funny name
- it seems like a very common bug in video games / astrophysics / simulations in general: if you have points that are very far from the origin, your math gets messed up
Another example of this is the [Far Lands][17] in Minecraft.
#### example 6: the inaccurate timestamp
I promise this is the last example of “very large floating point numbers can ruin your day”. But! Just one more! Let’s imagine that we try to represent the current Unix epoch in nanoseconds (about 1673580409000000000) as a 64-bit floating point number.
This is no good! 1673580409000000000 is about 2^60 (crucially, bigger than 2^53), and the next 64-bit float after it is 1673580409000000256.
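We can ask Python for that neighbouring float directly (my snippet; `math.nextafter` needs Python 3.9 or later):

```python
import math

t = 1673580409000000000   # the epoch in nanoseconds, about 2**60
f = float(t)
gap = math.nextafter(f, math.inf) - f
print(gap)  # 256.0: consecutive doubles here are 256 nanoseconds apart
```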
So this would be a great way to end up with inaccuracies in your time math. Of course, time libraries actually represent times as integers, so this isn’t usually a problem. (There’s always still the [year 2038 problem][18], but that’s not related to floats.)
In general, the lesson here is that sometimes it’s better to use integers.
#### example 7: splitting a page into columns
Now that we’ve talked about problems with big floating point numbers, let’s do a problem with small floating point numbers.
Let’s say you have a page width and a column width, and you want to figure out:
- how many columns fit on the page
- how much space is left over
You might reasonably try `floor(page_width / column_width)` for the first question and `page_width % column_width` for the second question. Because that would work just fine with integers!
```
In [5]: math.floor(13.716 / 4.572)
Out[5]: 3

In [6]: 13.716 % 4.572
Out[6]: 4.571999999999999
```
This is wrong! The amount of space left is 0!
A better way to calculate the amount of space left might have been `13.716 - 3 * 4.572`, which gives us a very small negative number.
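Putting the calculations side by side (my snippet; on paper, 13.716 is exactly 3 × 4.572):

```python
import math

page, col = 13.716, 4.572      # the page is mathematically 3 columns wide

print(math.floor(page / col))  # how many columns "fit"
print(page % col)              # 4.571999999999999, almost a whole column "left"?!
print(page - 3 * col)          # a tiny number right next to zero, the sane answer
```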
I think the lesson here is to never calculate the same thing in 2 different ways with floats.
This is a very basic example but I can kind of see how this would create all kinds of problems if I was doing page layout with floating point numbers, or doing CAD drawings.
#### example 8: collision checking
Here’s a very silly Python program, that starts a variable at 1000 and decrements it until it collides with 0. You can imagine that this is part of a pong game or something, and that `a` is a ball that’s supposed to collide with a wall.
```
a = 1000
while a != 0:
    a -= 0.001
```
You might expect this program to terminate. But it doesn’t! `a` is never 0; instead it goes from 1.673494676862619e-08 to -0.0009999832650532314.
The lesson here is that instead of checking for float equality, usually you want to check if two numbers are different by some very small amount. Or here we could just write `while a > 0`.
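Here’s a sketch of that fix (mine): with `>` instead of `!=`, the loop terminates, and `a` simply overshoots zero by a hair instead of hunting for an exact hit.

```python
a = 1000
while a > 0:   # compare with >, not !=: we never rely on hitting 0.0 exactly
    a -= 0.001
print(a)       # a hair below zero, and the loop actually terminated
```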
#### that’s all for now
I didn’t even get to NaNs (there are so many of them!) or infinity or +0 / -0 or subnormals, but we’ve already written 2000 words and I’m going to just publish this.
I might write another followup post later: that Mastodon thread has literally 15,000 words of floating point problems in it, there’s a lot of material! Or I might not, who knows :)
--------------------------------------------------------------------------------