Some Useful Probability Facts for Systems Programming

Probability problems come up a lot in systems programming, and I'm using that term loosely to mean everything from operating systems programming and networking, to building large online services, to creating virtual worlds like in games. Here's a bunch of rough-and-ready probability rules of thumb that are deeply related and have many practical applications when designing systems.

(\frac{1}{N}) chance tried (N) times

Retries are everywhere in systems programming. For example, imagine you're designing a world for a roleplaying game. When monsters in one particular forest are defeated, they have a small chance of dropping some special item. Suppose that chance is (\frac{1}{10}). Basic probability says that players are expected to need 10 victories before gaining a special item. But in probability, "expected value" is just a kind of average value, and there's no guarantee that players will get the item even if they do win 10 battles.

What's a player's chance of getting at least one special item after 10 battles? Let's just try getting a computer to calculate the probability of getting at least one success for (N) tries at a (\frac{1}{N}) chance, for a range of values of (N):
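The calculation itself is tiny. Here's a Python sketch (the function name is my own; only the standard library is needed):

```python
# Chance of at least one success in n tries at a 1/n chance each:
# 1 - (chance of failing every single try) = 1 - (1 - 1/n)^n
def at_least_one(n):
    return 1 - (1 - 1 / n) ** n

for n in (2, 3, 5, 10, 100, 1000):
    print(f"N={n:5d}: {at_least_one(n):.1%}")
# N=2: 75.0%, N=10: 65.1%, N=1000: 63.2% -- settling towards 1 - 1/e
```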

Plot of probability of at least one success after trying a 1/N chance N times. The probability starts at 100% and drops, but quickly flattens out to a value just below 65%.

Eyeballing the graph, it looks like the probability is roughly constant for (N) greater than about 3. In fact, it converges on (1 - e^{- 1}), where (e) is Euler's number, 2.718281828459… This number (along with (e^{- 1})) is extremely common in engineering, so here's a table of practical values:

| exp(-1.0) | 1.0-exp(-1.0) |
| --- | --- |
| 36.7% | 63.2% |
| A bit over a third | A bit under two thirds |

So, if you don't need an exact answer (and you often don't) you could just say that if the drop probability is (\frac{1}{10}), most players will get a special item within 10 battles, but (close enough) about a third won't. Because the monster battles are statistically independent, we can make a rough guess that about 1 in 9 players still won't have a special item after 20 battles. Also, roughly 60% won't after 5 battles because (0.6 \times 0.6 = 0.36), so 60% is close to the square root of 36.7%.

If you have a (\frac{1}{N}) chance of success, then you're more likely than not to have succeeded after (N) tries, but the probability is only about two thirds (or your favourite approximation to (1 - e^{- 1})). The value of (N) doesn't affect the approximation much as long as it's at least 3.

Here's the proof: The chance of the player failing to get a special item after all (10) battles is ((\frac{9}{10})^{10} = .349), so the chance of getting at least one is (1 - .349 = .651). More generally, the chance of succeeding at least once is (1 - (1 - \frac{1}{N})^{N}), which converges on (1 - e^{- 1}) (by one of the definitions of (e)).

By the way, this rule is handy for quick mental estimates in board games, too. Suppose a player needs to get at least one 6 from rolling 8 six-sided dice. What's the probability of failure? It's about (\frac{1}{3}) for the first 6 dice, and (\frac{5}{6} \times \frac{5}{6}) for the remaining two, so all up it's about (\frac{1}{3} \times \frac{25}{36} \approx \frac{8}{36} \approx \frac{1}{4}). A calculator says ((\frac{5}{6})^{8} = .233), so the rough approximation was good enough for gameplay.
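The mental shortcut takes two lines of Python to double-check:

```python
# No 6s from 8 dice: exact value vs the mental shortcut
exact = (5 / 6) ** 8              # all 8 dice miss
shortcut = (1 / 3) * (25 / 36)    # ~1/3 for 6 dice, times (5/6)^2 for the other 2
print(f"exact={exact:.3f} shortcut={shortcut:.3f}")  # exact=0.233 shortcut=0.231
```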

(N) balls in (N) buckets

I'll state this one up front:

Suppose you throw (N) balls randomly (independently and one at a time) into (N) buckets. On average, a bit over a third ((e^{- 1})) of the buckets will stay empty, a bit over a third ((e^{- 1}) again) will have exactly one ball, and the remaining quarter or so ((1 - 2e^{- 1})) will contain multiple balls.

The balls-and-buckets abstract model has plenty of concrete engineering applications. For example, suppose you have a load balancer randomly assigning requests to 12 backends. If 12 requests come in during some time window, on average about 4 of the backends will be idle, only about 4 will have a balanced single-request load, and the remaining (average) 3 or 4 will be under higher load. Of course, all of these are averages, and there'll be fluctuations in practice.

As another example, if you put (N) different values into a hash table with (N) slots, about a third of the slots will still be empty and about a quarter of the slots will have collisions.
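If you'd rather see the third-third-quarter split fall out of a simulation than take it on faith, here's a small Monte Carlo sketch in Python (the function name and trial count are arbitrary choices of mine):

```python
import random
from collections import Counter

def ball_bucket_fractions(n, trials=2000):
    """Throw n balls into n buckets, repeatedly; return the average
    fractions of buckets that end up empty, with one ball, and with 2+."""
    empty = single = multi = 0
    for _ in range(trials):
        loads = Counter(random.randrange(n) for _ in range(n))
        single += sum(1 for c in loads.values() if c == 1)
        multi += sum(1 for c in loads.values() if c >= 2)
        empty += n - len(loads)
    total = n * trials
    return empty / total, single / total, multi / total

print(ball_bucket_fractions(100))  # roughly (0.37, 0.37, 0.26)
```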

If an online service has a production incident once every 20 days on average, then (assuming unrelated incidents) just over a third of 20-day periods will be dead quiet, just over a third will have the "ideal" single incident, while a quarter of 20-day periods will be extra stressful. In the real world, production incidents are even more tightly clustered because they're not always independent.

This rule of thumb also hints at why horizontally scaled databases tend to have hot and cold shards, and why low-volume businesses (like consulting) can suffer from feast/famine patterns of customer demand.

Random allocation is much more unbalanced than we intuitively expect. A famous example comes from World War II when, late in the war, the Germans launched thousands of V-1 and V-2 flying bombs at London. Hitting a city with a rocket from across the Channel already required pushing 1940s technology to new limits, but several British analysts looked at maps of bomb damage and concluded that the Germans were somehow targeting specific areas of London, implying an incredible level of technological advancement. In 1946, however, an actuary did the proper statistical analysis and said that, no, the clustering of bomb damage was simply what you'd expect from random chance. (That analysis is based on the Poisson distribution, and the ratios in the rule for (N) balls and (N) buckets can be calculated from a Poisson distribution with (\lambda = \frac{N}{N} = 1).)
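For reference, the three ratios fall straight out of the Poisson probability mass function (P(k) = \frac{\lambda^{k}e^{- \lambda}}{k!}) with (\lambda = 1); a quick Python check:

```python
import math

# Poisson pmf with rate 1: P(k) = e^-1 / k!
p_empty = math.exp(-1)            # k = 0: no balls
p_one = math.exp(-1)              # k = 1: same value, since 1^1/1! = 1
p_multi = 1 - p_empty - p_one     # k >= 2: everything else
print(f"{p_empty:.1%} {p_one:.1%} {p_multi:.1%}")  # 36.8% 36.8% 26.4%
```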

25 points uniformly randomly placed on a 5x5 grid, showing spurious clustering. 8 boxes are empty, 10 boxes contain one point and 7 boxes contain two or more points.

Random allocation only balances out when the number of "balls" is much larger than the number of "buckets", i.e., when averaging over a large number of items, or a long time period. That's one of the many reasons that engineering solutions that work well for large-scale FAANG companies can be problematic when used by companies that are orders of magnitude smaller.

Proving the third-third-quarter rule is pretty easy if you look at just one bucket. Each of the (N) balls represents a (\frac{1}{N}) chance of adding a ball to the bucket, so the chance of the bucket staying empty is just the (e^{- 1} \approx 36.7%) from the first rule. Linearity of expectation means we can combine the results for each bucket and say that 36.7% of all buckets are expected to be empty, even though the bucket counts aren't independent. Also, there are (N) possible ways for exactly one ball to land in the bucket, and each way requires one ball to fall in (with probability (\frac{1}{N})) and the other (N - 1) balls to miss (with probability (1 - \frac{1}{N})). So the probability of exactly one ball falling in is (N \times \frac{1}{N} \times (1 - \frac{1}{N})^{N - 1} \rightarrow e^{- 1}).

Fixed points of random permutations

I don't think this rule has as many practical applications as the (N) balls/buckets rule, but it's kind of a freebie.

Think of a battle game in which 6 players start from 6 spawn/home points. If the players play a second round, what's the chance that someone starts from the same point? Mathematically, that's asking about the chance of a random permutation having a "fixed point".

If a group of things is randomly shuffled, a bit over a third ((e^{- 1})) of the time there'll be no fixed points, a bit over a third ((e^{- 1})) of the time there'll be just one fixed point, and the remaining quarter or so of the time there'll be two or more.

The number of fixed points in a random shuffle happens to approximate the same distribution as the number of balls in the buckets before, which can be proven from first principles using the inclusion-exclusion principle. But there's an even simpler proof for a related fact:

A random permutation has exactly one fixed point on average, regardless of size.

If there are (N) things, each one has a (\frac{1}{N}) chance of ending up in its original location after the shuffle, so on average there'll be (N \times \frac{1}{N} = 1) fixed points. Note that it's impossible to get exactly one fixed point by shuffling a two-element set (try it!) but 1 is still the average of 2 and 0. ("Average" doesn't always mean what we want it to mean.)
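A quick Python simulation confirms both the average of one fixed point and the third-third-quarter split (the function name and trial count are my own choices):

```python
import random

def fixed_point_stats(n=6, trials=20000):
    """Shuffle 0..n-1 repeatedly; return (average number of fixed points,
    fraction of shuffles with none, fraction with exactly one)."""
    total = none = one = 0
    for _ in range(trials):
        perm = list(range(n))
        random.shuffle(perm)
        fixed = sum(1 for i, v in enumerate(perm) if i == v)
        total += fixed
        none += fixed == 0
        one += fixed == 1
    return total / trials, none / trials, one / trials

print(fixed_point_stats())  # roughly (1.0, 0.37, 0.37)
```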

That proof might seem too simple, but it's a demonstration of how powerful linearity of expectation is. Trying to calculate statistics for permutations can be tricky because the places any item can go depend on the places all the other items have gone. Linearity of expectation means we don't have to care about all the interactions as long as we only need to know the average. The average isn't always the most useful statistic to calculate, but it's often the easiest by far.

The coupon collector's problem

Let's look at the common "loot box" mechanism. Specifically, suppose there are 10 collector items (say, one for each hero in a franchise) that are sold in blind boxes. Let's take the fairest case in which there are no rare items and each item has an equal (\frac{1}{10}) chance of being in a given box. How many boxes will a collector buy on average before getting a complete set? This is called the coupon collector's problem, and for 10 items the answer is about 29.

The answer to the coupon collector's problem is a bit more than (N\ln N) (add (\frac{N}{2}) for some more accuracy).

((\ln N) is (\log) base (e), or just log(N) in most programming languages.)
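The exact answer comes from summing the expected wait for each new item: the first box always gives a new item, the next new item takes (\frac{N}{N - 1}) boxes on average, and so on, for a total of (\frac{N}{N} + \frac{N}{N - 1} + \cdots + \frac{N}{1} = N H_{N}). That's easy to compare against the rule of thumb in Python:

```python
import math

def expected_boxes(n):
    # Exact expected purchases: n * (1 + 1/2 + ... + 1/n)
    return n * sum(1 / k for k in range(1, n + 1))

n = 10
print(round(expected_boxes(n), 1))        # 29.3
print(round(n * math.log(n), 1))          # 23.0 (rough rule)
print(round(n * math.log(n) + n / 2, 1))  # 28.0 (with the N/2 correction)
```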

The coupon collector's problem hints at why the loot box mechanism is so powerful. The (N) balls in (N) buckets rule tells us that the collector will have about two thirds of the items after buying 10 boxes. It feels like the collector is most of the way there, and it would be a shame to give up and let so much progress go to waste, but actually 10 boxes is only about a third of the expected number of boxes needed. That's a simplistic model, but item rarity, variation of box type and (in computer games) making some items "unlockable" by completing sets of other items (or fulfilling other dependencies) only make it easier to get collectors to buy more than they originally expect.

The (N\ln N) rule is very rough, so heres a plot for comparison:

Plot of approximations to the coupon collector's problem. N ln N underestimates significantly, but has the right growth rate. N ln N + N/2 still underestimates slightly, but the error is less than 10%. The 1:1 slope N is also included to show that, beyond small values of N, multiple times N purchases are needed to get all items on average.

The exact value is rarely needed, but it's useful to know that you'll quickly need multiple times (N) trials to get all (N) hits. Any application of the (N) balls/buckets rule naturally extends to a coupon collector's problem (e.g., on average you'll need to put over (N\ln N) items into a hash table before all (N) slots are full) but the coupon collector's problem comes up in other places, too. Often it's tempting to use randomness to solve a problem statelessly, and then you find yourself doing a coupon collector problem. A cool example is the FizzleFade effect in the classic '90s first-person shooter Wolfenstein 3D. When the player character died, the screen would fill up with red pixels in what looked like random order. A simple and obvious way to implement that would be to plot red pixels at random coordinates in a loop, but filling the screen that way would be boring. With (320 \times 200 = 64000) pixels, most (~63.2%) of the screen would be filled red after 64000 iterations, but then the player would have to wait over (\ln(64000) \approx 11) times longer than that watching the last patches of screen fade away. The developers of Wolfenstein had to come up with a way to calculate a pseudo-random permutation of pixels on the screen, without explicitly storing the permutation in memory.
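One way to do that is a maximal-length linear-feedback shift register, which cycles through every non-zero 17-bit value exactly once in a scrambled-looking order. Here's a Python sketch of the idea (the tap mask 0x12000 is the one reportedly used by FizzleFade; off-screen coordinates are skipped, and pixel (0, 0) is handled specially because an LFSR never reaches state zero):

```python
def fizzlefade_coords(width=320, height=200):
    """Yield every screen coordinate exactly once, in pseudo-random order,
    using a 17-bit maximal-length Galois LFSR (taps x^17 + x^14 + 1)."""
    yield 0, 0                           # state 0 is unreachable, so emit it by hand
    state = 1
    while True:
        x, y = state >> 8, state & 0xFF  # 9 bits of column, 8 bits of row
        if x < width and y < height:
            yield x, y                   # skip the coords that fall off-screen
        # advance the LFSR: shift right, feed back through the taps
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0x12000
        if state == 1:                   # full period: every non-zero state visited
            return
```

Each call to the generator walks the full (2^{17} - 1) cycle, so every one of the 64000 pixels comes out exactly once, with only one integer of state.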

Here's a loose explanation of where the (\ln N) factor comes from: We know already that any pixel has approximately (\frac{1}{e}) chance of not being coloured by any batch of (N) pixel plots. So, after a batch of (N) pixel plots, the number of unfilled pixels goes down by a factor of (e) on average. If we assume we can multiply the average because it's close enough to the geometric mean, the number of unfilled pixels will drop from (N) to something like (\frac{N}{e^{k}}) after (k) batches. That means the number of batches needed to go from (N) unfilled pixels to 1 is something like (\ln N), from the basic definition of logarithms.

In the computer age it's easy to get an answer once we know we have a specific probability problem to solve. But rough rules like the ones in this post are still useful during the design phase, or for getting an intuitive understanding for why a system behaves the way it does.


via: https://theartofmachinery.com/2020/01/27/systems_programming_probability.html

Author: Simon Arneaud · Selected by: lujun9972 · Translator: 译者ID · Proofreader: 校对者ID

This article was originally compiled by LCTT and is proudly presented by Linux中国.