Merge pull request #19503 from silentdawn-zz/patch-3

Translation completed
This commit is contained in:
Xingyu.Wang 2020-09-06 10:26:48 +08:00 committed by GitHub
commit fea92e8a5e
2 changed files with 115 additions and 115 deletions


@@ -1,115 +0,0 @@
[#]: collector: (lujun9972)
[#]: translator: (silentdawn-zz)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Why Sorting is O(N log N))
[#]: via: (https://theartofmachinery.com/2019/01/05/sorting_is_nlogn.html)
[#]: author: (Simon Arneaud https://theartofmachinery.com)
Why Sorting is O(N log N)
======
Any decent algorithms textbook will explain how fast sorting algorithms like quicksort and heapsort are, but it doesn't take crazy maths to prove that they're as asymptotically fast as you can possibly get.
### A pedantic note about notation
Most computer scientists use big-O notation to mean “asymptotically equal, up to a constant scaling factor”, which isn't quite what it means to other mathematicians. Sorry, I'll use big-O like in CS textbooks, but at least I won't mix it with other mathematical notation.
## Comparison-based sorting
Let's look at the special case of algorithms that compare values two at a time (like quicksort and heapsort, and most other popular algorithms). The ideas can be extended to all sorting algorithms later.
### A simple counting argument for the worst case
Suppose you have an array of four elements, all different, in random order. Can you sort it by comparing just one pair of elements? Obviously not, but here's one good reason that proves you can't: By definition, to sort the array, you need to know how to rearrange the elements to put them in order. In other words, you need to know which permutation is needed. How many possible permutations are there? The first element could be moved to one of four places, the second one could go to one of the remaining three, the third element has two options, and the last element has to take the one remaining place. So there are \(4 \times 3 \times 2 \times 1 = 4! = 24\) possible permutations to choose from, but there are only two possible results from comparing two different things: “BIGGER” and “SMALLER”. If you made a list of all the possible permutations, you might decide that “BIGGER” means you need permutation #8 and “SMALLER” means you need permutation #24, but there's no way you could know when you need the other 22 permutations.
With two comparisons, you have \(2 \times 2 = 4\) possible outputs, which still isn't enough. You can't sort every possible shuffled array unless you do at least five comparisons (\(2^{5} = 32\)). If \(W(N)\) is the worst-case number of comparisons needed to sort \(N\) different elements using some algorithm, we can say
\[2^{W(N)} \geq N!\]
Taking a logarithm base 2,
\[W(N) \geq \log_{2}{N!}\]
Asymptotically, \(N!\) grows like \(N^{N}\) (see also [Stirling's formula][1]), so
\[W(N) \succeq \log N^{N} = N\log N\]
And that's an \(O(N\log N)\) lower bound on the worst case, just from counting outputs.
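To make those numbers concrete, here's a quick Python sketch (the helper is hypothetical, just illustrating the counting argument) that computes the smallest \(W\) with \(2^{W} \geq N!\):

```python
import math

def min_worst_case_comparisons(n):
    """Smallest W with 2**W >= n!, i.e. ceil(log2(n!))."""
    return math.ceil(math.log2(math.factorial(n)))

for n in (4, 10, 100):
    print(n, min_worst_case_comparisons(n))
# 4 5     <- the "at least five comparisons" claim above
# 10 22
# 100 525
```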
### Average case from information theory
We can get a stronger result if we extend that counting argument with a little information theory. Here's how we could use a sorting algorithm as a code for transmitting information:
1. I think of a number — say, 15
2. I look up permutation #15 from the list of permutations of four elements
3. I run the sorting algorithm on this permutation and record all the “BIGGER” and “SMALLER” comparison results
4. I transmit the comparison results to you in binary code
5. You re-enact my sorting algorithm run, step by step, referring to my list of comparison results as needed
6. Now that you know how I rearranged my array to make it sorted, you can reverse the permutation to figure out my original array
7. You look up my original array in the permutation list to figure out I transmitted the number 15
Okay, it's a bit strange, but it could be done. That means that sorting algorithms are bound by the same laws as normal encoding schemes, including the theorem proving there's no universal data compressor. I transmitted one bit per comparison the algorithm does, so, on average, the number of comparisons must be at least the number of bits needed to represent my data, according to information theory. More technically, [the average number of comparisons must be at least the Shannon entropy of my input data, measured in bits][2]. Entropy is a mathematical measure of the information content, or unpredictability, of something.
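Here's a minimal sketch of that scheme in Python, under two assumptions of mine that the article doesn't make: the deterministic comparison sort is insertion sort, and the arrays are permutations of \(0 \ldots N-1\):

```python
def insertion_sort(items, greater):
    """Insertion sort that asks `greater(x, y)` for every comparison."""
    for i in range(1, len(items)):
        j = i
        while j > 0 and greater(items[j - 1], items[j]):
            items[j - 1], items[j] = items[j], items[j - 1]
            j -= 1
    return items

def encode(arr):
    """Record one bit per comparison made while sorting `arr`."""
    bits = []
    def greater(x, y):
        bits.append(x > y)
        return bits[-1]
    insertion_sort(list(arr), greater)
    return bits

def decode(n, bits):
    """Replay the sort on index placeholders, answering each comparison
    from the recorded bits; the swaps then reveal the permutation."""
    answers = iter(bits)
    order = insertion_sort(list(range(n)), lambda x, y: next(answers))
    # order[k] is the original position of the k-th smallest element,
    # so for a permutation of 0..n-1 the original array is recoverable.
    arr = [None] * n
    for rank, pos in enumerate(order):
        arr[pos] = rank
    return arr

original = [2, 0, 3, 1]
assert decode(len(original), encode(original)) == original
```

Because the sorter is deterministic, the decoder's replay asks exactly the same sequence of comparisons as the encoder's run, so the recorded bits are all it needs.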
If I have an array of \(N\) elements that could be in any possible order without bias, then entropy is maximised and is \(\log_{2}{N!}\) bits. That proves that \(O(N\log N)\) is an optimal average for a comparison-based sort with arbitrary input.
That's the theory, but how do real sorting algorithms compare? Below is a plot of the average number of comparisons needed to sort an array. I've compared the theoretical optimum against naïve quicksort and the [Ford-Johnson merge-insertion sort][3], which was designed to minimise comparisons (though it's rarely faster than quicksort overall because there's more to life than minimising comparisons). Since it was developed in 1959, merge-insertion sort has been tweaked to squeeze a few more comparisons out, but the plot shows it's already almost optimal.
![Plot of average number of comparisons needed to sort randomly shuffled arrays of length up to 100. Bottom line is theoretical optimum. Within about 1% is merge-insertion sort. Naïve quicksort is within about 25% of optimum.][4]
It's nice when a little theory gives such a tight practical result.
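The quicksort end of that plot is easy to reproduce; here's a rough experiment of my own (not the article's code) using a naïve first-element-pivot quicksort:

```python
import math, random

def quicksort_comparisons(arr):
    """Comparison count for a naïve quicksort (first element as pivot)."""
    if len(arr) <= 1:
        return 0
    pivot, rest = arr[0], arr[1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return len(rest) + quicksort_comparisons(left) + quicksort_comparisons(right)

n, trials = 100, 1000
average = sum(quicksort_comparisons(random.sample(range(n), n))
              for _ in range(trials)) / trials
optimum = math.log2(math.factorial(n))  # entropy of a random permutation
print(f"average: {average:.0f}  optimum: {optimum:.0f}  ratio: {average/optimum:.2f}")
# The ratio comes out around 1.2-1.3, consistent with the plot above.
```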
### Summary so far
Here's what's been proven so far:
1. If the array could start in any order, at least \(O(N\log N)\) comparisons are needed in the worst case
2. The average number of comparisons must be at least the entropy of the array, which is \(O(N\log N)\) for random input
Note that #2 allows comparison-based sorting algorithms to be faster than \(O(N\log N)\) if the input is low entropy (in other words, more predictable). Merge sort is close to \(O(N)\) if the input contains many sorted subarrays. Insertion sort is close to \(O(N)\) if the input is an array that was sorted before being perturbed a bit. None of them beat \(O(N\log N)\) in the worst case unless some array orderings are impossible as inputs.
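To see that low-entropy effect, here's a small experiment of mine counting insertion sort's comparisons on a nearly-sorted array versus a shuffled one:

```python
import random

def insertion_sort_comparisons(arr):
    """Count the comparisons insertion sort makes on a copy of `arr`."""
    a, count = list(arr), 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            count += 1
            if a[j - 1] <= a[j]:
                break  # already in place: stop scanning left
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return count

n = 1000
nearly_sorted = list(range(n))
for _ in range(10):  # "perturbed a bit": ten random adjacent swaps
    i = random.randrange(n - 1)
    nearly_sorted[i], nearly_sorted[i + 1] = nearly_sorted[i + 1], nearly_sorted[i]

print(insertion_sort_comparisons(nearly_sorted))                # close to N
print(insertion_sort_comparisons(random.sample(range(n), n)))   # roughly N*N/4
```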
## General sorting algorithms
Comparison-based sorts are an interesting special case in practice, but there's nothing theoretically special about [`CMP`][5] as opposed to any other instruction on a computer. Both arguments above can be generalised to any sorting algorithm if you note a couple of things:
1. Most computer instructions have more than two possible outputs, but still have a limited number
2. The limited number of outputs means that one instruction can only process a limited amount of entropy
That gives us the same \(O(N\log N)\) lower bound on the number of instructions. Any physically realisable computer can only process a limited number of instructions at a time, so that's an \(O(N\log N)\) lower bound on the time required, as well.
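Spelling that argument out: if every instruction has at most \(b\) possible outcomes, then \(T\) instructions can distinguish at most \(b^{T}\) different inputs, so the same counting argument as before gives
\[b^{T(N)} \geq N! \quad\Rightarrow\quad T(N) \geq \log_{b}{N!} = \frac{\log_{2}{N!}}{\log_{2}{b}} \succeq \frac{N\log N}{\log_{2}{b}}\]
and since \(b\) is a fixed property of the machine, dividing by \(\log_{2}{b}\) only changes the constant factor.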
### But what about “faster” algorithms?
The most useful practical implication of the general \(O(N\log N)\) bound is that if you hear about any asymptotically faster algorithm, you know it must be “cheating” somehow. There must be some catch that means it isn't a general purpose sorting algorithm that scales to arbitrarily large arrays. It might still be a useful algorithm, but it's a good idea to read the fine print closely.
A well-known example is radix sort. It's often called an \(O(N)\) sorting algorithm, but the catch is that it only works if all the numbers fit into \(k\) bits, and it's really \(O(kN)\).
What does that mean in practice? Suppose you have an 8-bit machine. You can represent \(2^{8} = 256\) different numbers in 8 bits, so if you have an array of thousands of numbers, you're going to have duplicates. That might be okay for some applications, but for others you need to upgrade to at least 16 bits, which can represent \(2^{16} = 65,536\) numbers distinctly. 32 bits will support \(2^{32} = 4,294,967,296\) different numbers. As the size of the array goes up, the number of bits needed will tend to go up, too. To represent \(N\) different numbers distinctly, you'll need \(k \geq \log_{2}N\). So, unless you're okay with lots of duplicates in your array, \(O(kN)\) is effectively \(O(N\log N)\).
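Here's a minimal binary LSD radix sort in Python (my illustration, not the article's) that makes the \(k\) explicit: one stable pass per bit, so \(k\) passes of \(O(N)\) work each:

```python
def radix_sort(nums, k):
    """Sort non-negative ints that fit in k bits: k stable passes, O(kN)."""
    for bit in range(k):
        zeros = [x for x in nums if not (x >> bit) & 1]
        ones = [x for x in nums if (x >> bit) & 1]
        nums = zeros + ones  # stable: preserves order from earlier bits
    return nums

print(radix_sort([170, 45, 75, 90, 2, 24, 802, 66], k=10))
# [2, 24, 45, 66, 75, 90, 170, 802]
```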
The need for \(O(N\log N)\) bits of input data in the general case actually proves the overall result by itself. That argument isn't so interesting in practice because we rarely need to sort billions of integers on a 32-bit machine, and [if anyone's hit the limits of a 64-bit machine, they haven't told the rest of us][6].
--------------------------------------------------------------------------------
via: https://theartofmachinery.com/2019/01/05/sorting_is_nlogn.html
Author: [Simon Arneaud][a]
Selected by: [lujun9972][b]
Translated by: [译者ID](https://github.com/译者ID)
Proofread by: [校对者ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)
[a]: https://theartofmachinery.com
[b]: https://github.com/lujun9972
[1]: http://hyperphysics.phy-astr.gsu.edu/hbase/Math/stirling.html
[2]: https://en.wikipedia.org/wiki/Shannon%27s_source_coding_theorem
[3]: https://en.wikipedia.org/wiki/Merge-insertion_sort
[4]: /images/sorting_is_nlogn/sorting_algorithms_num_comparisons.svg
[5]: https://c9x.me/x86/html/file_module_x86_id_35.html
[6]: https://sortbenchmark.org/


@@ -0,0 +1,115 @@
[#]: collector: (lujun9972)
[#]: translator: (silentdawn-zz)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Why Sorting is O(N log N))
[#]: via: (https://theartofmachinery.com/2019/01/05/sorting_is_nlogn.html)
[#]: author: (Simon Arneaud https://theartofmachinery.com)
Why Sorting is O(N log N)
======
Any decent algorithms textbook will explain how fast sorting algorithms like quicksort and heapsort are, but it doesn't take crazy maths to prove that they're as asymptotically fast as you can possibly get.
### A pedantic note about notation
Most computer scientists use big-O notation to mean “asymptotically equal, up to a constant scaling factor”, which isn't quite what it means to other mathematicians. Sorry, I'll use big-O like in CS textbooks, but at least I won't mix it with other mathematical notation.
## Comparison-based sorting
Let's look at the special case of algorithms that compare values two at a time (like quicksort and heapsort, and most other popular algorithms). The ideas can be extended to all sorting algorithms later.
### A simple counting argument for the worst case
Suppose you have an array of four elements, all different, in random order. Can you sort it by comparing just one pair of elements? Obviously not, and here's the proof: by definition, to sort the array, you need to know how to rearrange the elements to put them in order. In other words, you need to know which permutation is needed. How many possible permutations are there? The first element could be moved to one of four places, the second one could go to one of the remaining three, the third element has two options, and the last element has to take the one remaining place. So there are \(4 \times 3 \times 2 \times 1 = 4! = 24\) possible permutations to choose from, but a single comparison has only two possible results. If you made a list of all the possible permutations, you might decide that “BIGGER” means you need permutation #8 and “SMALLER” means you need permutation #24, but there's no way you could know when you need the other 22 permutations.
With two comparisons, you have \(2 \times 2 = 4\) possible outputs, which still isn't enough. You can't sort every possible shuffled array of four numbers unless you do at least five comparisons (\(2^{5} = 32\)). If \(W(N)\) is the worst-case number of comparisons needed to sort \(N\) different elements using some algorithm, then
\[2^{W(N)} \geq N!\]
Taking a logarithm base 2 of both sides,
\[W(N) \geq \log_{2}{N!}\]
Asymptotically, \(N!\) grows like \(N^{N}\) (see also [Stirling's formula][1]), so
\[W(N) \succeq \log N^{N} = N\log N\]
And that's an \(O(N\log N)\) lower bound on the worst case, just from counting outputs.
### Average case from information theory
We can get a stronger result if we extend that counting argument with a little information theory. Here's how we could use a sorting algorithm as a code for transmitting information:
1. I think of a number, say 15
2. I look up permutation #15 from the list of permutations of four elements
3. I run the sorting algorithm on this permutation and record all the “BIGGER” and “SMALLER” comparison results
4. I transmit the comparison results to you in binary code
5. You re-enact my sorting algorithm run, step by step, referring to my list of comparison results as needed
6. Now that you know how I rearranged my array to make it sorted, you can reverse the permutation to figure out my original array
7. You look up my original array in the permutation list to figure out that I transmitted the number 15
Okay, it's a bit strange, but it could be done. That means that sorting algorithms are bound by the same laws as normal encoding schemes, including the theorem proving there's no universal data compressor. The algorithm transmits one bit per comparison, so, on average, the number of comparisons must be at least the number of bits needed to represent my data, according to information theory. More technically, [the average number of comparisons must be at least the Shannon entropy of my input data, measured in bits][2]. Entropy is a mathematical measure of the information content, or unpredictability, of something.
If I have an array of \(N\) elements that could be in any possible order without bias, then entropy is maximised and is \(\log_{2}{N!}\) bits. That proves that \(O(N\log N)\) is an optimal average for a comparison-based sort with arbitrary input.
That's the theory, but how do real sorting algorithms compare? Below is a plot of the average number of comparisons needed to sort an array. I've compared the theoretical optimum against naïve quicksort and the [Ford-Johnson merge-insertion sort][3], which was designed to minimise comparisons (though it's rarely faster than quicksort overall, because there's more to life than minimising comparisons). Since it was developed in 1959, merge-insertion sort has been tweaked to squeeze out a few more comparisons, but the plot shows it's already almost optimal.
![Plot of the average number of comparisons needed to sort randomly shuffled arrays of length up to 100. The bottom line is the theoretical optimum; merge-insertion sort is within about 1% of it, and naïve quicksort is within about 25%.][4]
It's nice when a little theory gives such a tight practical result.
### Summary so far
Here's what's been proven so far:
1. If the array could start in any order, at least \(O(N\log N)\) comparisons are needed in the worst case.
2. The average number of comparisons must be at least the entropy of the array, which is \(O(N\log N)\) for random input.
Note that #2 allows comparison-based sorting algorithms to be faster than \(O(N\log N)\) if the input is low entropy (in other words, more predictable). Merge sort is close to \(O(N)\) if the input contains many sorted subarrays. Insertion sort is close to \(O(N)\) if the input is an array that was sorted before being perturbed a bit. None of them beat \(O(N\log N)\) in the worst case unless some array orderings are impossible as inputs.
## General sorting algorithms
Comparison-based sorts are an interesting special case in practice, but there's nothing theoretically special about [`CMP`][5] as opposed to any other instruction on a computer. Both arguments above can be generalised to any sorting algorithm once you note two things:
1. Most computer instructions have more than two possible outputs, but the number is still limited.
2. The limited number of outputs means that one instruction can only process a limited amount of entropy.
That gives us the same \(O(N\log N)\) lower bound on the number of instructions. Any physically realisable computer can only process a limited number of instructions at a time, so that's an \(O(N\log N)\) lower bound on the time required, as well.
### But what about “faster” algorithms?
The most useful practical implication of the general \(O(N\log N)\) bound is that if you hear about any asymptotically faster algorithm, you know it must be “cheating” somehow. There must be some catch that means it isn't a general purpose sorting algorithm that scales to arbitrarily large arrays. It might still be a useful algorithm, but it's a good idea to read the fine print closely.
A well-known example is radix sort. It's often called an \(O(N)\) sorting algorithm, but the catch is that it only works if all the numbers fit into \(k\) bits, so it's really \(O(kN)\).
What does that mean in practice? Suppose you have an 8-bit machine. You can represent \(2^{8} = 256\) different numbers in 8 bits, so if you have an array of thousands of numbers, you're going to have duplicates. That might be okay for some applications, but for others you need to upgrade to at least 16 bits, which can represent \(2^{16} = 65,536\) numbers distinctly. 32 bits will support \(2^{32} = 4,294,967,296\) different numbers. As the size of the array goes up, the number of bits needed will tend to go up, too. To represent \(N\) different numbers distinctly, you'll need \(k \geq \log_{2}N\). So, unless you're okay with lots of duplicates in your array, \(O(kN)\) is effectively \(O(N\log N)\).
The need for \(O(N\log N)\) bits of input data in the general case actually proves the overall result by itself. That argument isn't so interesting in practice, because we rarely need to sort billions of integers on a 32-bit machine, and [if anyone's hit the limits of a 64-bit machine, they haven't told the rest of us][6].
--------------------------------------------------------------------------------
via: https://theartofmachinery.com/2019/01/05/sorting_is_nlogn.html
Author: [Simon Arneaud][a]
Selected by: [lujun9972][b]
Translated by: [silentdawn-zz](https://github.com/译者ID)
Proofread by: [校对者ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)
[a]: https://theartofmachinery.com
[b]: https://github.com/lujun9972
[1]: http://hyperphysics.phy-astr.gsu.edu/hbase/Math/stirling.html
[2]: https://en.wikipedia.org/wiki/Shannon%27s_source_coding_theorem
[3]: https://en.wikipedia.org/wiki/Merge-insertion_sort
[4]: /images/sorting_is_nlogn/sorting_algorithms_num_comparisons.svg
[5]: https://c9x.me/x86/html/file_module_x86_id_35.html
[6]: https://sortbenchmark.org/