mirror of
https://github.com/1c7/Crash-Course-Computer-Science-Chinese.git
synced 2025-01-20 20:40:11 +08:00
595 lines
24 KiB
Plaintext
595 lines
24 KiB
Plaintext
|
This episode is brought to you by Curiosity Stream.
|
|||
|
本集由 Curiosity Stream 赞助播出
|
|||
|
|
|||
|
Hi, I'm Carrie Anne, and welcome to Crash Course Computer Science!
|
|||
|
(。・∀・)ノ゙嗨,我是 Carrie Anne,欢迎收看计算机科学速成课!
|
|||
|
|
|||
|
Last episode we talked about Files, bundles of data, stored on a computer, that
|
|||
|
上集我们讨论了文件格式,如何编码文字,声音,图片
|
|||
|
|
|||
|
are formatted and arranged to encode information, like text, sound or images.
|
|||
|
上集我们讨论了文件格式,如何编码文字,声音,图片
|
|||
|
|
|||
|
We even discussed some basic file formats, like text, wave, and bitmap.
|
|||
|
还举了具体例子 .txt .wav .bmp
|
|||
|
|
|||
|
While these formats are perfectly fine and still used today,
|
|||
|
这些格式虽然管用,而且现在还在用 \N 但它们的简单性意味着效率不高
|
|||
|
|
|||
|
their simplicity also means they're not very efficient.
|
|||
|
这些格式虽然管用,而且现在还在用 \N 但它们的简单性意味着效率不高
|
|||
|
|
|||
|
Ideally, we want files to be as small as possible, so we can store lots of them without filling
|
|||
|
我们希望文件能小一点,这样能存大量文件,传输也会快一些
|
|||
|
|
|||
|
up our hard drives, and also transmit them more quickly.
|
|||
|
我们希望文件能小一点,这样能存大量文件,传输也会快一些
|
|||
|
|
|||
|
Nothing is more frustrating than waiting for an email attachment to download. Ugh!
|
|||
|
等邮件附件下载烦死人了
|
|||
|
|
|||
|
The answer is compression, which literally squeezes data into a smaller size.
|
|||
|
解决方法是 压缩,把数据占用的空间压得更小
|
|||
|
|
|||
|
To do this, we have to encode data using fewer bits than the original representation.
|
|||
|
用更少的位(bit)来表示数据
|
|||
|
|
|||
|
That might sound like magic, but it's actually computer science!
|
|||
|
听起来像魔法,但其实是计算机科学!
|
|||
|
|
|||
|
Lets return to our old friend from last episode, Mr. Pac-man!
|
|||
|
我们继续用上集的 吃豆人例子,图像是 4像素x4像素
|
|||
|
|
|||
|
This image is 4 pixels by 4 pixels.
|
|||
|
我们继续用上集的 吃豆人例子,图像是 4像素x4像素
|
|||
|
|
|||
|
As we discussed, image data is typically stored as a list of pixel values.
|
|||
|
之前说过,图像一般存成一长串像素值
|
|||
|
|
|||
|
To know where rows end, image files have metadata, which defines properties like dimensions.
|
|||
|
为了知道一行在哪里结束 \N 图像要有元数据,写明尺寸等属性
|
|||
|
|
|||
|
But, to keep it simple today, we're not going to worry about it.
|
|||
|
但为了简单起见,今天忽略这些细节
|
|||
|
|
|||
|
If you mix full intensity red, green and blue - that's 255 for all
|
|||
|
如果红绿蓝都是 255 会得到白色
|
|||
|
|
|||
|
three values - you get the color white.
|
|||
|
如果红绿蓝都是 255 会得到白色
|
|||
|
|
|||
|
If you mix full intensity red and green, but no blue (it's 0), you get yellow.
|
|||
|
如果混合 255红色和255绿色,会得到黄色
|
|||
|
|
|||
|
We have 16 pixels in our image, and each of those needs 3 bytes of color data.
|
|||
|
这个图像有16个像素(4x4), 每个像素3个字节
|
|||
|
|
|||
|
That means this image's data will consume 48 bytes of storage.
|
|||
|
总共占48个字节(16x3=48)
|
|||
|
|
|||
|
But, we can compress the data and pack it into a smaller number of bytes than 48!
|
|||
|
但我们可以压缩到少于 48 个字节
|
|||
|
|
|||
|
One way to compress data is to reduce repeated or redundant information.
|
|||
|
一种方法是 减少重复信息
|
|||
|
|
|||
|
The most straightforward way to do this is called Run-Length Encoding.
|
|||
|
最简单的方法叫 游程编码(Run-Length Encoding)
|
|||
|
|
|||
|
This takes advantage of the fact that there are often runs of identical values in files.
|
|||
|
适合经常出现相同值的文件
|
|||
|
|
|||
|
For example, in our pac-man image, there are 7 yellow pixels in a row.
|
|||
|
比如吃豆人 有7个连续黄色像素
|
|||
|
|
|||
|
Instead of encoding redundant data: yellow pixel, yellow pixel, yellow pixel, and so
|
|||
|
与其全存下来:黄色,黄色,黄色...
|
|||
|
|
|||
|
on, we can just say "there's 7 yellow pixels in a row" by inserting
|
|||
|
可以插入一个额外字节,代表有7个连续黄色像素
|
|||
|
|
|||
|
an extra byte that specifies the length of the run, like so:
|
|||
|
可以插入一个额外字节,代表有7个连续黄色像素
|
|||
|
|
|||
|
And then we can eliminate the redundant data behind it.
|
|||
|
然后删掉后面的重复数据.
|
|||
|
|
|||
|
To ensure that computers don't get confused with which bytes are run lengths and which
|
|||
|
为了让计算机能分辨哪些字节是"长度" 哪些字节是"颜色" \N 格式要一致
|
|||
|
|
|||
|
bytes represent color, we have to be consistent in how we apply this scheme.
|
|||
|
为了让计算机能分辨哪些字节是"长度" 哪些字节是"颜色" \N 格式要一致
|
|||
|
|
|||
|
So, we need to preface all pixels with their run-length.
|
|||
|
所以我们要给所有像素前面标上长度
|
|||
|
|
|||
|
In some cases, this actually adds data, but on the whole, we've dramatically reduced
|
|||
|
有时候数据反而会变多,但就这个例子而言
|
|||
|
|
|||
|
the number of bytes we need to encode this image.
|
|||
|
我们大大减少了字节数,之前是48 现在是24
|
|||
|
|
|||
|
We're now at 24 bytes, down from 48.
|
|||
|
我们大大减少了字节数,之前是48 现在是24
|
|||
|
|
|||
|
That's 50% smaller!
|
|||
|
小了50%!省了很多空间!
|
|||
|
|
|||
|
A huge saving!
|
|||
|
小了50%!省了很多空间!
|
|||
|
|
|||
|
Also note that we haven't lost any data.
|
|||
|
还有,我们没有损失任何数据 \N 我们可以轻易恢复到原来的数据
|
|||
|
|
|||
|
We can easily expand this back to the original form without any degradation.
|
|||
|
还有,我们没有损失任何数据 \N 我们可以轻易恢复到原来的数据
|
|||
|
|
|||
|
A compression technique that has this characteristic is called
|
|||
|
这叫"无损压缩",没有丢失任何数据
|
|||
|
|
|||
|
lossless compression, because we don't lose anything.
|
|||
|
这叫"无损压缩",没有丢失任何数据
|
|||
|
|
|||
|
The decompressed data is identical to the original before compression, bit for bit.
|
|||
|
解压缩后,数据和压缩前完全一样
|
|||
|
|
|||
|
Let's take a look at another type of lossless compression, where
|
|||
|
我们来看另一种无损压缩,它用更紧凑的方式表示数据块
|
|||
|
|
|||
|
blocks of data are replaced by more compact representations.
|
|||
|
我们来看另一种无损压缩,它用更紧凑的方式表示数据块
|
|||
|
|
|||
|
This is sort of like " don't forget to be awesome " being replaced by DFTBA.
|
|||
|
有点像 "别忘了变厉害" 简写成 DFTBA
|
|||
|
|
|||
|
To do this, we need a dictionary that stores the mapping from codes to data.
|
|||
|
为此,我们需要一个字典,存储"代码"和"数据"间的对应关系
|
|||
|
|
|||
|
Lets see how this works for our example.
|
|||
|
我们看个例子
|
|||
|
|
|||
|
We can view our image as not just a string of individual pixels,
|
|||
|
我们可以把图像看成一块块,而不是一个个像素
|
|||
|
|
|||
|
but as little blocks of data.
|
|||
|
我们可以把图像看成一块块,而不是一个个像素
|
|||
|
|
|||
|
For simplicity, we're going to use pixel pairs, which are 6 bytes long,
|
|||
|
为了简单,我们把2个像素当成1块(占6个字节)
|
|||
|
|
|||
|
but blocks can be any size.
|
|||
|
但你也可以定成其他大小
|
|||
|
|
|||
|
In our example, there are only four pairings: White-yellow, black-yellow,
|
|||
|
我们只有四对: 白黄 黑黄 黄黄 白白
|
|||
|
|
|||
|
yellow-yellow and white-white.
|
|||
|
我们只有四对: 白黄 黑黄 黄黄 白白
|
|||
|
|
|||
|
Those are the data blocks in our dictionary we want to generate compact codes for.
|
|||
|
我们会为这四对 生成紧凑代码(compact codes)
|
|||
|
|
|||
|
What's interesting, is that these blocks occur at different frequencies.
|
|||
|
有趣的是,这些块的出现频率不同
|
|||
|
|
|||
|
One method for generating efficient codes is building a Huffman Tree, invented by David
|
|||
|
1950年代 大卫·霍夫曼 发明了一种高效编码方式叫 \N "霍夫曼树"(Huffman Tree) 当时他是麻省理工学院的学生
|
|||
|
|
|||
|
Huffman while he was a student at MIT in the 1950s.
|
|||
|
1950年代 大卫·霍夫曼 发明了一种高效编码方式叫 \N "霍夫曼树"(Huffman Tree) 当时他是麻省理工学院的学生
|
|||
|
|
|||
|
His algorithm goes like this.
|
|||
|
算法是这样的
|
|||
|
|
|||
|
First, you layout all the possible blocks and their frequencies.
|
|||
|
首先,列出所有块和出现频率,每轮选两个最低的频率
|
|||
|
|
|||
|
At every round, you select the two with the lowest frequencies.
|
|||
|
首先,列出所有块和出现频率,每轮选两个最低的频率
|
|||
|
|
|||
|
Here, that's Black-Yellow and White-White, each with a frequency of 1.
|
|||
|
这里 黑黄 和 白白 的频率最低,它们都是 1
|
|||
|
|
|||
|
You combine these into a little tree. which have a combined frequency of 2,
|
|||
|
可以把它们组成一个树,总频率 2
|
|||
|
|
|||
|
so we record that.
|
|||
|
可以把它们组成一个树,总频率 2
|
|||
|
|
|||
|
And now one step of the algorithm done.
|
|||
|
现在完成了一轮算法
|
|||
|
|
|||
|
Now we repeat the process.
|
|||
|
现在我们重复这样做
|
|||
|
|
|||
|
This time we have three things to choose from.
|
|||
|
这次有3个可选
|
|||
|
|
|||
|
Just like before, we select the two with the lowest frequency, put them into a little tree,
|
|||
|
就像上次一样,选频率最低的两个,放在一起,并记录总频率
|
|||
|
|
|||
|
and record the new total frequency of all the sub items.
|
|||
|
就像上次一样,选频率最低的两个,放在一起,并记录总频率
|
|||
|
|
|||
|
Ok, we're almost done.
|
|||
|
好,我们快完成了
|
|||
|
|
|||
|
This time it's easy to select the two items with the lowest frequency
|
|||
|
这次很简单,因为只有2个选择
|
|||
|
|
|||
|
because there are only two things left to pick.
|
|||
|
这次很简单,因为只有2个选择
|
|||
|
|
|||
|
We combine these into a tree, and now we're done!
|
|||
|
把它们组合成一棵树就完成了!
|
|||
|
|
|||
|
Our tree looks like this, and it has a very cool property: it's arranged by frequency,
|
|||
|
现在看起来像这样,它有一个很酷的属性:按频率排列
|
|||
|
|
|||
|
with less common items lower down.
|
|||
|
频率低的在下面
|
|||
|
|
|||
|
So, now we have a tree, but you may be wondering how this gets us to a dictionary.
|
|||
|
现在有了一棵树,你可能在想 "怎么把树变成字典?"
|
|||
|
|
|||
|
Well, we use our frequency-sorted tree to generate the codes we need
|
|||
|
我们可以把每个分支用 0 和 1 标注,就像这样
|
|||
|
|
|||
|
by labeling each branch with a 0 or a 1, like so.
|
|||
|
我们可以把每个分支用 0 和 1 标注,就像这样
|
|||
|
|
|||
|
With this, we can write out our code dictionary.
|
|||
|
现在可以生成字典
|
|||
|
|
|||
|
Yellow-yellow is encoded as just a single 0. White-yellow is encoded as 10
|
|||
|
黄黄 编码成 0 \N 白黄 编码成 10 \N 黑黄 编码成 110 \N 白白 编码成 111
|
|||
|
|
|||
|
Black-Yellow is 1 1 0. and finally white-white is 1 1 1.
|
|||
|
黄黄 编码成 0 \N 白黄 编码成 10 \N 黑黄 编码成 110 \N 白白 编码成 111
|
|||
|
|
|||
|
The really cool thing about these codewords is that there's no way to
|
|||
|
酷的地方是 它们绝对不会冲突
|
|||
|
|
|||
|
have conflicting codes, because each path down the tree is unique.
|
|||
|
因为树的每条路径是唯一的
|
|||
|
|
|||
|
This means our codes are prefix-free, that is no code starts with another complete code.
|
|||
|
意味着代码是"无前缀"的,没有代码是以另一个代码开头的
|
|||
|
|
|||
|
Now, let's return to our image data and compress it!
|
|||
|
现在我们来压缩!
|
|||
|
|
|||
|
-- NOT BYTES -- BITS!! That's less than 2 bytes of data!
|
|||
|
注意是位(bit)! 不是字节(byte)! \N 14位(bit) 还不到2个字节(byte)!
|
|||
|
|
|||
|
But, don't break out the champagne quite yet!
|
|||
|
但,先别急着开香槟!
|
|||
|
|
|||
|
This data is meaningless unless we also save our code dictionary.
|
|||
|
字典也要保存下来,否则 14 bit 毫无意义
|
|||
|
|
|||
|
So, we'll need to append it to the front of the image data, like this.
|
|||
|
所以我们把字典 加到 14 bit 前面,就像这样
|
|||
|
|
|||
|
Now, including the dictionary, our image data is 30 bytes long.
|
|||
|
现在加上字典,图像是 30 个字节(bytes) \N 比 48 字节好很多
|
|||
|
|
|||
|
That's still a significant improvement over 48 bytes.
|
|||
|
现在加上字典,图像是 30 个字节(bytes) \N 比 48 字节好很多
|
|||
|
|
|||
|
The two approaches we discussed,
|
|||
|
"消除冗余"和"用更紧凑的表示方法",这两种方法通常会组合使用
|
|||
|
|
|||
|
removing redundancies and using more compact representations, are often combined,
|
|||
|
"消除冗余"和"用更紧凑的表示方法",这两种方法通常会组合使用
|
|||
|
|
|||
|
and underlie almost all lossless compressed file formats,
|
|||
|
几乎所有无损压缩格式都用了它们 \N 比如 GIF, PNG, PDF, ZIP
|
|||
|
|
|||
|
like GIF, PNG, PDF and ZIP files.
|
|||
|
几乎所有无损压缩格式都用了它们 \N 比如 GIF, PNG, PDF, ZIP
|
|||
|
|
|||
|
Both run-length encoding and dictionary coders are lossless compression techniques.
|
|||
|
游程编码 和 字典编码 都是无损压缩
|
|||
|
|
|||
|
No information is lost; when you decompress, you get the original file.
|
|||
|
压缩时不会丢失信息,解压后,数据和之前完全一样
|
|||
|
|
|||
|
That's really important for many types of files.
|
|||
|
无损对很多文件很重要
|
|||
|
|
|||
|
Like, it'd be very odd if I zipped up a word document to send to you,
|
|||
|
比如我给你发了个压缩的 word 文档\N 你解压之后发现内容变了,这就很糟糕了
|
|||
|
|
|||
|
and when you decompressed it on your computer, the text was different.
|
|||
|
比如我给你发了个压缩的 word 文档\N 你解压之后发现内容变了,这就很糟糕了
|
|||
|
|
|||
|
But, there are other types of files where we can get away with little changes, perhaps
|
|||
|
但其他一些文件,丢掉一些数据没什么关系
|
|||
|
|
|||
|
by removing unnecessary or less important information, especially information
|
|||
|
丢掉那些人类看不出区别的数据
|
|||
|
|
|||
|
that human perception is not good at detecting.
|
|||
|
丢掉那些人类看不出区别的数据
|
|||
|
|
|||
|
And this trick underlies most lossy compression techniques.
|
|||
|
大多数有损压缩技术,都用到了这点
|
|||
|
|
|||
|
These tend to be pretty complicated, so we're going to attack this at a conceptual level.
|
|||
|
实际细节比较复杂,所以我们讲概念就好
|
|||
|
|
|||
|
Let's take sound as an example.
|
|||
|
以声音为例,你的听力不是完美的
|
|||
|
|
|||
|
Your hearing is not perfect.
|
|||
|
以声音为例,你的听力不是完美的
|
|||
|
|
|||
|
We can hear some frequencies of sound better than others.
|
|||
|
有些频率我们很擅长,其他一些我们根本听不见,比如超声波
|
|||
|
|
|||
|
And there are some we can't hear at all, like ultrasound.
|
|||
|
有些频率我们很擅长,其他一些我们根本听不见,比如超声波
|
|||
|
|
|||
|
Unless you're a bat.
|
|||
|
除非你是蝙蝠
|
|||
|
|
|||
|
Basically, if we make a recording of music, and there's data in the ultrasonic frequency range,
|
|||
|
举个例子,如果录音乐,超声波数据都可以扔掉 \N 因为人类听不到超声波
|
|||
|
|
|||
|
we can discard it, because we know that humans can't hear it.
|
|||
|
举个例子,如果录音乐,超声波数据都可以扔掉 \N 因为人类听不到超声波
|
|||
|
|
|||
|
On the other hand, humans are very sensitive to frequencies in the vocal range, like people
|
|||
|
另一方面,人类对人声很敏感,所以应该尽可能保持原样
|
|||
|
|
|||
|
singing, so it's best to preserve quality there as much as possible.
|
|||
|
另一方面,人类对人声很敏感,所以应该尽可能保持原样
|
|||
|
|
|||
|
Deep bass is somewhere in between.
|
|||
|
低音介于两者之间,人类听得到,但不怎么敏感
|
|||
|
|
|||
|
Humans can hear it, but we're less attuned to it.
|
|||
|
低音介于两者之间,人类听得到,但不怎么敏感
|
|||
|
|
|||
|
We mostly sense it.
|
|||
|
一般是感觉到震动
|
|||
|
|
|||
|
Lossy audio compressors takes advantage of this, and encode different
|
|||
|
有损音频压缩利用这一点,用不同精度编码不同频段
|
|||
|
|
|||
|
frequency bands at different precisions.
|
|||
|
有损音频压缩利用这一点,用不同精度编码不同频段
|
|||
|
|
|||
|
Even if the result is rougher, it's likely that users won't perceive the difference.
|
|||
|
听不出什么区别,不会明显影响体验
|
|||
|
|
|||
|
Or at least it doesn't dramatically affect the experience.
|
|||
|
听不出什么区别,不会明显影响体验
|
|||
|
|
|||
|
And here comes the hate mail from the audiophiles!
|
|||
|
音乐发烧友估计要吐槽了!
|
|||
|
|
|||
|
You encounter this type of audio compression all the time.
|
|||
|
日常生活中你会经常碰到这类音频压缩
|
|||
|
|
|||
|
It's one of the reasons you sound different on a cellphone versus in person.
|
|||
|
所以你在电话里的声音 和现实中不一样
|
|||
|
|
|||
|
The audio data is being compressed, allowing more people to take calls at once.
|
|||
|
压缩音频是为了让更多人能同时打电话
|
|||
|
|
|||
|
As the signal quality or bandwidth get worse, compression algorithms remove more data,
|
|||
|
如果网速变慢了,压缩算法会删更多数据
|
|||
|
|
|||
|
further reducing precision, which is why Skype calls sometimes sound like robots talking.
|
|||
|
进一步降低声音质量,所以 Skype 通话有时听起来像机器人
|
|||
|
|
|||
|
Compared to an uncompressed audio format, like a WAV or FLAC (there we go, got the audiophiles back)
|
|||
|
和没压缩的音频格式相比,比如 WAV 或 FLAC \N ( 这下音乐发烧友满意了)
|
|||
|
|
|||
|
compressed audio files, like MP3s, are often 10 times smaller.
|
|||
|
压缩音频文件如 MP3,能小10倍甚至更多.
|
|||
|
|
|||
|
That's a huge saving!
|
|||
|
省了超多空间!
|
|||
|
|
|||
|
And it's why I've got a killer music collection on my retro iPod.
|
|||
|
所以我的旧 iPod 上有一堆超棒的歌
|
|||
|
|
|||
|
Don't judge.
|
|||
|
别批判我
|
|||
|
|
|||
|
This idea of discarding or reducing precision in a manner that aligns with human perception
|
|||
|
这种删掉人类无法感知的数据的方法,叫"感知编码"
|
|||
|
|
|||
|
is called perceptual coding,
|
|||
|
这种删掉人类无法感知的数据的方法,叫"感知编码"
|
|||
|
|
|||
|
and it relies on models of human perception,
|
|||
|
它依赖于人类的感知模型,模型来自"心理物理学"领域
|
|||
|
|
|||
|
which come from a field of study called Psychophysics.
|
|||
|
它依赖于人类的感知模型,模型来自"心理物理学"领域
|
|||
|
|
|||
|
This same idea is the basis of lossy compressed image formats, most famously JPEGs.
|
|||
|
这是各种"有损压缩图像格式"的基础,最著名的是 JPEG
|
|||
|
|
|||
|
Like hearing, the human visual system is imperfect.
|
|||
|
就像听力一样,人的视觉系统也不是完美的.
|
|||
|
|
|||
|
We're really good at detecting sharp contrasts, like the edges of objects,
|
|||
|
我们善于看到尖锐对比,比如物体的边缘
|
|||
|
|
|||
|
but our perceptual system isn't so hot with subtle color variations.
|
|||
|
但我们看不出颜色的细微变化
|
|||
|
|
|||
|
JPEG takes advantage of this by breaking images up into blocks of 8x8 pixels,
|
|||
|
JPEG 利用了这一点,把图像分解成 8x8 像素块
|
|||
|
|
|||
|
then throwing away a lot of the high-frequency spatial data.
|
|||
|
然后删掉大量高频率空间数据
|
|||
|
|
|||
|
For example, take this photo of our directors dog - Noodle.
|
|||
|
举个例子,这是导演的狗,面面
|
|||
|
|
|||
|
So cute!
|
|||
|
超可爱!
|
|||
|
|
|||
|
Let's look at a patch of 8x8 pixels.
|
|||
|
我们来看其中一个 8x8 像素
|
|||
|
|
|||
|
Pretty much every pixel is different from its neighbor,
|
|||
|
几乎每个像素都和相邻像素不同,用无损技术很难压缩 \N 因为太多不同点了
|
|||
|
|
|||
|
making it hard to compress with loss-less techniques because there's just a lot going on.
|
|||
|
几乎每个像素都和相邻像素不同,用无损技术很难压缩 \N 因为太多不同点了
|
|||
|
|
|||
|
Lots of little details.
|
|||
|
很多小细节
|
|||
|
|
|||
|
But human perception doesn't register all those details.
|
|||
|
但人眼看不出这些细节
|
|||
|
|
|||
|
So, we can discard a lot of that detail, and replace it with a simplified patch like this.
|
|||
|
因此可以删掉很多,用这样一个简单的块来代替
|
|||
|
|
|||
|
This maintains the visual essence, but might only use 10% of the data.
|
|||
|
这看起来一样,但可能只占10%的原始数据
|
|||
|
|
|||
|
We can do this for all the patches in the image and get this result.
|
|||
|
我们可以对所有 8x8 块做一样的操作
|
|||
|
|
|||
|
You can still see it's a dog, but the image is rougher.
|
|||
|
图片依然可以认出是一只狗,只是更粗糙一些
|
|||
|
|
|||
|
So, that's an extreme example, going from a slightly compressed JPEG to a highly compressed one,
|
|||
|
以上例子比较极端,进行了高度压缩,只有原始大小的八分之一
|
|||
|
|
|||
|
one-eighth the original file size.
|
|||
|
以上例子比较极端,进行了高度压缩,只有原始大小的八分之一
|
|||
|
|
|||
|
Often, you can get away with a quality somewhere in between, and perceptually,
|
|||
|
通常你可以取得平衡,图片看起来差不多,但文件小不少
|
|||
|
|
|||
|
it's basically the same as the original.
|
|||
|
通常你可以取得平衡,图片看起来差不多,但文件小不少
|
|||
|
|
|||
|
Can you tell the difference between the two?
|
|||
|
你看得出两张图的区别吗?
|
|||
|
|
|||
|
Probably not, but I should mention that video compression plays a role in that too,
|
|||
|
估计看不出,但我想提一下,视频压缩也造成了影响
|
|||
|
|
|||
|
since I'm literally being compressed in a video right now.
|
|||
|
毕竟你现在在看视频啊
|
|||
|
|
|||
|
Videos are really just long sequences of images, so a lot of what I said
|
|||
|
视频只是一长串连续图片 \N 所以图片的很多方面也适用于视频
|
|||
|
|
|||
|
about them applies here too.
|
|||
|
视频只是一长串连续图片 \N 所以图片的很多方面也适用于视频
|
|||
|
|
|||
|
But videos can do some extra clever stuff, because between frames,
|
|||
|
但视频可以做一些小技巧 \N 因为帧和帧之间很多像素一样
|
|||
|
|
|||
|
a lot of pixels are going to be the same.
|
|||
|
但视频可以做一些小技巧 \N 因为帧和帧之间很多像素一样
|
|||
|
|
|||
|
Like this whole background behind me!
|
|||
|
比如我后面的背景!
|
|||
|
|
|||
|
This is called temporal redundancy.
|
|||
|
这叫 时间冗余
|
|||
|
|
|||
|
We don't need to re-transmit those pixels every frame of the video.
|
|||
|
视频里不用每一帧都存这些像素 \N 可以只存变了的部分
|
|||
|
|
|||
|
We can just copy patches of data forward.
|
|||
|
视频里不用每一帧都存这些像素 \N 可以只存变了的部分
|
|||
|
|
|||
|
When there are small pixel differences, like the readout on this frequency generator behind me,
|
|||
|
当帧和帧之间有小小的差异时,比如后面这个频率发生器
|
|||
|
|
|||
|
most video formats send data that encodes just the difference between patches,
|
|||
|
很多视频编码格式,只存变化的部分
|
|||
|
|
|||
|
which is more efficient than re-transmitting all the pixels afresh, again taking advantage
|
|||
|
这比存所有像素更有效率 \N 利用了帧和帧之间的相似性
|
|||
|
|
|||
|
of inter-frame similarity.
|
|||
|
这比存所有像素更有效率 \N 利用了帧和帧之间的相似性
|
|||
|
|
|||
|
The fanciest video compression formats go one step further.
|
|||
|
更高级的视频压缩格式 会更进一步
|
|||
|
|
|||
|
They find patches that are similar between frames, and not only copy them forward, with
|
|||
|
找出帧和帧之间相似的补丁 \N 然后用简单效果实现,比如移动和旋转
|
|||
|
|
|||
|
or without differences, but also can apply simple effects to them, like a shift or rotation.
|
|||
|
找出帧和帧之间相似的补丁 \N 然后用简单效果实现,比如移动和旋转
|
|||
|
|
|||
|
They can also lighten or darken a patch between frames.
|
|||
|
变亮和变暗
|
|||
|
|
|||
|
So, if I move my hand side to side like this the video compressor will identify the similarity,
|
|||
|
如果我这样摆手,视频压缩器会识别到相似性
|
|||
|
|
|||
|
capture my hand in one or more patches, then just move these patches around between frames.
|
|||
|
用一个或多个补丁代表我的手,然后帧之间直接移动这些补丁
|
|||
|
|
|||
|
You're actually seeing my hand from the past kinda freaky, but it uses a lot less data.
|
|||
|
所以你看到的是我过去的手(不是实时的)\N 有点可怕 但数据量少得多
|
|||
|
|
|||
|
MPEG-4 videos, a common standard, are often 20 to 200 times
|
|||
|
MPEG-4 是常见标准,可以比原文件小20倍到200倍
|
|||
|
|
|||
|
smaller than the original, uncompressed file.
|
|||
|
MPEG-4 是常见标准,可以比原文件小20倍到200倍
|
|||
|
|
|||
|
However, encoding frames as translations and rotations of patches from previous frames
|
|||
|
但用补丁的移动和旋转 来更新画面
|
|||
|
|
|||
|
can go horribly wrong when you compress too heavily, and there isn't
|
|||
|
当压缩太严重时会出错 \N 没有足够空间更新补丁内的像素
|
|||
|
|
|||
|
enough space to update pixel data inside of the patches.
|
|||
|
当压缩太严重时会出错 \N 没有足够空间更新补丁内的像素
|
|||
|
|
|||
|
The video player will forge ahead, applying the right motions,
|
|||
|
即使补丁是错的,视频播放器也会照样播放
|
|||
|
|
|||
|
even if the patch data is wrong.
|
|||
|
即使补丁是错的,视频播放器也会照样播放
|
|||
|
|
|||
|
And this leads to some hilarious and trippy effects, which I'm sure you've seen.
|
|||
|
导致一些怪异又搞笑的结果,你肯定见过这些.
|
|||
|
|
|||
|
Overall, it's extremely useful to have compression techniques for all the types of data I discussed today.
|
|||
|
总的来说,压缩对大部分文件类型都有用
|
|||
|
|
|||
|
(I guess our imperfect vision and hearing are "useful," too.)
|
|||
|
从这个角度来讲,人类不完美的视觉和听觉 也算有用
|
|||
|
|
|||
|
And it's important to know about compression because it allows users to
|
|||
|
学习压缩非常重要 \N 因为可以高效存储图片,音乐,视频
|
|||
|
|
|||
|
store pictures, music, and videos in efficient ways.
|
|||
|
学习压缩非常重要 \N 因为可以高效存储图片,音乐,视频
|
|||
|
|
|||
|
Without it, streaming your favorite Carpool Karaoke videos on YouTube would be nearly impossible,
|
|||
|
如果没有压缩 \N 在 YouTube 看"明星拼车唱歌"几乎不可能
|
|||
|
|
|||
|
due to bandwidth and the economics of transmitting that volume of data for free.
|
|||
|
因为你的带宽可能不够(会很卡) \N 而且供应商不愿意免费传输那么多数据
|
|||
|
|
|||
|
And now when your Skype calls sound like they're being taken over by demons,
|
|||
|
现在你知道为什么打 Skype 电话 \N 有时像在和恶魔通话
|
|||
|
|
|||
|
you'll know what's really going on.
|
|||
|
现在你知道为什么打 Skype 电话 \N 有时像在和恶魔通话
|
|||
|
|
|||
|
I'll see you next week.
|
|||
|
下周见
|
|||
|
|