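[Crash Course Computer Science] Notes, Part 4

Contents

- 19. Memory & Storage Media
  - Course introduction
  - Memory vs. storage
  - Paper tape storage
  - Magnetic core memory
  - Magnetic tape and drum storage
  - Magnetic disk (hard disk) storage
  - Floppy disk storage
  - Optical disc storage (CD & DVD)
  - Solid-state drive storage
- 20. File Systems
  - Course introduction
  - File formats
  - .txt text files
  - .wav audio files
  - .bmp bitmap files
  - Storing files
  - Modern file systems
- 21. Compression
  - Lossless compression
    - Run-length encoding
    - Compact representation (dictionary coding)
  - Lossy compression
    - Perceptual coding
    - Temporal redundancy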

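19. Memory & Storage Media

Course introduction

Memory vs. storage

Programs and data must be loaded into memory before the CPU can execute them; until then they live in storage.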

We’ve talked about computer memory several times in this series,
系列中我们多次谈到内存（Memory）

and we even designed some in Episode 6.
甚至在第 6 集设计了一个简单内存

In general, computer memory is non-permanent.
一般来说，电脑内存是“非永久性”的

If your Xbox accidentally gets unplugged and turns off, any data saved in memory is lost.
如果 Xbox 电源线不小心拔掉了，内存里所有数据都会丢失

For this reason, it’s called volatile memory.
所以内存叫**"易失性"存储器**

What we haven’t talked so much about this series is storage,
我们还没谈过的话题是存储器（Storage）

which is a tad different.
存储器（Storage）和内存（Memory）有点不同

Any data written to storage, like your hard drive, will stay there until it’s over-written or deleted, even if the power goes out.
任何写入"存储器"的数据，比如你的硬盘，数据会一直存着，直到被覆盖或删除，断电也不会丢失

It’s non-volatile.
**存储器是"非易失性"**的

It used to be that volatile memory was fast and non-volatile storage was slow,
以前是"易失性"的速度快，"非易失性"的速度慢

but as computing technologies have improved, this distinction is becoming less true, and the terms have started to blend together.
但随着技术发展，两者的差异越来越小

Nowadays, we take for granted technologies like this little USB stick,
如今我们认为稀松平常的技术，比如这个 U 盘

which offers gigabytes of memory, reliable over long periods of time, all at low cost,
能低成本+可靠+长时间存储上 GB 的数据

Paper tape storage

Storage started out on punched paper cards and later moved on to magnetic core memory.

Magnetic core memory

If you loop a wire around this core and run an electrical current through the wire, we can magnetize the core in a certain direction.
如果给磁芯绕上电线，并施加电流，可以将磁芯磁化在一个方向

If we turn the current off, the core will stay magnetized.
如果关掉电流，磁芯保持磁化

If we pass current through the wire in the opposite direction, the magnetization direction, called polarity, flips the other way.
如果沿相反方向施加电流

In this way, we can store 1’s and 0’s!
这样就可以存 1 和 0！

1 bit of memory isn’t very useful, so these little donuts were arranged into grids.
如果只存 1 位不够有用，所以把小甜甜圈排列成网格

There were wires for selecting the right row and column, and a wire that ran through every core, which could be used to read or write a bit.
有电线负责选行和列，也有电线贯穿每个磁芯，用于读写一位(bit)

Here is an actual piece of core memory!
我手上有一块磁芯存储器

In each of these little yellow squares, there are 32 rows and 32 columns of tiny cores,
每个黄色方格有 32 行 x 32 列的磁芯，每个磁芯存 1 位数据（面向比特编程 doge）
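To make the row-and-column idea concrete, here is a minimal Python sketch of one 32 x 32 plane of cores. The class name `CorePlane` and its methods are invented for illustration, and the model ignores the electrical details of real core memory (for example, reading a real core disturbs its state and the bit has to be written back).

```python
class CorePlane:
    """Toy model of one 32 x 32 grid of magnetic cores, 1 bit per core."""

    def __init__(self, rows=32, cols=32):
        self.rows = rows
        self.cols = cols
        # Every core starts with polarity 0.
        self.cores = [[0] * cols for _ in range(rows)]

    def write(self, row, col, bit):
        # Selecting one row wire and one column wire singles out one core.
        self.cores[row][col] = 1 if bit else 0

    def read(self, row, col):
        return self.cores[row][col]

    def capacity_bits(self):
        return self.rows * self.cols


plane = CorePlane()
plane.write(3, 7, 1)
print(plane.read(3, 7))       # 1
print(plane.capacity_bits())  # 1024
```

So each yellow square holds 32 x 32 = 1024 bits, which is 128 bytes.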

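Magnetic tape and drum storage

Magnetic tape and magnetic drum memory came next, and they in turn led to the magnetic disk (hard disk).

Magnetic disk (hard disk) storage

The storage principle is the same: the surface of a disk is magnetic.
原理是一样的，磁盘表面有磁性

The great thing about disks is that they are thin, so you can stack many of them together,
硬盘的好处是薄，可以叠在一起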

providing a lot of surface area for data storage.
提供更多表面积来存数据

That’s exactly what IBM did for the world’s first computer with a disk drive:
IBM 对世上第一台磁盘计算机就是这样做的

Floppy disk storage

I should also briefly mention a close cousin of hard disks, the floppy disk,
我简单地提一下硬盘的亲戚，软盘

which is basically the same thing, but uses a magnetic medium that’s, floppy.
除了磁盘是软的，其他基本一样

You might recognise it as the save icon on some of your applications,
你可能见过某些程序的保存图标是一个软盘

but it was once a real physical object!
软盘曾经是真实存在的东西！

It was most commonly used for portable storage, and was very popular from the 1970s through the 1990s.
软盘是为了便携，在 1970~1990 非常流行

And today it makes a pretty good coaster.
如今当杯垫挺不错的

Optical disc storage (CD & DVD)

However, you are probably more familiar with its later, smaller, and more popular cousin, the Compact Disc, or CD,
你可能对后来的产品更熟：光盘（简称 CD）

as well as the DVD which took off in the 90s.
以及 90 年代流行的 DVD

Functionally, these technologies are pretty similar to hard disks and floppy disks,
功能和硬盘软盘一样，都是存数据.

but instead of storing data magnetically,
但用的不是磁性

optical disks have little physical divots in their surface that cause light to be reflected differently,
光盘表面有很多小坑，造成光的不同反射

which is captured by an optical sensor, and decoded into 1’s and 0’s.
光学传感器会捕获到，并解码为 1 和 0

Solid-state drive storage

However, today, things are moving to solid state technologies, with no moving parts,
如今，存储技术在朝固态前进，没有机械活动部件

like this hard drive and also this USB stick.
比如这个硬盘，以及 U 盘

Inside are Integrated Circuits, which we talked about in Episode 15.
里面是集成电路，我们在第 15 集讨论过

Today, costs have fallen so far that hard disk drives are being replaced with non-volatile Solid State Drives, or SSDs, as the cool kids say.
如今成本下降了更多，机械硬盘被固态硬盘逐渐替代，简称 SSD

Because they contain no moving parts,
由于 SSD 没有移动部件

they don’t really have to seek anywhere,
磁头不用等磁盘转

so SSD access times are typically under 1/1000th of a second.
所以 SSD 访问时间低于 1/1000 秒

That’s fast!
这很快！

But it’s still many times slower than your computer’s RAM.
但还是比 RAM 慢很多倍

For this reason, computers today still use memory hierarchies.
所以现代计算机仍然用存储层次结构

20. File Systems

Course introduction

Last episode we talked about data storage, how technologies like magnetic tape and hard disks can store trillions of bits of data, for long durations, even without power.
上集我们讲了数据存储，磁带和硬盘这样的技术，可以在断电状态长时间存上万亿个位

Which is perfect for recording “big blobs” of related data, what are more commonly called computer files.
非常合适存一整块有关系的数据，或者说**“文件”**

You’ve no doubt encountered many types, like text files, music files, photos and videos.
你肯定见过很多种文件比如文本文件，音乐文件，照片和视频

Today, we’re going to talk about how files work, and how computers keep them organized.
今天，我们要讨论文件到底是什么？以及计算机怎么管理文件

File formats

It’s perfectly legal for a file to contain arbitrary, unformatted data, but it’s most useful and practical if the data inside the file is organized somehow.
随意排列文件数据完全没问题，但按格式排会更好

This is called a file format.
这叫 “文件格式”

You can invent your own, and programmers do that from time to time,
你可以发明自己的文件格式，程序员偶尔会这样做

but it’s usually best and easiest to use an existing standard, like JPEG and MP3.
但最好用现成标准，比如 JPEG 和 MP3

.txt text files

Let’s look at some simple file formats. The simplest is the text file.
来看一些简单文件格式，最简单的是文本文件

Like all computer files, this is just a huge list of numbers, stored as binary.
就像所有其它文件，文本文件只是一长串二进制数

The key to interpreting this data is knowing that TXT files use ASCII,
解码数据的关键是 ASCII 编码

a character encoding standard we discussed way back in Episode 4.
一种字符编码标准，第 4 集讨论过.

So, in ASCII, our first value, 72, maps to the capital letter H.
第一个值 72，在 ASCII 中是大写字母 H

And in this way, we decode the whole file.
以此类推解码其他数字
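As a small illustration of that decoding step, the Python sketch below maps a list of byte values to characters. The byte values after the first one are a made-up example; the notes only state that the file starts with 72.

```python
# Hypothetical file contents: only the leading 72 comes from the notes.
raw = bytes([72, 101, 108, 108, 111])

# Decode one value at a time, the way the notes walk through the file.
letters = [chr(value) for value in raw]

# Or decode the whole file in one call.
text = raw.decode("ascii")

print(letters)  # ['H', 'e', 'l', 'l', 'o']
print(text)     # Hello
```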


.wav audio files

Let’s look at a more complicated example: a WAVE File, also called a WAV, which stores audio.
来看一个更复杂的例子：波形(Wave)文件，也叫 WAV，它存音频数据

Before we can correctly read the data, we need to know some information,
在正确读取数据前，需要知道一些信息

like the bit rate and whether it’s a single track or stereo.
比如码率(bit rate)，以及是单声道还是立体声

Data about data is called metadata.
关于数据的数据，叫"元数据"(metadata)

This metadata is stored at the front of the file, ahead of any actual data, in what’s known as a Header.
元数据存在文件开头，在实际数据前面，因此也叫文件头(Header)

Here’s what the first 44 bytes of a WAV file look like.
WAV 文件的前 44 个字节长这样

Some parts are always the same, like where it spells out W-A-V-E.
有的部分总是一样的，比如写着 WAVE 的部分

Other parts contain numbers that change depending on the data contained within.
其他部分的内容，会根据数据变化

The audio data comes right behind the metadata, and it’s stored as a long list of numbers.
音频数据紧跟在元数据后面，是一长串数字
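The sketch below shows one way such a 44-byte header could be built and read back with Python's `struct` module. It assumes the common canonical layout for uncompressed PCM audio with no optional chunks; the helper names `build_header` and `parse_header` are invented for this example, and real files can contain extra chunks that this sketch does not handle.

```python
import struct

# Canonical 44-byte PCM WAV header, all integers little-endian:
# "RIFF", file size - 8, "WAVE", "fmt ", 16, format tag, channels,
# sample rate, byte rate, block align, bits per sample, "data", data size.
WAV_HEADER = struct.Struct("<4sI4s4sIHHIIHH4sI")


def build_header(num_channels, sample_rate, bits_per_sample, data_size):
    block_align = num_channels * bits_per_sample // 8
    byte_rate = sample_rate * block_align
    return WAV_HEADER.pack(
        b"RIFF", 36 + data_size, b"WAVE",
        b"fmt ", 16, 1,  # 16-byte format chunk, format tag 1 = PCM
        num_channels, sample_rate, byte_rate, block_align, bits_per_sample,
        b"data", data_size,
    )


def parse_header(header):
    fields = WAV_HEADER.unpack(header)
    if fields[0] != b"RIFF" or fields[2] != b"WAVE":
        raise ValueError("not a WAV header")
    return {
        "channels": fields[6],
        "sample_rate": fields[7],
        "bits_per_sample": fields[10],
        "data_size": fields[12],
    }


header = build_header(num_channels=2, sample_rate=44100,
                      bits_per_sample=16, data_size=0)
info = parse_header(header)
print(len(header))   # 44
print(header[8:12])  # b'WAVE'
print(info)
```

The fixed strings such as `WAVE` are the parts that are always the same; the numeric fields are the parts that change with the audio data.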

These values represent the amplitude of sound captured many times per second.
数字代表每秒捕获多次的声音幅度

As an example, let’s look at a waveform of me saying: “hello!” Hello!
举个例子，看一下"你好"的波形

Now that we’ve captured some sound, let’s zoom into a little snippet.
现在捕获到了一些声音，我们放大看一下

A digital microphone, like the one in your computer or smartphone, samples the sound pressure thousands of times.
电脑和手机麦克风，每秒可以对声音进行上千次采样

Each sample can be represented as a number.
每次采样可以用一个数字表示

Larger numbers mean higher sound pressure, what’s called amplitude.
声压越高数字越大，也叫**“振幅”**

And these numbers are exactly what gets stored in a WAVE file!
WAVE 文件里存的就是这些数据！

Thousands of amplitudes for every single second of audio!
每秒上千次的振幅！
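As a rough illustration of sampling, the following Python sketch computes amplitude values for a pure tone instead of a real microphone signal. The function name `sample_tone`, the 440 Hz tone, the 8000 samples-per-second rate, and the 16-bit amplitude range are all assumptions chosen for the example.

```python
import math


def sample_tone(freq_hz, sample_rate, num_samples, max_amplitude=32767):
    """Return num_samples integer amplitudes of a sine wave."""
    samples = []
    for i in range(num_samples):
        t = i / sample_rate  # time of this sample in seconds
        value = max_amplitude * math.sin(2 * math.pi * freq_hz * t)
        samples.append(round(value))
    return samples


# 80 samples at 8000 samples per second cover 10 milliseconds of sound.
samples = sample_tone(freq_hz=440, sample_rate=8000, num_samples=80)
print(len(samples))
print(samples[:5])
```

A list like `samples` is the kind of data that would follow the header in a WAV file.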

When it’s time to play this file, an audio program needs to actuate the computer’s speakers such that the original waveform is emitted.
播放声音文件时，扬声器会产生相同的波形

.bmp bitmap files

So, now that you’re getting the hang of file formats, let’s talk about bitmaps or BMP, which store pictures.
现在来谈谈位图(Bitmap)，后缀 .bmp, 它存图片

On a computer, Pictures are made up of little tiny square elements called pixels.
计算机上，图片由很多个叫"像素"的方块组成

Each pixel is a combination of three colors: red, green and blue.
每个像素由三种颜色组成：红，绿，蓝

These are called additive primary colors, and they can be mixed together to create any other color on our electronic displays.
叫"加色三原色"，混在一起可以创造其它颜色

Now, just like WAV files, BMPs start with metadata, including key values like image width, image height, and color depth.
就像 WAV 文件一样，BMP 文件开头也是元数据，有图片宽度，图片高度，颜色深度

As an example, let’s say the metadata specified an image 4 pixels wide, by 4 pixels tall,
举例，假设元数据说图是 4像素宽 x 4像素高

with a 24-bit color depth - that’s 8-bits for red, 8-bits for green, and 8-bits for blue.
颜色深度 24 位，8 位红色，8 位绿色，8 位蓝色

As a reminder, 8 bits is the same as one byte.
提醒一下，8位 (bit) 和 1字节(byte)是一回事

The smallest number a byte can store is 0, and the largest is 255.
一个字节能表示的最小数是 0，最大 255

Our image data is going to look something like this:
图像数据看起来会类似这样：

Let’s look at the color of our first pixel.
来看看第一个像素的颜色

It has 255 for its red value, 255 for green and 255 for blue.
红色是255，绿色是255，蓝色也是255

This equates to full intensity red, full intensity green and full intensity blue.
这等同于全强度红色，全强度绿色和全强度蓝色

These colors blend together on your computer monitor to become white.
混合在一起变成白色

So our first pixel is white!
所以第一个像素是白色！

The next pixel has a Red-Green-Blue, or RGB value of 255, 255, 0. That’s the color yellow!
下一个像素的红绿蓝值，或 RGB 值 255,255,0 是黄色！

The pixel after that has a RGB value of 0,0,0 - that’s zero intensity everything, which is black.
下一个像素是 0,0,0 ，黑色

And the next one is yellow.
下一个是黄色

Because the metadata specified this was a 4 by 4 image, we know that we’ve reached the end of our first row of pixels.
因为元数据说图片是 4x4，我们知道现在到了第一row结尾

So, we need to drop down a row.
所以换一行

The next RGB value is 255,255,0 yellow again.
下一个 RGB 值是 255,255,0，又是黄色

Okay, let’s go ahead and read all the pixels in our 4x4 image tada!
好，我们读完剩下的像素

A very low resolution pac-man!
一个低分辨率的吃豆人
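The pixel walk-through can be sketched in Python as follows. It follows the simplified layout used in these notes (red, green, blue bytes in reading order); real BMP files order and pad their pixel data differently, so treat this as an illustration of the idea rather than a BMP parser. The names `decode_row` and `COLOR_NAMES` are made up for the example.

```python
WIDTH, HEIGHT = 4, 4
BYTES_PER_PIXEL = 3  # 24-bit color depth: one byte each for red, green, blue

COLOR_NAMES = {
    (255, 255, 255): "white",
    (255, 255, 0): "yellow",
    (0, 0, 0): "black",
}

# The first row of pixel data, exactly as described above.
first_row = bytes([255, 255, 255,  255, 255, 0,  0, 0, 0,  255, 255, 0])


def decode_row(data):
    """Split raw bytes into (red, green, blue) tuples."""
    pixels = []
    for i in range(0, len(data), BYTES_PER_PIXEL):
        red, green, blue = data[i:i + BYTES_PER_PIXEL]
        pixels.append((red, green, blue))
    return pixels


row_colors = [COLOR_NAMES[pixel] for pixel in decode_row(first_row)]
image_size_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL

print(row_colors)        # ['white', 'yellow', 'black', 'yellow']
print(image_size_bytes)  # 48
```

The 4x4 image therefore needs 48 bytes of pixel data before any compression.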

Storing files

However, as computational power and storage capacity improved, it became possible, and useful, to store more than one file at a time.
但随着计算能力和存储容量的提高，存多个文件变得非常有用

The simplest option is to store files back-to-back.
最简单的方法是把文件连续存储

This can work… but how does the computer know where files begin and end?
这样能用，但怎么知道文件开头和结尾在哪里？

Storage devices have no notion of files - they’re just a mechanism for storing lots of bits.
储存器没有文件的概念，只是存储大量位

So, for this to work, we need to have a special file that records where other ones are located.
所以为了存多个文件，需要一个特殊文件，记录其他文件的位置

This goes by many names, but a good general term is Directory File.
这个特殊文件有很多名字，这里泛称 “目录文件”

Most often, it’s kept right at the front of storage, so we always know where to access it.
这个文件经常存在最开头，方便找

Location zero!
位置 0！

Inside the Directory File are the names of all the other files in storage.
目录文件里，存所有其他文件的名字

In our example, they each have a name, followed by a period and end with what’s called a File Extension, like “BMP” or “WAV”.
格式是文件名 + 一个句号 + 扩展名，比如 BMP 或 WAV

Those further assist programs in identifying file types.
扩展名帮助得知文件类型

The Directory File also stores metadata about these files, like when they were created and last modified, who the owner is, and if it can be read, written or both.
目录文件还存文件的元数据，比如创建时间，最后修改时间，文件所有者是谁，是否能读/写或读写都行

But most importantly, the directory file contains where these files begin in storage, and how long they are.
最重要的是，目录文件有文件起始位置和长度

If we want to add a file, remove a file, change a filename, or similar,
如果要添加文件，删除文件，更改文件名等

we have to update the information in the Directory File.
必须更新目录文件
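A minimal Python sketch of this flat scheme is shown below. For simplicity the directory is kept as a separate dictionary rather than being stored at location zero of the storage itself, and the names `storage`, `directory`, `add_file`, and `read_file` are invented for the example.

```python
STORAGE_SIZE = 64
storage = bytearray(STORAGE_SIZE)  # raw bytes, no notion of files
directory = {}                     # name -> {"start": ..., "length": ...}


def add_file(name, data):
    # Files are packed back-to-back: the new file starts where the last one ends.
    start = max((e["start"] + e["length"] for e in directory.values()), default=0)
    if start + len(data) > STORAGE_SIZE:
        raise OSError("storage is full")
    storage[start:start + len(data)] = data
    directory[name] = {"start": start, "length": len(data)}


def read_file(name):
    entry = directory[name]
    return bytes(storage[entry["start"]:entry["start"] + entry["length"]])


add_file("todo.txt", b"buy milk")
add_file("carrie.bmp", bytes([255, 255, 0]))

print(directory["carrie.bmp"])  # {'start': 8, 'length': 3}
print(read_file("todo.txt"))    # b'buy milk'
```

Every change to a file's position or size has to be mirrored in `directory`, which is the bookkeeping the Directory File exists for.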

It’s like the Table of Contents in a book, if you make a chapter shorter, or move it somewhere else, you have to update the table of contents, otherwise the page numbers won’t match!
就像书的目录，如果缩短或移动了一个章节，要更新目录，不然页码对不上

The Directory File, and the maintenance of it, is an example of a very basic File System,
目录文件，以及对目录文件的管理，是一个非常简单的文件系统例子

the part of an Operating System that manages and keeps track of stored files.
文件系统专门负责管理文件

This particular example is called a Flat File System, because they’re all stored at one level.
刚刚的例子叫"平面文件系统"，因为文件都在同一个层次

It’s flat!
平的！

Of course, packing files together, back-to-back, is a bit of a problem,
当然，把文件前后排在一起有个问题

because if we want to add some data to let’s say “todo.txt”, there’s no room to do it without overwriting part of “carrie.bmp”.
如果给 todo.txt 加一点数据，会覆盖掉后面 carrie.bmp 的一部分

So modern File Systems do two things.
所以现代文件系统会做两件事

First, they store files in blocks.
1. 把空间划分成一块块

This leaves a little extra space for changes, called slack space.
导致有一些“预留空间”可以方便改动

It also means that all file data is aligned to a common size, which simplifies management.
同时也方便管理

In a scheme like this, our Directory File needs to keep track of what block each one is stored in.
用这样的方案，目录文件要记录文件在哪些块里

The second thing File Systems do, is allow files to be broken up into chunks and stored across many blocks.
2. 拆分文件，存在多个块里

So let’s say we open “todo.txt”, and we add a few more items then the file becomes too big to be saved in its one block.
假设打开 todo.txt 加了些内容，文件太大存不进一块里

We don’t want to overwrite the neighboring one, so instead, the File System allocates an unused block, which can accommodate extra data.
我们不想覆盖掉隔壁的块，所以文件系统会分配一个没使用的块，容纳额外的数据

With a File System scheme like this, the Directory File needs to store not just one block per file, but rather a list of blocks per file.
目录文件会记录不止一个块，而是多个块

In this way, we can have files of variable sizes that can be easily expanded and shrunk, simply by allocating and deallocating blocks.
只要分配块，文件可以轻松增大缩小
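The block scheme can be sketched in Python like this. The block size, the number of blocks, and the helper names `write_file` and `read_file` are assumptions made for the example; a real file system stores this bookkeeping on the device itself and handles many more cases.

```python
BLOCK_SIZE = 8
NUM_BLOCKS = 8

blocks = [bytearray(BLOCK_SIZE) for _ in range(NUM_BLOCKS)]
free_blocks = list(range(NUM_BLOCKS))  # block numbers not used by any file
directory = {}                         # name -> {"blocks": [...], "length": n}


def write_file(name, data):
    entry = directory.setdefault(name, {"blocks": [], "length": 0})
    needed = -(-len(data) // BLOCK_SIZE)  # ceiling division
    # Grow: allocate unused blocks (raises IndexError if none are left).
    while len(entry["blocks"]) < needed:
        entry["blocks"].append(free_blocks.pop(0))
    # Shrink: hand blocks that are no longer needed back to the free list.
    while len(entry["blocks"]) > needed:
        free_blocks.append(entry["blocks"].pop())
    for i, block_number in enumerate(entry["blocks"]):
        chunk = data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        blocks[block_number][:len(chunk)] = chunk
    entry["length"] = len(data)


def read_file(name):
    entry = directory[name]
    data = b"".join(bytes(blocks[n]) for n in entry["blocks"])
    return data[:entry["length"]]  # drop the slack space in the last block


write_file("todo.txt", b"milk")
write_file("carrie.bmp", bytes([255, 255, 0]))
write_file("todo.txt", b"milk, eggs, tea")  # 15 bytes no longer fit in one block

print(directory["todo.txt"]["blocks"])  # [0, 2]
print(read_file("todo.txt"))            # b'milk, eggs, tea'
```

The file grows into block 2 without touching block 1, so its neighbor stays intact.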

If you watched our episode on Operating Systems, this should sound a lot like Virtual Memory.
如果你看了第18集操作系统，这听起来很像"虚拟内存"

Conceptually it’s very similar!
概念上讲的确很像！

Now let’s say we want to delete “carrie.bmp”. To do that, we can simply remove the entry from the Directory File.
假设想删掉 carrie.bmp，只需要在目录文件删掉那条记录

This, in turn, causes one block to become free.
让一块空间变成了可用

Note that we didn’t actually erase the file’s data in storage, we just deleted the record of it.
注意这里没有擦除数据，只是把记录删了

At some point, that block will be overwritten with new data, but until then, it just sits there.
之后某个时候，那些块会被新数据覆盖。但在此之前，数据还在原处
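A tiny Python sketch makes the point about deletion visible. The data structures and the function name `delete_file` are hypothetical, set up only for this illustration.

```python
# Two 4-byte blocks, each holding one small file.
blocks = [bytearray(b"todo"), bytearray(b"\xff\xff\x00\x00")]
directory = {"todo.txt": [0], "carrie.bmp": [1]}  # name -> list of blocks
free_blocks = []


def delete_file(name):
    # Only the directory record is removed; the block contents are untouched.
    free_blocks.extend(directory.pop(name))


delete_file("carrie.bmp")

print("carrie.bmp" in directory)  # False
print(free_blocks)                # [1]
print(bytes(blocks[1]))           # b'\xff\xff\x00\x00' (old data still there)
```

Until block 1 is reused for another file, its old contents can still be read back.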

This is one way that computer forensic teams can “recover” data from computers even though people think it has been deleted. Crafty!
所以计算机取证团队可以"恢复"数据，虽然别人以为数据已经"删了"，狡猾！

Modern file systems

Very quickly, it became impractical to store all files together at one level.
很快，所有文件都存在同一层变得不切实际

Just like documents in the real world, it’s handy to store related files together in folders.
就像现实世界，相关文件放在同一个文件夹会方便很多

Then we can put connected folders into folders, and so on.
然后文件夹套文件夹.

This is a Hierarchical File System, and it’s what your computer uses.
这叫"分层文件系统"，你的计算机现在就在用这个.

There are a variety of ways to implement this, but let’s stick with the File System example we’ve been using to convey the main idea.
实现方法有很多种，我们用之前的例子来讲重点好了

The biggest change is that our Directory File needs to be able to point not just to files, but also other directories.
最大的变化是目录文件不仅要指向文件，还要指向目录

To keep track of what’s a file and what’s a directory, we need some extra metadata.
我们需要额外元数据来区分开文件和目录

This Directory File is the top-most one, known as the Root Directory.
这个目录文件在最顶层，因此叫根目录

All other files and folders lie beneath this directory along various file paths.
所有其他文件和文件夹，都在根目录下
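One way to picture this is the Python sketch below, where each directory entry carries an `is_directory` flag as the extra metadata. The folder name `photos`, the function name `resolve`, and the use of nested dictionaries are assumptions made for the example.

```python
root = {
    "is_directory": True,
    "entries": {
        "todo.txt": {"is_directory": False, "data": b"buy milk"},
        "photos": {
            "is_directory": True,
            "entries": {
                "carrie.bmp": {"is_directory": False, "data": b"\xff\xff\x00"},
            },
        },
    },
}


def resolve(path):
    """Walk a path like /photos/carrie.bmp down from the root directory."""
    node = root
    for part in path.strip("/").split("/"):
        if part == "":
            continue  # the path "/" names the root itself
        if not node["is_directory"]:
            raise NotADirectoryError(path)
        node = node["entries"][part]
    return node


print(resolve("/photos/carrie.bmp")["data"])  # b'\xff\xff\x00'
print(resolve("/photos")["is_directory"])     # True
```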


So that’s a quick overview of the key principles of File Systems.
文件系统的几个重要概念现在介绍完了.

They provide yet another way to move up a new level of abstraction.
它提供了一层新抽象！

File systems allow us to hide the raw bits stored on magnetic tape, spinning disks and the like, and they let us think of data as neatly organized and easily accessible files.
文件系统使我们不必关心文件在磁带或磁盘的具体位置，整理和访问文件更加方便

We even started talking about users, not programmers, manipulating data, like opening files and organizing them,
我们像普通用户一样直观操纵数据，比如打开和整理文件

foreshadowing where the series will be going in a few episodes.
接下来几集也会从用户角度看问题

I’ll see you next week.
下周见

21. Compression

Last episode we talked about Files, bundles of data, stored on a computer, that are formatted and arranged to encode information, like text, sound or images.
上集我们讨论了文件格式，如何编码文字，声音，图片

We even discussed some basic file formats, like text, wave, and bitmap.
还举了具体例子 .txt .wav .bmp

While these formats are perfectly fine and still used today, their simplicity also means they’re not very efficient.
这些格式虽然管用，而且现在还在用，但它们的简单性意味着效率不高

Ideally, we want files to be as small as possible, so we can store lots of them without filling up our hard drives, and also transmit them more quickly.
我们希望文件能小一点，这样能存大量文件，传输也会快一些

Nothing is more frustrating than waiting for an email attachment to download. Ugh!
等邮件附件下载烦死人了

The answer is compression, which literally squeezes data into a smaller size.
解决方法是压缩，把数据占用的空间压得更小

To do this, we have to encode data using fewer bits than the original representation.
用更少的位(bit)来表示数据

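Lossless compression

The first approach is to reduce repeated information.

Run-length encoding

Run-length encoding works well when the same value appears many times in a row, as in the pixel data of the 4x4 image above. Each run of identical pixels is replaced by a single copy of the pixel together with a count of how many times it repeats.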

We’re now at 24 bytes, down from 48.
我们大大减少了字节数，之前是48 现在是24

That’s 50% smaller!
小了50％！省了很多空间！
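Here is a minimal Python sketch of run-length encoding. The pixel layout below is an assumption: it matches the first row described earlier and reproduces the 48-to-24-byte figures, but the notes do not spell out the full image. Each run is assumed to cost one count byte plus three color bytes.

```python
W, Y, B = (255, 255, 255), (255, 255, 0), (0, 0, 0)

# Assumed 4x4 pixel layout, read row by row.
pixels = [W, Y, B, Y,
          Y, Y, Y, Y,
          Y, Y, W, W,
          W, Y, Y, Y]


def rle_encode(pixels):
    """Collapse consecutive identical pixels into (count, pixel) runs."""
    runs = []
    for pixel in pixels:
        if runs and runs[-1][1] == pixel and runs[-1][0] < 255:
            runs[-1] = (runs[-1][0] + 1, pixel)
        else:
            runs.append((1, pixel))
    return runs


def rle_decode(runs):
    out = []
    for count, pixel in runs:
        out.extend([pixel] * count)
    return out


runs = rle_encode(pixels)
raw_bytes = len(pixels) * 3    # 3 bytes per pixel
encoded_bytes = len(runs) * 4  # 1 count byte + 3 color bytes per run

print(raw_bytes, encoded_bytes)    # 48 24
print(rle_decode(runs) == pixels)  # True
```

Decoding gives back exactly the original pixels, which is what makes the technique lossless.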

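Compact representation (dictionary coding)

The second approach is to replace frequently occurring blocks of data with shorter codes.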

To do this, we need a dictionary that stores the mapping from codes to data.
为此，我们需要一个字典，存储"代码"和"数据"间的对应关系

Let’s see how this works for our example.
我们看个例子

We can view our image as not just a string of individual pixels, but as little blocks of data.
我们可以把图像看成一块块，而不是一个个像素

For simplicity, we’re going to use pixel pairs, which are 6 bytes long,
为了简单，我们把2个像素当成1块（占6个字节）

but blocks can be any size.
但你也可以定成其他大小

In our example, there are only four pairings: White-yellow, black-yellow, yellow-yellow and white-white.
还是上面的像素图块例子，我们只有四对：白黄、黑黄、黄黄、白白

Those are the data blocks in our dictionary we want to generate compact codes for.
我们会为这四对生成紧凑代码(compact codes)

What’s interesting, is that these blocks occur at different frequencies.
有趣的是，这些块的出现频率不同

One method for generating efficient codes is building a Huffman Tree, invented by David Huffman while he was a student at MIT in the 1950s.
1950年代大卫·霍夫曼发明了一种高效编码方式叫“霍夫曼树”（Huffman Tree），当时他是麻省理工学院的学生

His algorithm goes like this.
算法是这样的

First, you lay out all the possible blocks and their frequencies.
首先，列出所有块和出现频率

At every round, you select the two with the lowest frequencies.
每轮选两个最低的频率

Here, that’s Black-Yellow and White-White, each with a frequency of 1.
这里黑黄和白白的频率最低，它们都是 1

You combine these into a little tree, which has a combined frequency of 2, so we record that.
可以把它们组成一个树，总频率 2

And now one step of the algorithm is done.
现在完成了一轮算法

Now we repeat the process.
现在我们重复这样做

This time we have three things to choose from.
这次有3个可选

Just like before, we select the two with the lowest frequency, put them into a little tree, and record the new total frequency of all the sub items.
就像上次一样，选频率最低的两个，放在一起，并记录总频率

Ok, we’re almost done.
好，我们快完成了

This time it’s easy to select the two items with the lowest frequency because there are only two things left to pick.
这次很简单，因为只有2个选择

We combine these into a tree, and now we’re done!
把它们组合成一棵树就完成了！

Our tree looks like this, and it has a very cool property: it’s arranged by frequency,
现在看起来像这样，它有一个很酷的属性：按频率排列

with less common items lower down.
频率低的在下面

So, now we have a tree, but you may be wondering how this gets us to a dictionary.
现在有了一棵树，你可能在想 “怎么把树变成字典？”

Well, we use our frequency-sorted tree to generate the codes we need by labeling each branch with a 0 or a 1, like so.
我们可以把每个分支用 0 和 1 标注，就像这样

With this, we can write out our code dictionary.
现在可以生成字典

Yellow-yellow is encoded as just a single 0. White-yellow is encoded as 10.
Black-Yellow is 1 1 0, and finally white-white is 1 1 1.
黄黄编码成 0，白黄编码成 10，黑黄编码成 110，白白编码成 111

The really cool thing about these codewords is that there’s no way to have conflicting codes, because each path down the tree is unique.
酷的地方是它们绝对不会冲突，因为树的每条路径是唯一的

This means our codes are prefix-free, that is no code starts with another complete code.
意味着代码是**“无前缀”**的，没有代码是以另一个代码开头的
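The rounds described above can be sketched in Python with a priority queue. The frequencies of 1 for black-yellow and white-white come from the notes; the values 4 for yellow-yellow and 2 for white-yellow are inferred from the 14-bit total quoted in the notes and should be read as an assumption. The function name `huffman_codes` is invented for the example.

```python
import heapq
from itertools import count


def huffman_codes(frequencies):
    """Build a Huffman tree and return a {symbol: bit string} dictionary."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(freq, next(tiebreak), symbol) for symbol, freq in frequencies.items()]
    heapq.heapify(heap)

    # Each round: take the two lowest frequencies and join them into a subtree.
    while len(heap) > 1:
        freq1, _, left = heapq.heappop(heap)
        freq2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (freq1 + freq2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: label branches 0 and 1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                        # leaf: a pixel pair
            codes[node] = prefix or "0"

    walk(heap[0][2], "")
    return codes


frequencies = {
    "yellow-yellow": 4,  # inferred, see the note above
    "white-yellow": 2,   # inferred, see the note above
    "black-yellow": 1,
    "white-white": 1,
}
codes = huffman_codes(frequencies)
total_bits = sum(frequencies[s] * len(codes[s]) for s in frequencies)

print(codes)
print(total_bits)  # 14
```

With this tie-breaking order the result matches the dictionary in the notes. A different but equally valid tie-break could swap the 0 and 1 labels while keeping the same code lengths.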

Now, let’s return to our image data and compress it! The eight pixel pairs encode to just 14 bits.
现在我们来压缩！

This data is meaningless unless we also save our code dictionary.
字典也要保存下来，否则 14 bit 毫无意义

So, we’ll need to append it to the front of the image data, like this.
所以我们把字典加到 14 bit 前面，就像这样

Now, including the dictionary, our image data is 30 bytes long.
现在加上字典，图像是 30 个字节(bytes)，比 48 字节好很多
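To show why prefix-free codes can be read back without separators, here is a small Python sketch that encodes a sequence of pixel pairs and decodes it again. The code dictionary is the one from the notes; the order of the pairs is an assumed layout, and the short names `YY`, `WY`, `BY`, `WW` are abbreviations made up for the example.

```python
CODES = {"YY": "0", "WY": "10", "BY": "110", "WW": "111"}

# Assumed order of the eight pixel pairs in the 4x4 image.
pairs = ["WY", "BY", "YY", "YY", "YY", "WW", "WY", "YY"]


def encode(pairs):
    return "".join(CODES[pair] for pair in pairs)


def decode(bits):
    lookup = {code: pair for pair, code in CODES.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        # Because no code is a prefix of another, the first match is the right one.
        if current in lookup:
            decoded.append(lookup[current])
            current = ""
    if current:
        raise ValueError("leftover bits that match no code")
    return decoded


bits = encode(pairs)
print(bits)                   # 10110000111100
print(len(bits))              # 14
print(decode(bits) == pairs)  # True
```

The decoder needs the dictionary to make sense of the bits, which is why the dictionary has to be stored alongside the compressed data.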

The two approaches we discussed, removing redundancies and using more compact representations, are often combined, and underlie almost all lossless compressed file formats, like GIF, PNG, PDF and ZIP files.
"消除冗余"和"用更紧凑的表示方法"，这两种方法通常会组合使用，几乎所有无损压缩格式都用了它们，比如 GIF, PNG, PDF, ZIP

Both run-length encoding and dictionary coders are lossless compression techniques.
游程编码和字典编码都是无损压缩

No information is lost; when you decompress, you get the original file.
压缩时不会丢失信息，解压后，数据和之前完全一样

Lossy compression

But, there are other types of files where we can get away with little changes, perhaps by removing unnecessary or less important information, especially information that human perception is not good at detecting.
但其他一些文件，丢掉一些数据没什么关系，丢掉那些人类看不出区别的数据

And this trick underlies most lossy compression techniques.
大多数有损压缩技术，都用到了这点

These tend to be pretty complicated, so we’re going to attack this at a conceptual level.
实际细节比较复杂，所以我们讲概念就好

Let’s take sound as an example. Your hearing is not perfect.
以声音为例，你的听力不是完美的

We can hear some frequencies of sound better than others. And there are some we can’t hear at all, like ultrasound.
有些频率我们很擅长，其他一些我们根本听不见，比如超声波

Unless you’re a bat.
除非你是蝙蝠

Basically, if we make a recording of music, and there’s data in the ultrasonic frequency range, we can discard it, because we know that humans can’t hear it.
举个例子，如果录音乐，超声波数据都可以扔掉，因为人类听不到超声波

On the other hand, humans are very sensitive to frequencies in the vocal range, like people singing, so it’s best to preserve quality there as much as possible.
另一方面，人类对人声很敏感，所以应该尽可能保持原样

Deep bass is somewhere in between. Humans can hear it, but we’re less attuned to it.
低音介于两者之间，人类听得到，但不怎么敏感

We mostly sense it.
一般是感觉到震动

Lossy audio compressors take advantage of this, and encode different frequency bands at different precisions.
有损音频压缩利用这一点，用不同精度编码不同频段

Even if the result is rougher, it’s likely that users won’t perceive the difference.
听不出什么区别，不会明显影响体验
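The idea of "different precision per band" can be illustrated with a toy quantizer in Python. The band names and step sizes below are invented for the example and are not taken from any real audio codec, which would rely on far more elaborate perceptual models.

```python
# Hypothetical quantization steps: smaller step = higher precision.
# None means the band is discarded entirely.
BAND_STEP = {"vocal": 1, "deep_bass": 8, "ultrasonic": None}


def quantize(band, amplitude):
    """Round a non-negative amplitude to the precision allowed for its band."""
    step = BAND_STEP[band]
    if step is None:
        return 0  # inaudible band: throw the data away
    return (amplitude + step // 2) // step * step


print(quantize("vocal", 101))       # 101 (kept exactly)
print(quantize("deep_bass", 101))   # 104 (rounded to a multiple of 8)
print(quantize("ultrasonic", 101))  # 0   (discarded)
```

Coarser steps mean fewer distinct values to store, which is where the savings come from.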

Perceptual coding

This idea of discarding or reducing precision in a manner that aligns with human perception is called perceptual coding,
这种删掉人类无法感知的数据的方法，叫"感知编码"

and it relies on models of human perception, which come from a field of study called psychophysics.
它依赖于人类的感知模型，模型来自"心理物理学"领域

Let’s look at a patch of 8x8 pixels.
我们来看其中一个 8x8 像素

Pretty much every pixel is different from its neighbor, making it hard to compress with lossless techniques because there’s just a lot going on.
几乎每个像素都和相邻像素不同，用无损技术很难压缩，因为太多不同点了

Lots of little details.
很多小细节

But human perception doesn’t register all those details.
但人眼看不出这些细节

So, we can discard a lot of that detail, and replace it with a simplified patch like this.
因此可以删掉很多，用这样一个简单的块来代替

Temporal redundancy

Videos allow some extra tricks, because many pixels are the same from one frame to the next.

Like this whole background behind me!
比如我后面的背景！

This is called temporal redundancy.
这叫时间冗余

We don’t need to re-transmit those pixels every frame of the video. We can just copy patches of data forward.
视频里不用每一帧都存这些像素，可以只存变了的部分

When there are small pixel differences, like the readout on this frequency generator behind me,
当帧和帧之间有小小的差异时，比如后面这个频率发生器

most video formats send data that encodes just the difference between patches,
很多视频编码格式，只存变化的部分

which is more efficient than re-transmitting all the pixels afresh, again taking advantage of the similarity between frames.
这比存所有像素更有效率，利用了帧和帧之间的相似性
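A stripped-down Python sketch of storing only what changed between two frames follows. Frames are modeled as flat lists of brightness values, and the function names `frame_delta` and `apply_delta` are invented for the example; real video formats work on patches of pixels rather than single values.

```python
def frame_delta(previous, current):
    """Record only the positions whose value changed since the last frame."""
    return {i: new for i, (old, new) in enumerate(zip(previous, current)) if new != old}


def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame plus the changes."""
    frame = list(previous)
    for index, value in delta.items():
        frame[index] = value
    return frame


frame1 = [10, 10, 10, 10, 50, 10, 10, 10]
frame2 = [10, 10, 10, 10, 60, 10, 10, 10]  # only one pixel changed

delta = frame_delta(frame1, frame2)
print(delta)                                 # {4: 60}
print(apply_delta(frame1, delta) == frame2)  # True
```

Here one changed value is stored instead of eight, and the unchanged background is simply copied forward.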

The fanciest video compression formats go one step further.
更高级的视频压缩格式会更进一步

They find patches that are similar between frames, and not only copy them forward, with or without differences, but also can apply simple effects to them, like a shift or rotation.
找出帧和帧之间相似的补丁，然后用简单效果实现，比如移动和旋转

They can also lighten or darken a patch between frames.
变亮和变暗

So, if I move my hand side to side like this the video compressor will identify the similarity,
如果我这样摆手，视频压缩器会识别到相似性

capture my hand in one or more patches, then just move these patches around between frames.
用一个或多个补丁代表我的手，然后帧之间直接移动这些补丁

You’re actually seeing my hand from the past. Kinda freaky, but it uses a lot less data.
所以你看到的是我过去的手（不是实时的），有点可怕但数据量少得多

MPEG-4 videos, a common standard, are often 20 to 200 times smaller than the original, uncompressed file.
MPEG-4 是常见标准，可以比原文件小20倍到200倍

However, encoding frames as translations and rotations of patches from previous frames can go horribly wrong when you compress too heavily, and there isn’t enough space to update pixel data inside of the patches.
但用补丁的移动和旋转来更新画面，当压缩太严重时会出错，没有足够空间更新补丁内的像素

The video player will forge ahead, applying the right motions, even if the patch data is wrong.
即使补丁是错的，视频播放器也会照样播放

And this leads to some hilarious and trippy effects, which I’m sure you’ve seen.
导致一些怪异又搞笑的结果，你肯定见过这些.