Merge pull request #12885 from HankChow/master

Translation complete: 20180719 Building tiny container images.md
Xingyu.Wang 2019-03-24 13:05:52 +08:00 committed by GitHub
commit 0c3e599bfd
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 356 additions and 364 deletions


@ -1,364 +0,0 @@
hankchow translating
Building tiny container images
======
![](https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/containers_scale_performance.jpg?itok=R7jyMeQf)
When [Docker][1] exploded onto the scene a few years ago, it brought containers and container images to the masses. Although Linux containers existed before then, Docker made it easy to get started with a user-friendly command-line interface and an easy-to-understand way to build images using the Dockerfile format. But while it may be easy to jump in, there are still some nuances and tricks to building container images that are usable, even powerful, but still small in size.
### First pass: Clean up after yourself
Some of these examples involve the same kind of cleanup you would use with a traditional server, but more rigorously followed. Smaller image sizes are critical for quickly moving images around, and storing multiple copies of unnecessary data on disk is a waste of resources. Consequently, these techniques should be used more regularly than on a server with lots of dedicated storage.
An example of this kind of cleanup is removing cached files from an image to recover space. Consider the difference in size between a base image with [Nginx][2] installed by `dnf` with and without the metadata and yum cache cleaned up:
```
# Dockerfile with cache
FROM fedora:28
LABEL maintainer Chris Collins <collins.christopher@gmail.com>
RUN dnf install -y nginx
-----
# Dockerfile w/o cache
FROM fedora:28
LABEL maintainer Chris Collins <collins.christopher@gmail.com>
RUN dnf install -y nginx \
        && dnf clean all \
        && rm -rf /var/cache/yum
-----
[chris@krang] $ docker build -t cache -f Dockerfile .
[chris@krang] $ docker images --format "{{.Repository}}: {{.Size}}" | head -n 1
cache: 464 MB
[chris@krang] $ docker build -t no-cache -f Dockerfile-wo-cache .
[chris@krang] $ docker images --format "{{.Repository}}: {{.Size}}"  | head -n 1
no-cache: 271 MB
```
That is a significant difference in size. The version with the `dnf` cache is almost twice the size of the image without the metadata and cache. Package manager cache, Ruby gem temp files, `nodejs` cache, even downloaded source tarballs are all perfect candidates for cleaning up.
### Layers—a potential gotcha
Unfortunately (or fortunately, as you'll see later), based on the way layers work with containers, you cannot simply add a `RUN rm -rf /var/cache/yum` line to your Dockerfile and call it a day. Each instruction of a Dockerfile is stored in a layer, with changes between layers applied on top. So even if you were to do this:
```
RUN dnf install -y nginx
RUN dnf clean all
RUN rm -rf /var/cache/yum
```
...you'd still end up with three layers, one of which contains all the cache, and two intermediate layers that "remove" the cache from the image. But the cache is actually still there, just as when you mount a filesystem over the top of another one: the files are there, you just can't see or access them.
You'll notice that the example in the previous section chains the cache cleanup in the same Dockerfile instruction where the cache is generated:
```
RUN dnf install -y nginx \
        && dnf clean all \
        && rm -rf /var/cache/yum
```
This is a single instruction and ends up being a single layer within the image. You'll lose a bit of the Docker (*ahem*) cache this way, making a rebuild of the image slightly longer, but the cached data will not end up in your final image. As a nice compromise, just chaining related commands (e.g., `yum install` and `yum clean all`, or downloading, extracting and removing a source tarball, etc.) can save a lot on your final image size while still allowing you to take advantage of the Docker cache for quicker development.
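One way to check where the space goes is to list the size of each layer. The sketch below wraps `docker history`, which prints one row per layer of an image. The function name `layer_sizes` is made up for illustration, and the usage comment assumes images tagged `cache` and `no-cache` were built as in the earlier example.

```shell
#!/usr/bin/env bash
# Print the size of each layer of an image, followed by the
# instruction that created it. The function name is hypothetical.
layer_sizes() {
    docker history --format "{{.Size}}: {{.CreatedBy}}" "$1"
}

# Usage, assuming images tagged "cache" and "no-cache" exist locally:
#   layer_sizes cache
#   layer_sizes no-cache
```

For a Dockerfile with separate `RUN` lines for the install and the cleanup, the install layer should still show its full size in this listing, while the cleanup layers add little or nothing.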
This layer "gotcha" is more subtle than it first appears, though. Because the image layers document the _changes_ to each layer, one upon another, it's not just the existence of files that adds up, but any change to the file. For example, _even changing the mode_ of the file creates a copy of that file in the new layer.
For example, the output of `docker images` below shows information about two images. The first, `layer_test_1`, was created by adding a single 1GB file to a base CentOS image. The second image, `layer_test_2`, was created `FROM layer_test_1` and did nothing but change the mode of the 1GB file with `chmod u+x`.
```
layer_test_2        latest       e11b5e58e2fc           7 seconds ago           2.35 GB
layer_test_1        latest       6eca792a4ebe           2 minutes ago           1.27 GB
```
As you can see, the new image is more than 1GB larger than the first. Despite the fact that `layer_test_1` is only the first two layers of `layer_test_2`, there's still an extra 1GB file floating around hidden inside the second image. This is true anytime you remove, move, or change any file during the image build process.
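The effect can be imitated without Docker. The sketch below is only a simulation under stated assumptions: two plain directories stand in for layers, and `cp` stands in for the copy a layered filesystem makes when a file from a lower layer is modified. The directory and variable names are made up for illustration, and a 1 MiB file is used instead of 1GB to keep it quick.

```shell
#!/usr/bin/env bash
# A rough simulation of layer accounting using plain directories.
# This does not use Docker; it only mimics how a changed file is
# copied in full into the layer that changes it.
set -o errexit

workdir=$(mktemp -d)
mkdir "$workdir/layer1" "$workdir/layer2"

# "Layer 1" adds a 1 MiB file.
dd if=/dev/zero of="$workdir/layer1/bigfile" bs=1024 count=1024 2>/dev/null

# "Layer 2" only changes the file mode, but the whole file is
# copied into the new layer before the change is applied.
cp "$workdir/layer1/bigfile" "$workdir/layer2/bigfile"
chmod u+x "$workdir/layer2/bigfile"

# Add up what each "layer" stores.
layer1_bytes=$(wc -c < "$workdir/layer1/bigfile")
layer2_bytes=$(wc -c < "$workdir/layer2/bigfile")
total_bytes=$(( layer1_bytes + layer2_bytes ))

echo "layer 1: ${layer1_bytes} bytes"
echo "layer 2: ${layer2_bytes} bytes"
echo "image total: ${total_bytes} bytes"

rm -rf "$workdir"
```

The total comes to twice the file size (2097152 bytes), even though the second "layer" changed nothing but the mode, which mirrors the jump from 1.27 GB to 2.35 GB in the listing above.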
### Purpose-built images vs. flexible images
An anecdote: As my office heavily invested in [Ruby on Rails][3] applications, we began to embrace the use of containers. One of the first things we did was to create an official Ruby base image for all of our teams to use. For simplicity's sake (and suffering under “this is the way we did it on our servers”), we used [rbenv][4] to install the latest four versions of Ruby into the image, allowing our developers to migrate all of their applications into containers using a single image. This resulted in a very large but flexible (we thought) image that covered all the bases of the various teams we were working with.
This turned out to be wasted work. The effort required to maintain separate, slightly modified versions of a particular image was easy to automate, and selecting a specific image with a specific version actually helped to identify applications approaching end-of-life before a breaking change was introduced, wreaking havoc downstream. It also wasted resources: When we started to split out the different versions of Ruby, we ended up with multiple images that shared a single base and took up very little extra space if they coexisted on a server, but were considerably smaller to ship around than a giant image with multiple versions installed.
That is not to say building flexible images is not helpful, but in this case, creating purpose-built images from a common base ended up saving both storage space and maintenance time, and each team could modify their setup however they needed while maintaining the benefit of the common base image.
### Start without the cruft: Add what you need to a blank image
As friendly and easy-to-use as the _Dockerfile_ is, there are tools available that offer the flexibility to create very small Docker-compatible container images without the cruft of a full operating system—even those as small as the standard Docker base images.
[I've written about Buildah before][5], and I'll mention it again because it is flexible enough to create an image from scratch using tools from your host to install packaged software and manipulate the image. Those tools then never need to be included in the image itself.
Buildah replaces the `docker build` command. With it, you can mount the filesystem of your container image to your host machine and interact with it using tools from the host.
Let's try Buildah with the Nginx example from above (ignoring caches for now):
```
#!/usr/bin/env bash
set -o errexit
# Create a container
container=$(buildah from scratch)
# Mount the container filesystem
mountpoint=$(buildah mount $container)
# Install a basic filesystem and minimal set of packages, and nginx
dnf install --installroot $mountpoint  --releasever 28 glibc-minimal-langpack nginx --setopt install_weak_deps=false -y
# Save the container to an image
buildah commit --format docker $container nginx
# Cleanup
buildah unmount $container
# Push the image to the Docker daemon's storage
buildah push nginx:latest docker-daemon:nginx:latest
```
You'll notice we're no longer using a Dockerfile to build the image, but a simple Bash script, and we're building it from a scratch (or blank) image. The Bash script mounts the container's root filesystem to a mount point on the host, and then uses the host's command to install the packages. This way the package manager doesn't even have to exist inside the container.
Without extra cruft—all the extra stuff in the base image, like `dnf`, for example—the image weighs in at only 304 MB, more than 100 MB smaller than the Nginx image built with a Dockerfile above.
```
[chris@krang] $ docker images |grep nginx
docker.io/nginx      buildah      2505d3597457    4 minutes ago         304 MB
```
_Note: The image name has `docker.io` prepended to it due to the way the image is pushed into the Docker daemon's namespace, but it is still the image built locally with the build script above._
That 100 MB is already a huge savings when you consider a base image is already around 300 MB on its own. Installing Nginx with a package manager brings in a ton of dependencies, too. For something compiled from source using tools from the host, the savings can be even greater because you can choose the exact dependencies and not pull in any extra files you don't need.
If you'd like to try this route, [Tom Sweeney][6] wrote a much more in-depth article, [Creating small containers with Buildah][7], which you should check out.
Using Buildah to build images without a full operating system and included build tools can enable much smaller images than you would otherwise be able to create. For some types of images, we can take this approach even further and create images with _only_ the application itself included.
### Create images with only statically linked binaries
Following the same philosophy that leads us to ditch administrative and build tools inside images, we can go a step further. If we specialize enough and abandon the idea of troubleshooting inside of production containers, do we need Bash? Do we need the [GNU core utilities][8]? Do we _really_ need the basic Linux filesystem? You can do this with any compiled language that allows you to create binaries with [statically linked libraries][9]—where all the libraries and functions needed by the program are copied into and stored within the binary itself.
This is a relatively popular way of doing things within the [Golang][10] community, so we'll use a Go application to demonstrate.
The Dockerfile below takes a small Go Hello-World application and compiles it in an image `FROM golang:1.8`:
```
FROM golang:1.8
ENV GOOS=linux
ENV appdir=/go/src/gohelloworld
COPY ./ /go/src/goHelloWorld
WORKDIR /go/src/goHelloWorld
RUN go get
RUN go build -o /goHelloWorld -a
CMD ["/goHelloWorld"]
```
The resulting image, containing the binary, the source code, and the base image layer comes in at 716 MB. The only thing we actually need for our application is the compiled binary, however. Everything else is unused cruft that gets shipped around with our image.
If we disable `cgo` with `CGO_ENABLED=0` when we compile, we can create a binary that doesn't wrap C libraries for some of its functions:
```
GOOS=linux CGO_ENABLED=0 go build -a goHelloWorld.go
```
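To confirm that the build produced a static binary, one option is to inspect it with the `file` utility. This is a sketch: the helper name `is_static` is made up for illustration, and it assumes `file` is installed on the host and describes such binaries with the phrase "statically linked".

```shell
#!/usr/bin/env bash
# Succeeds (exit status 0) when `file` describes the binary as
# statically linked. The helper name is hypothetical.
is_static() {
    file "$1" | grep -q 'statically linked'
}

# Usage, after running the go build command above:
#   is_static ./goHelloWorld && echo "statically linked"
```

If the check fails, the binary still depends on shared libraries that an empty image would not provide.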
The resulting binary can be added to an empty, or "scratch" image:
```
FROM scratch
COPY goHelloWorld /
CMD ["/goHelloWorld"]
```
Let's compare the difference in image size between the two:
```
[ chris@krang ] $ docker images
REPOSITORY      TAG             IMAGE ID                CREATED                 SIZE
goHello     scratch     a5881650d6e9            13 seconds ago          1.55 MB
goHello     builder     980290a100db            14 seconds ago          716 MB
```
That's a huge difference. The image built from `golang:1.8` with the `goHelloWorld` binary in it (tagged "builder" above) is _460_ times larger than the scratch image with just the binary. The entirety of the scratch image with the binary is only 1.55 MB. That means we'd be shipping around 713 MB of unnecessary data if we used the builder image.
As mentioned above, this method of creating small images is used often in the Golang community, and there is no shortage of blog posts on the subject. [Kelsey Hightower][11] wrote [an article on the subject][12] that goes into more detail, including dealing with dependencies other than just C libraries.
### Consider squashing, if it works for you
There's an alternative to chaining all the commands into layers in an attempt to save space: Squashing your image. When you squash an image, you're really exporting it, removing all the intermediate layers, and saving a single layer with the current state of the image. This has the advantage of reducing that image to a much smaller size.
Squashing layers used to require some creative workarounds to flatten an image—exporting the contents of a container and re-importing it as a single layer image, or using tools like `docker-squash`. Starting in version 1.13, Docker introduced a handy flag, `--squash`, to accomplish the same thing during the build process:
```
FROM fedora:28
LABEL maintainer Chris Collins <collins.christopher@gmail.com>
RUN dnf install -y nginx
RUN dnf clean all
RUN rm -rf /var/cache/yum
[chris@krang] $ docker build -t squash -f Dockerfile-squash --squash .
[chris@krang] $ docker images --format "{{.Repository}}: {{.Size}}"  | head -n 1
squash: 271 MB
```
Using `docker build --squash` with this multi-layer Dockerfile, we end up with another 271MB image, as we did with the chained instruction example. This works great for this use case, but there's a potential gotcha.
“What? ANOTHER gotcha?”
Well, sort of: it's the same issue as before, causing problems in another way.
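For reference, the older export-and-import workaround mentioned earlier in this section can be sketched as a small shell function. The function name `flatten_image` and the image names in the usage comment are hypothetical; the sketch assumes the source image has a default command so that `docker create` can make a container from it.

```shell
#!/usr/bin/env bash
# Flatten an image into a single layer by exporting a container's
# filesystem and importing it as a new image.
# The function name is hypothetical.
flatten_image() {
    local source_image=$1
    local target_image=$2
    local container
    local status

    # Create (but do not start) a container from the source image.
    container=$(docker create "$source_image") || return 1

    # Stream the container filesystem into a new single-layer image.
    docker export "$container" | docker import - "$target_image"
    status=$?

    # Remove the temporary container.
    docker rm "$container" >/dev/null
    return "$status"
}

# Usage, with hypothetical image names:
#   flatten_image my-image:latest my-image:flat
```

Note that `docker import` creates a filesystem-only image, so settings such as `CMD` from the original image are not carried over and have to be supplied again, for example with its `--change` option.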
### Going too far: Too squashed, too small, too specialized
Images can share layers. The base may be _x_ megabytes in size, but it only needs to be pulled/stored once and each image can use it. The effective size of all the images sharing layers is the base layers plus the diff of each specific change on top of that. In this way, thousands of images may take up only a small amount more than a single image.
This is a drawback with squashing or specializing too much. When you squash an image into a single layer, you lose any opportunity to share layers with other images. Each image ends up being as large as the total size of its single layer. This might work well for you if you use only a few images and run many containers from them, but if you have many diverse images, it could end up costing you space in the long run.
Revisiting the Nginx squash example, we can see it's not a big deal for this case. We end up with Fedora, Nginx installed, no cache, and squashing that is fine. Nginx by itself is not incredibly useful, though. You generally need customizations to do anything interesting, such as configuration files, other software packages, or maybe some application code. Each of these would end up being more instructions in the Dockerfile.
With a traditional image build, you would have a single base image layer with Fedora, a second layer with Nginx installed (with or without cache), and then each customization would be another layer. Other images with Fedora and Nginx could share these layers.
```
[   App 1 Layer (  5 MB) ]          [   App 2 Layer (6 MB) ]
[   Nginx Layer ( 21 MB) ] ------------------^
[ Fedora  Layer (249 MB) ]  
```
But if you squash the image, then even the Fedora base layer is squashed. Any squashed image based on Fedora has to ship around its own Fedora content, adding another 249 MB for _each image!_
```
[ Fedora + Nginx + App 1 (275 MB)]      [ Fedora + Nginx + App 2 (276 MB) ]  
```
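The arithmetic behind the two diagrams above can be worked through in a few lines of shell. The layer sizes are taken from the diagrams; the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Layer sizes in MB, taken from the diagrams above.
fedora=249
nginx=21
app1=5
app2=6

# Layered images: the Fedora and Nginx layers are stored once and shared.
shared_total=$(( fedora + nginx + app1 + app2 ))

# Squashed images: each image carries its own copy of Fedora and Nginx.
squashed_total=$(( (fedora + nginx + app1) + (fedora + nginx + app2) ))

echo "layered:  ${shared_total} MB"    # prints "layered:  281 MB"
echo "squashed: ${squashed_total} MB"  # prints "squashed: 551 MB"
echo "overhead: $(( squashed_total - shared_total )) MB"  # prints "overhead: 270 MB"
```

With only two applications, squashing nearly doubles the storage needed, and the overhead grows by another 270 MB for every additional squashed image built on the same Fedora and Nginx content.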
This also becomes a problem if you build lots of highly specialized, super-tiny images.
As with everything in life, moderation is key. Again, thanks to how layers work, you will find diminishing returns as your container images become smaller and more specialized and can no longer share base layers with other related images.
Images with small customizations can share base layers. As explained above, the base may be _x_ megabytes in size, but it only needs to be pulled/stored once and each image can use it. The effective size of all the images is the base layers plus the diff of each specific change on top of that. In this way, thousands of images may take up only a small amount more than a single image.
```
[ specific app   ]      [ specific app 2 ]
[ customizations ]--------------^
[ base layer     ]
```
If you go too far with your image shrinking and you have too many variations or specializations, you can end up with many images, none of which share base layers and all of which take up their own space on disk.
```
 [ specific app 1 ]     [ specific app 2 ]      [ specific app 3 ]
```
### Conclusion
There are a variety of different ways to reduce the amount of storage space and bandwidth you spend working with container images, but the most effective way is to reduce the size of the images themselves. Whether you simply clean up your caches (avoiding leaving them orphaned in intermediate layers), squash all your layers into one, or add only static binaries in an empty image, it's worth spending some time looking at where bloat might exist in your container images and slimming them down to an efficient size.
--------------------------------------------------------------------------------
via: https://opensource.com/article/18/7/building-container-images
Author: [Chris Collins][a]
Topic selection: [lujun9972](https://github.com/lujun9972)
Translator: [Translator ID](https://github.com/译者ID)
Proofreader: [Proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)
[a]:https://opensource.com/users/clcollins
[1]:https://www.docker.com/
[2]:https://www.nginx.com/
[3]:https://rubyonrails.org/
[4]:https://github.com/rbenv/rbenv
[5]:https://opensource.com/article/18/6/getting-started-buildah
[6]:https://twitter.com/TSweeneyRedHat
[7]:https://opensource.com/article/18/5/containers-buildah
[8]:https://www.gnu.org/software/coreutils/coreutils.html
[9]:https://en.wikipedia.org/wiki/Static_library
[10]:https://golang.org/
[11]:https://twitter.com/kelseyhightower
[12]:https://medium.com/@kelseyhightower/optimizing-docker-images-for-static-binaries-b5696e26eb07


@ -0,0 +1,356 @@
How to build smaller container images
======
![](https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/containers_scale_performance.jpg?itok=R7jyMeQf)
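[Docker][1]'s explosive growth in recent years has gradually introduced everyone to the concepts of containers and container images. Although Linux container technology appeared long before, its recent boom owes a great deal to Docker's user-friendly command-line interface and the easy way of building images with a Dockerfile. Even though Docker has greatly lowered the barrier to getting started with containers, there are still many tricks to learn in order to build a container image that is both powerful and small.
### Clean up unnecessary files
This step is not much different from cleaning up files on an ordinary server, except that it should be done more thoroughly. A small container image has a big advantage when it is transferred, and storing large amounts of unnecessary data on disk is a waste of resources. This principle holds for all servers.
Cleaning cached files out of a container image can effectively shrink it. The comparison below shows an image built by installing [Nginx][2] with `dnf`, and an image built by installing Nginx with `dnf` and then cleaning up the cached files: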
```
# Dockerfile with cache
FROM fedora:28
LABEL maintainer Chris Collins <collins.christopher@gmail.com>
RUN dnf install -y nginx
-----
# Dockerfile w/o cache
FROM fedora:28
LABEL maintainer Chris Collins <collins.christopher@gmail.com>
RUN dnf install -y nginx \
&& dnf clean all \
&& rm -rf /var/cache/yum
-----
[chris@krang] $ docker build -t cache -f Dockerfile .
[chris@krang] $ docker images --format "{{.Repository}}: {{.Size}}" | head -n 1
cache: 464 MB
[chris@krang] $ docker build -t no-cache -f Dockerfile-wo-cache .
[chris@krang] $ docker images --format "{{.Repository}}: {{.Size}}" | head -n 1
no-cache: 271 MB
```
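Judging from the results above, cleaning up the cached files makes a considerable difference. The image built without cleaning the `dnf` cache is nearly twice the size of the image with the cache cleaned. Package manager caches, Ruby gem temporary files, `nodejs` caches, and even downloaded source tarballs are all best cleaned up.
### Layers: a potential pitfall
Unfortunately (or, as you will see further on, fortunately), because of how container layers work, you cannot simply write a `RUN rm -rf /var/cache/yum` line in the Dockerfile and be done with it. Each instruction in a Dockerfile is stored as a layer, and the layers are stacked one on top of another. So if you write it like this: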
```
RUN dnf install -y nginx
RUN dnf clean all
RUN rm -rf /var/cache/yum
```
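your container image will contain three layers. The `RUN dnf install -y nginx` layer still holds the cached files, which are then removed in the other two layers. The cache still exists; you just cannot see it in the final container image.
In the example from the previous section, you can see that the correct approach is to chain the commands together and clean up the cached files in the same layer that produced them: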
```
RUN dnf install -y nginx \
&& dnf clean all \
&& rm -rf /var/cache/yum
```
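This chains several commands into a single instruction, which occupies only one layer in the final image. It costs only slightly more time when building the image, but the cleaned-up cached files do not remain in the final image. As a good compromise, simply chaining related commands (for example `yum install` and `yum clean all`, or downloading, extracting, and removing a tarball) into one instruction can save a lot of space in the final container image, and Docker's layers can also be put to better use.
Layers have another, more subtle property. Every layer records changes to files, and the changes here are not limited to whether a file exists or whether its contents were modified; they include every change, file attributes among them. So even running `chmod` on a file is recorded in the layer.
Below is the output of a `docker images` command. The image `layer_test_1` was built by adding only a 1GB file to a CentOS base image, while the image `layer_test_2` was built with a `FROM layer_test_1` statement followed by nothing but a single `chmod u+x` command.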
```
layer_test_2 latest e11b5e58e2fc 7 seconds ago 2.35 GB
layer_test_1 latest 6eca792a4ebe 2 minutes ago 1.27 GB
```
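As you can see, the `layer_test_2` image is more than 1GB larger than the `layer_test_1` image. Although `layer_test_2` is based on `layer_test_1` and has only one layer more, that extra layer happens to contain an additional 1GB file. Whenever files are moved, changed, or deleted in a separate layer during an image build, the result is similar.
### Purpose-built images and general-purpose images
Here is a first-hand experience: our projects relied heavily on [Ruby on Rails][3], so we started using containers. At the outset we created a Ruby base image for all teams to use. For simplicity's sake (which in fact was not a good idea), we used [rbenv][4] to install the latest four versions of Ruby into this image, so that developers could migrate applications using different versions of Ruby into containers with this one image. At the time we thought of it as a somewhat large but highly compatible image, because it could serve every team at once.
In practice this was a thankless effort. If applications of different versions are installed in separate images, maintaining the images is easy to automate. It also makes it possible to take precautions before an application approaches the end of its life cycle, avoiding uncontrollable consequences. A huge general-purpose image also wastes resources: when we later split the huge image by Ruby version, every application image built on those base images became much smaller than before, saving a great deal of server storage.
This example does not mean that compatibility is not worth pursuing in an image, but in this particular case, splitting it into several small purpose-built images clearly saved storage and maintenance cost, and each team could also customize its configuration to its own needs.
### Starting from scratch: add what you need to a blank image
There are tools as easy to use as the Dockerfile that can readily create very small Docker-compatible container images, images that do not even need to contain a complete operating system.
I once wrote an [article about Buildah][5], and I would like to recommend the tool again here. It can use tools on the host to manipulate a blank image and install packaged applications, and those tools are not included in the image.
Buildah replaces the `docker build` command. With Buildah you can mount the container's filesystem on the host and interact with it.
Let's use Buildah to implement the Nginx example from above: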
```
#!/usr/bin/env bash
set -o errexit
# Create a container
container=$(buildah from scratch)
# Mount the container filesystem
mountpoint=$(buildah mount $container)
# Install a basic filesystem and minimal set of packages, and nginx
dnf install --installroot $mountpoint --releasever 28 glibc-minimal-langpack nginx --setopt install_weak_deps=false -y
# Save the container to an image
buildah commit --format docker $container nginx
# Cleanup
buildah unmount $container
# Push the image to the Docker daemon's storage
buildah push nginx:latest docker-daemon:nginx:latest
```
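You will notice that this no longer uses a Dockerfile but an ordinary Bash script, and that the build starts from a blank image. The Bash script above mounts the container's root filesystem on the host and then uses the host's commands to install the packages, so the package manager does not need to be placed in the container image.
This way, all the unrelated content (such as `dnf`) is no longer included in the image. In this example the resulting image is only 304 MB, more than 100 MB smaller than the image built with a Dockerfile.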
```
[chris@krang] $ docker images |grep nginx
docker.io/nginx buildah 2505d3597457 4 minutes ago 304 MB
```
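_Note: this image was built with the build script above; the `docker.io` in the image name was only added when the image was pushed into the Docker daemon's storage._
For a container image in the 300MB range, shrinking it by 100MB is already a significant saving. Installing Nginx with a package manager brings in a large number of dependencies. If the application can be compiled from source on the host and then built into the container image, even more space can be saved, because you can pick exactly the dependencies that are needed and leave everything else out of the image.
[Tom Sweeney][6] wrote an article, [Creating small containers with Buildah][7], which is worth consulting if you want to optimize further in this direction.
With Buildah you can build a container image that does not include a full operating system or build tools, which greatly reduces its size. The same approach can therefore be used to create an image that contains only the application itself.
### Build images using statically linked binaries
Following the same line of thought, we can even drop the administrative tools inside the container. For example, if we do not need to debug inside the container, can we do without Bash? Without the [GNU core utilities][8]? Without the basic Linux filesystem? If you use a compiled language that supports [statically linked libraries][9], so that all the libraries and functions the application needs are compiled into the binary, then all of that baggage can be left out.
This practice is already very common in the [Golang][10] community, so we will demonstrate it with an application written in Go.
The Dockerfile below builds a Hello World application image based on `golang:1.8`: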
```
FROM golang:1.8
ENV GOOS=linux
ENV appdir=/go/src/gohelloworld
COPY ./ /go/src/goHelloWorld
WORKDIR /go/src/goHelloWorld
RUN go get
RUN go build -o /goHelloWorld -a
CMD ["/goHelloWorld"]
```
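The resulting image contains the binary, the source code, and the base image layers, 716MB in total. But the only thing the application needs in order to run is the compiled binary; everything else in the image is superfluous.
If `cgo` is disabled at compile time by setting `CGO_ENABLED=0`, the binary can be compiled without wrapping certain C libraries: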
```
GOOS=linux CGO_ENABLED=0 go build -a goHelloWorld.go
```
The compiled binary can then be used to build the application image on top of a blank image:
```
FROM scratch
COPY goHelloWorld /
CMD ["/goHelloWorld"]
```
Let's compare the images from the two builds:
```
[ chris@krang ] $ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
goHello scratch a5881650d6e9 13 seconds ago 1.55 MB
goHello builder 980290a100db 14 seconds ago 716 MB
```
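In terms of image size, the difference is enormous. The application image built on the `golang:1.8` image (tagged builder) is 460 times the size of the application image built on a blank image! The latter is only 1.55MB in total, which means that 713MB of the data is unnecessary.
As mentioned above, this way of shrinking images is very popular in the Golang community, so there is no shortage of articles on it. [Kelsey Hightower][11] wrote an [article][12] devoted to handling these library dependencies.
### Squash image layers
Besides the trick of chaining multiple commands into one, described in the earlier sections, you can also squash the image layers. Squashing essentially removes all the intermediate layers produced during the build and exports a single layer holding the current state of the image. This can shrink the image even further.
Before Docker 1.13, squashing layers could be cumbersome, requiring workarounds such as exporting a container's contents and re-importing them as a single-layer image, or tools like `docker-squash`. Docker 1.13 introduced the `--squash` flag, which does the same thing during the build: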
```
FROM fedora:28
LABEL maintainer Chris Collins <collins.christopher@gmail.com>
RUN dnf install -y nginx
RUN dnf clean all
RUN rm -rf /var/cache/yum
[chris@krang] $ docker build -t squash -f Dockerfile-squash --squash .
[chris@krang] $ docker images --format "{{.Repository}}: {{.Size}}" | head -n 1
squash: 271 MB
```
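The image built from this Dockerfile in this way is 271MB, the same size as the one produced by chaining the commands, so this approach works as well. But it has a potential problem, and a problem of a different kind.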
### Going too far: Too squashed, too small, too specialized
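Container images can share layers. A layer has a certain size of its own, but as long as it exists in the image repository it can be reused by other container images. The size of a container image is the base layers plus the differences added by each layer, so if there are thousands of container images based on the same base image, their combined size may be only slightly larger than a single base image.
So if layer squashing is overused and different images are each squashed into a single layer, the images no longer have any layers to share, and each one takes up its own space. If you only maintain a few container images, the problem is negligible; but if you maintain many, it will cost a lot of storage in the long run.
Look back at the Nginx example above. That image contains the Fedora operating system and the Nginx application, has no package cache, and has been squashed. But we rarely use a stock Nginx; we modify configuration files and bring in other code or applications to work with it, and to do so the Dockerfile becomes more complex.
With an ordinary image build, the resulting container image has a Fedora layer, a layer with Nginx installed, and several further layers for the Nginx customizations. If other container images need Fedora or Nginx, they can reuse the first two layers of this image.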
```
[ App 1 Layer ( 5 MB) ] [ App 2 Layer (6 MB) ]
[ Nginx Layer ( 21 MB) ] ------------------^
[ Fedora Layer (249 MB) ]
```
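With a squashed build, Fedora is squashed into the same layer as Nginx and the other configuration. If another container image needs Fedora, it has to bring in the Fedora base image again, so every container image grows by an extra 249MB.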
```
[ Fedora + Nginx + App 1 (275 MB)] [ Fedora + Nginx + App 2 (276 MB) ]
```
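When you build a large number of small container images with increasingly divergent functions, this problem comes to light.
As with everything in life, the key is moderation. Given how image layers are implemented, the smaller and more specialized a container image becomes, the harder it is for it to share base layers with other container images, which ends up being counterproductive.
Multiple container images built with only small changes on top of a base image can share the base layers. As described above, a layer has a certain size of its own, but as long as it exists in the image repository it can be reused by other container images.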
```
[ specific app ] [ specific app 2 ]
[ customizations ]--------------^
[ base layer ]
```
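The smaller and more specialized a container image becomes, the harder it is for it to share base layers with other container images, and in the end it needlessly takes up more and more storage.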
```
[ specific app 1 ] [ specific app 2 ] [ specific app 3 ]
```
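### Summary
There are many ways to reduce the storage space and bandwidth needed to handle container images, and the most direct of them is to reduce the size of the images themselves. When working with containers, keep an eye on whether your images are oversized, and depending on the situation use the methods described above, such as cleaning up caches, squashing layers, or placing only binaries in a blank image, to slim the images down to a realistic size.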
--------------------------------------------------------------------------------
via: https://opensource.com/article/18/7/building-container-images
Author: [Chris Collins][a]
Topic selection: [lujun9972](https://github.com/lujun9972)
Translator: [HankChow](https://github.com/HankChow)
Proofreader: [Proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)
[a]:https://opensource.com/users/clcollins
[1]:https://www.docker.com/
[2]:https://www.nginx.com/
[3]:https://rubyonrails.org/
[4]:https://github.com/rbenv/rbenv
[5]:https://opensource.com/article/18/6/getting-started-buildah
[6]:https://twitter.com/TSweeneyRedHat
[7]:https://opensource.com/article/18/5/containers-buildah
[8]:https://www.gnu.org/software/coreutils/coreutils.html
[9]:https://en.wikipedia.org/wiki/Static_library
[10]:https://golang.org/
[11]:https://twitter.com/kelseyhightower
[12]:https://medium.com/@kelseyhightower/optimizing-docker-images-for-static-binaries-b5696e26eb07