I tried to add a file via the ADD command and then deleted it, but the size of the Docker image shows that it still includes that file! If I put * in .dockerignore, it will not work with ADD.
Dockerfile:
from ubuntu:20.04
ADD myfile /tmp
RUN rm /tmp/*
Then I built it with
$ docker build -t testwf .
At the start of the build it shows:
Sending build context to Docker daemon 34.21MB
The size of myfile is around 33MB.
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
testwf latest 96543168ab34 16 minutes ago 107MB
ubuntu 20.04 ba6acccedd29 5 weeks ago 72.8MB
Actually, I expected to get an image of 72.8MB, the same size as ubuntu, not 107MB, which is roughly 72.8MB plus 33MB! In other words, if I didn't add that file with the ADD command, was there any way to access the file in the container, given that it was already copied to the Docker daemon?
Update
As HansKilian mentioned in the comments, the file went into one of the layers that the final image is constructed on top of. Is there any way to get rid of that layer in order to decrease the size of the final image?
$ docker history testwf:latest
IMAGE CREATED CREATED BY SIZE COMMENT
2af2733972ab 4 seconds ago /bin/sh -c rm /tmp/* 0B
40d13da4e0cc 4 seconds ago /bin/sh -c #(nop) ADD file:0ddf694d27b108b4a… 34.2MB
ba6acccedd29 5 weeks ago /bin/sh -c #(nop) CMD ["bash"] 0B
<missing> 5 weeks ago /bin/sh -c #(nop) ADD file:5d68d27cc15a80653… 72.8MB
CodePudding user response:
There are several ways to "merge" intermediate layers in Docker:
- Multi-stage build, as mentioned in Romain Prévost's answer
- The --squash option of docker build (https://docs.docker.com/engine/reference/commandline/image_build/)
- Export a container running the final image with docker export and then re-import it with docker import
More details:
In principle, each command in a Dockerfile adds a new "layer" to the final image, containing the file system after that command's execution. What Docker does to help here is store each layer as only its diff from the previous layer, so we don't waste disk space on identical files.
For example, if we execute an add and then a remove command on top of some layer 0, the add command creates layer 1 containing only the added file, and the remove command creates layer 2 marking the file as removed. Since each layer only records its diff against the previous layer, Docker doesn't know during the build that layer 2 is identical to layer 0. If we repeat the add/delete commands, every add creates an extra layer with a size equal to the file. As a result, we may build multiple images with identical (final) content but different sizes. For example, we may create a 32MB file and add/delete it twice in the same image like:
from ubuntu:latest
ADD big_file .
RUN rm big_file
ADD big_file .
RUN rm big_file
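The dummy file itself can be created beforehand with something like the following (the tool and exact size are only an illustration):
$ dd if=/dev/zero of=big_file bs=1M count=32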
Building it with docker build . -t big_file:latest gives an image with a size equal to <BASE_SIZE> + 32 MB * 2:
REPOSITORY TAG IMAGE ID CREATED SIZE
big_file latest ddd32b7a8519 2 minutes ago 140MB
ubuntu latest ba6acccedd29 5 weeks ago 72.8MB
We can check the layers within big_file with docker history <IMAGE> and get:
IMAGE CREATED CREATED BY SIZE COMMENT
ddd32b7a8519 4 minutes ago /bin/sh -c rm big_file 0B
c20573523c30 4 minutes ago /bin/sh -c #(nop) ADD file:937071a2cba4a5d8b… 33.6MB
80ae0642e3ad 4 minutes ago /bin/sh -c rm big_file 0B
0538ebbf489c 4 minutes ago /bin/sh -c #(nop) ADD file:937071a2cba4a5d8b… 33.6MB
ba6acccedd29 5 weeks ago /bin/sh -c #(nop) CMD ["bash"] 0B
<missing> 5 weeks ago /bin/sh -c #(nop) ADD file:5d68d27cc15a80653… 72.8MB
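As a quick cross-check (not part of the original output), the total size reported above can also be read directly, in bytes:
$ docker image inspect big_file:latest --format '{{.Size}}'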
So what do the above three methods do?
- multi-stage build
It throws away all the layers from the previous stage and copies only the specified files into the next stage. For example:
from ubuntu:latest
ADD big_file .
RUN rm big_file
ADD big_file .
RUN rm big_file
ADD big_file .
from ubuntu:latest
COPY --from=0 big_file .
Building it gives two images, one for stage-0 and another for stage-1.
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> 6d844f18d92e 5 seconds ago 173MB
big_file latest 4b1025db1335 33 seconds ago 106MB
ubuntu latest ba6acccedd29 5 weeks ago 72.8MB
Checking the stage-1 image, it's clear that the layers of the stage-0 image are not copied. It contains only one extra layer, created by the COPY --from=0 big_file . command.
IMAGE CREATED CREATED BY SIZE COMMENT
4b1025db1335 47 seconds ago /bin/sh -c #(nop) COPY file:937071a2cba4a5d8… 33.6MB
ba6acccedd29 5 weeks ago /bin/sh -c #(nop) CMD ["bash"] 0B
<missing> 5 weeks ago /bin/sh -c #(nop) ADD file:5d68d27cc15a80653… 72.8MB
This suits situations where you know exactly what you need from the stage-0 image. A good example is compiling in stage-0 and copying only the binary to stage-1. One common mistake, however, is forgetting to copy dynamic libraries required by the binary; they will be missing from the stage-1 image, because the two stages are two different images with their own base images and layers.
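For illustration, here is a minimal sketch of that compile-then-copy pattern (the image tags, file names, and the static-linking flag are assumptions made for this example, not taken from the question):
# stage 0: full build toolchain
FROM gcc:11 AS builder
COPY hello.c /src/hello.c
# linking statically sidesteps the missing-shared-library pitfall mentioned above
RUN gcc -static -o /src/hello /src/hello.c
# stage 1: minimal runtime image containing only the compiled binary
FROM ubuntu:20.04
COPY --from=builder /src/hello /usr/local/bin/hello
CMD ["/usr/local/bin/hello"]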
- --squash
It's similar to squash in git. It loads and applies the diff of each layer to create a single new layer and uses only that new layer in the built image.
Building with docker build . -t big_file:latest --squash gives three images:
REPOSITORY TAG IMAGE ID CREATED SIZE
big_file latest 6903fba8cef3 2 seconds ago 72.8MB
<none> <none> 28ed65140111 3 seconds ago 140MB
ubuntu latest ba6acccedd29 5 weeks ago 72.8MB
28ed65140111 is the image before the squash. Checking the layers in big_file:
IMAGE CREATED CREATED BY SIZE COMMENT
6903fba8cef3 10 seconds ago 0B merge sha256:28ed65140111012d7604df5123b9be16ab4bfc62dd799259001b5d609ceb8e18 to sha256:ba6acccedd2923aee4c2acc6a23780b14ed4b8a5fa4e14e252a23b846df9b6c1
<missing> 11 seconds ago /bin/sh -c rm big_file 0B
<missing> 12 seconds ago /bin/sh -c #(nop) ADD file:937071a2cba4a5d8b… 0B
<missing> 13 seconds ago /bin/sh -c rm big_file 0B
<missing> 14 seconds ago /bin/sh -c #(nop) ADD file:937071a2cba4a5d8b… 0B
<missing> 5 weeks ago /bin/sh -c #(nop) CMD ["bash"] 0B
<missing> 5 weeks ago /bin/sh -c #(nop) ADD file:5d68d27cc15a80653… 72.8MB
After applying all the diffs, there is nothing different from the base image, so the merge layer 6903fba8cef3 is empty. Note, however, that the --squash build is currently an experimental feature.
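Because of that, the daemon has to run in experimental mode before --squash is accepted. A rough sketch for a Linux host (assuming /etc/docker/daemon.json does not exist yet and the host uses systemd):
$ echo '{ "experimental": true }' | sudo tee /etc/docker/daemon.json
$ sudo systemctl restart docker
$ docker build . -t big_file:latest --squash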
- export/import
Notice that export works on a container rather than an image; it only dumps the current state of the container's file system and ignores the layer information in the image. If we dump a container running our big_file image and then re-import it using docker export <CONTAINER_ID> > big_file.tar && docker import - big_file:load < big_file.tar, we get an "empty" image that looks like:
IMAGE CREATED CREATED BY SIZE COMMENT
06f4d01022e7 13 seconds ago 72.8MB Imported from -
Now we can't know how the image was built, since the layers are not dumped.
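For reference, a sketch of the whole round trip without an intermediate tar file (the container name and tag are placeholders):
$ docker create --name tmp big_file:latest
$ docker export tmp | docker import - big_file:flat
$ docker rm tmp
Note that docker import also discards metadata such as CMD and ENTRYPOINT; it can be re-applied with the -c/--change option if needed.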
Which one is better really depends on your case, but the concept of layers in Docker is very important: Docker never forgets anything unless you drop or merge a layer somehow.
CodePudding user response:
What you are looking for is a Docker multistage build, where you first use an image for your build, with all your dependencies, and then build a new image with only the relevant artifacts. That way you don't have to delete the files you don't need, so much as simply not include them.
https://docs.docker.com/develop/develop-images/multistage-build/
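A minimal sketch of that pattern with a named build stage follows; the stage name and the gzip step are placeholders for whatever your build actually produces:
FROM ubuntu:20.04 AS build
ADD myfile /tmp
# the real build work would happen here; gzip is only a stand-in
RUN gzip /tmp/myfile
FROM ubuntu:20.04
COPY --from=build /tmp/myfile.gz ./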
CodePudding user response:
You can have a multistage build:
FROM alpine:latest
ADD myfile /tmp
FROM ubuntu:20.04
COPY --from=0 /tmp/myfile ./
RUN rm ./myfile