Where are files stored in docker daemon?

Time:11-23

I added a file with the ADD command and then deleted it, but the image size shows that the file is still included! If I put * in .dockerignore, ADD no longer works.

Dockerfile:

FROM ubuntu:20.04

ADD myfile /tmp

RUN rm /tmp/*

Then I built it with $ docker build -t testwf .

At the start of the build it prints:

Sending build context to Docker daemon  34.21MB

myfile is about 33MB.

$ docker images
REPOSITORY                       TAG       IMAGE ID       CREATED          SIZE
testwf                           latest    96543168ab34   16 minutes ago   107MB
ubuntu                           20.04     ba6acccedd29   5 weeks ago      72.8MB

I expected an image of 72.8MB, the same size as ubuntu:20.04, not 107MB, which is roughly 72.8MB plus 33MB! In other words, if I had not added the file with ADD, would there be any way to access it from the container, given that it was already copied to the Docker daemon?

Update

As HansKilian mentioned in the comments, the file went into one of the layers on top of which the final image is built. Is there any way to get rid of that layer to reduce the size of the final image?

$ docker history testwf:latest                                                                  
IMAGE          CREATED         CREATED BY                                      SIZE      COMMENT
2af2733972ab   4 seconds ago   /bin/sh -c rm /tmp/*                            0B
40d13da4e0cc   4 seconds ago   /bin/sh -c #(nop) ADD file:0ddf694d27b108b4a…   34.2MB
ba6acccedd29   5 weeks ago     /bin/sh -c #(nop)  CMD ["bash"]                 0B
<missing>      5 weeks ago     /bin/sh -c #(nop) ADD file:5d68d27cc15a80653…   72.8MB

Answer:

There are three ways to "merge" intermediate layers in Docker: multi-stage builds, docker build --squash, and docker export/import.

More details:

In principle, each instruction in a Dockerfile adds a new "layer" to the final image, representing the file system after that instruction runs. To avoid wasting disk space on identical files, Docker stores each layer only as its diff from the previous layer.

For example, if we run an ADD and then a remove command on top of some layer 0, the ADD creates layer 1, which contains only the added file. The remove command creates layer 2, which marks the file as deleted. Because each layer records only its diff from the previous one, Docker does not notice during the build that the file system at layer 2 is identical to the one at layer 0. If we repeat the add/delete pair, every ADD creates an extra layer as large as the file. As a result, we can build several images with identical final content but different sizes. For example, we may create a 32MB file and add and delete it twice in the same image:

FROM ubuntu:latest

ADD big_file .
RUN rm big_file
ADD big_file .
RUN rm big_file

Building it with docker build . -t big_file:latest gives an image whose size is the base image size plus twice the file size (72.8MB + 2 × 33.6MB ≈ 140MB):

REPOSITORY                          TAG        IMAGE ID       CREATED         SIZE
big_file                            latest     ddd32b7a8519   2 minutes ago   140MB
ubuntu                              latest     ba6acccedd29   5 weeks ago     72.8MB

Checking the layers of big_file with docker history <IMAGE> gives:

IMAGE          CREATED         CREATED BY                                      SIZE      COMMENT
ddd32b7a8519   4 minutes ago   /bin/sh -c rm big_file                          0B        
c20573523c30   4 minutes ago   /bin/sh -c #(nop) ADD file:937071a2cba4a5d8b…   33.6MB    
80ae0642e3ad   4 minutes ago   /bin/sh -c rm big_file                          0B        
0538ebbf489c   4 minutes ago   /bin/sh -c #(nop) ADD file:937071a2cba4a5d8b…   33.6MB    
ba6acccedd29   5 weeks ago     /bin/sh -c #(nop)  CMD ["bash"]                 0B        
<missing>      5 weeks ago     /bin/sh -c #(nop) ADD file:5d68d27cc15a80653…   72.8MB
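To make the arithmetic concrete, here is a toy Python model of the layering described above. It is a sketch under simplifying assumptions, not how Docker actually stores layers: each layer is a dict mapping a path to a size in MB, None marks a deletion (a "whiteout", which hides the file but takes no space), and rootfs is a hypothetical stand-in for the whole ubuntu base layer. The sizes are the rounded figures from the docker history output.

```python
# Toy model of layered image storage (illustration only, not Docker's code).
# A layer is a dict of path -> size in MB; None marks a deletion
# ("whiteout"), which hides the file but takes no space.

def apply_layers(layers):
    """Replay every diff in order: the file system a container sees."""
    fs = {}
    for layer in layers:
        for path, size in layer.items():
            if size is None:
                fs.pop(path, None)  # whiteout hides the lower-layer file
            else:
                fs[path] = size
    return fs


def image_size(layers):
    """An image stores every layer, so its size is the sum of all diffs."""
    return round(sum(s for layer in layers for s in layer.values() if s is not None), 1)


# Sizes from the docker history output; "rootfs" is a hypothetical
# stand-in for the whole 72.8MB ubuntu base layer.
base = [{"rootfs": 72.8}]
image = base + [
    {"big_file": 33.6},   # ADD big_file .
    {"big_file": None},   # RUN rm big_file
    {"big_file": 33.6},   # ADD big_file .
    {"big_file": None},   # RUN rm big_file
]

print(apply_layers(image) == apply_layers(base))  # True: same final content
print(image_size(image))                          # 140.0
```

The same arithmetic explains the question: the 34.2MB ADD layer plus the 72.8MB base gives the 107MB image, even though the final file system no longer contains myfile.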

So what do the three methods above do?

  • multi-stage build

It throws away all layers of the previous stage and copies only the specified files into the next stage. For example:

FROM ubuntu:latest

ADD big_file .
RUN rm big_file
ADD big_file .
RUN rm big_file
ADD big_file .

FROM ubuntu:latest
COPY --from=0 big_file .

Building it gives two images, one for stage 0 and another for stage 1:

REPOSITORY                          TAG        IMAGE ID       CREATED          SIZE
<none>                              <none>     6d844f18d92e   5 seconds ago    173MB
big_file                            latest     4b1025db1335   33 seconds ago   106MB
ubuntu                              latest     ba6acccedd29   5 weeks ago      72.8MB

Checking the stage-1 image shows that the layers of the stage-0 image were not copied. It contains only one extra layer, created by the COPY --from=0 big_file . instruction:

IMAGE          CREATED          CREATED BY                                      SIZE      COMMENT
4b1025db1335   47 seconds ago   /bin/sh -c #(nop) COPY file:937071a2cba4a5d8…   33.6MB    
ba6acccedd29   5 weeks ago      /bin/sh -c #(nop)  CMD ["bash"]                 0B        
<missing>      5 weeks ago      /bin/sh -c #(nop) ADD file:5d68d27cc15a80653…   72.8MB   

This approach suits situations where you know exactly what you need from the stage-0 image. A typical example is compiling in stage 0 and copying only the resulting binary into stage 1. A common mistake, however, is forgetting to copy the dynamic libraries the binary requires: they are missing from the stage-1 image because the two stages are separate images, each with its own base image and layers.
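A small Python sketch can show why the stage-1 image stays small. It models the multi-stage Dockerfile above under illustrative assumptions (not Docker's implementation): each layer is a dict of path to size in MB, None marks a deletion, rootfs is a hypothetical stand-in for the ubuntu base layer, and the sizes are the rounded 33.6MB figures from docker history, so the totals only approximate what docker images reports.

```python
# Toy model (illustration only): a layer is a dict of path -> size in MB,
# and None marks a deletion ("whiteout") that takes no space.

def apply_layers(layers):
    """Replay every diff in order to get the resulting file system."""
    fs = {}
    for layer in layers:
        for path, size in layer.items():
            if size is None:
                fs.pop(path, None)
            else:
                fs[path] = size
    return fs


def image_size(layers):
    """An image stores every layer, so its size is the sum of all diffs."""
    return round(sum(s for layer in layers for s in layer.values() if s is not None), 1)


base = [{"rootfs": 72.8}]  # hypothetical stand-in for the ubuntu base layer
stage0 = base + [
    {"big_file": 33.6}, {"big_file": None},
    {"big_file": 33.6}, {"big_file": None},
    {"big_file": 33.6},
]

# COPY --from=0 big_file . reads the *final* file system of stage 0 and
# writes one new layer; none of stage 0's layers are carried over.
copied = {"big_file": apply_layers(stage0)["big_file"]}
stage1 = base + [copied]

print(image_size(stage0))  # 173.6, close to the 173MB <none> image
print(image_size(stage1))  # 106.4, close to the 106MB big_file image
```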

  • --squash

It is similar to squashing commits in git: Docker applies the diffs of all layers created by the build and merges the result into one new layer, which replaces those layers in the built image.

Building with docker build . -t big_file:latest --squash gives the following images:

REPOSITORY                          TAG        IMAGE ID       CREATED         SIZE
big_file                            latest     6903fba8cef3   2 seconds ago   72.8MB
<none>                              <none>     28ed65140111   3 seconds ago   140MB
ubuntu                              latest     ba6acccedd29   5 weeks ago     72.8MB

28ed65140111 is the image before squashing. Checking the layers in big_file gives:

IMAGE          CREATED          CREATED BY                                      SIZE      COMMENT
6903fba8cef3   10 seconds ago                                                   0B        merge sha256:28ed65140111012d7604df5123b9be16ab4bfc62dd799259001b5d609ceb8e18 to sha256:ba6acccedd2923aee4c2acc6a23780b14ed4b8a5fa4e14e252a23b846df9b6c1
<missing>      11 seconds ago   /bin/sh -c rm big_file                          0B        
<missing>      12 seconds ago   /bin/sh -c #(nop) ADD file:937071a2cba4a5d8b…   0B        
<missing>      13 seconds ago   /bin/sh -c rm big_file                          0B        
<missing>      14 seconds ago   /bin/sh -c #(nop) ADD file:937071a2cba4a5d8b…   0B        
<missing>      5 weeks ago      /bin/sh -c #(nop)  CMD ["bash"]                 0B        
<missing>      5 weeks ago      /bin/sh -c #(nop) ADD file:5d68d27cc15a80653…   72.8MB 

After all the diffs are applied, nothing differs from the base image, so the merged layer 6903fba8cef3 is empty (0B). Note, however, that squash builds are currently an experimental feature.
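The squash step can be sketched in Python as: replay the build layers, then store only their net difference from the base image as one new layer. This is a toy model under stated assumptions, not Docker's implementation: layers are dicts of path to size in MB, None marks a deletion, rootfs is a hypothetical stand-in for the base layer, and diff is a hypothetical helper.

```python
# Toy model (illustration only, not Docker's implementation): a layer is a
# dict of path -> size in MB, and None marks a deletion ("whiteout").

def apply_layers(layers):
    """Replay every diff in order to get the resulting file system."""
    fs = {}
    for layer in layers:
        for path, size in layer.items():
            if size is None:
                fs.pop(path, None)  # whiteout hides the lower-layer file
            else:
                fs[path] = size
    return fs


def diff(lower, upper):
    """Hypothetical helper: one layer that turns file system `lower` into `upper`."""
    layer = {p: s for p, s in upper.items() if lower.get(p) != s}  # added or changed
    layer.update({p: None for p in lower if p not in upper})       # deleted
    return layer


base = [{"rootfs": 72.8}]  # hypothetical stand-in for the ubuntu base layer
build = [{"big_file": 33.6}, {"big_file": None}, {"big_file": 33.6}, {"big_file": None}]

# Squash: replace all build layers with a single merged layer on top of base.
merged = diff(apply_layers(base), apply_layers(base + build))
squashed = base + [merged]
print(merged)  # {}: nothing differs from the base, like the 0B layer above

# If the Dockerfile ended with another ADD, the merged layer would hold the file once.
merged2 = diff(apply_layers(base), apply_layers(base + build + [{"big_file": 33.6}]))
print(merged2)  # {'big_file': 33.6}
```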

  • export/import

Note that export works on a container rather than an image: it dumps only the current state of the container's file system and ignores the layer information of the image. If we export a container running our big_file image and re-import it with docker export <CONTAINER_ID> > big_file.tar && docker import - big_file:load < big_file.tar, we get an image with an "empty" history:

IMAGE          CREATED          CREATED BY   SIZE      COMMENT
06f4d01022e7   13 seconds ago                72.8MB    Imported from -

The image no longer shows how it was built, because the layers were not exported.
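Export/import can be modelled in Python as collapsing the whole layer stack, base layers included, into a single layer that holds only the final file system. This is a toy sketch under illustrative assumptions (layers as dicts of path to size in MB, None as a deletion, rootfs as a hypothetical stand-in for the ubuntu base layer), not what docker export does internally.

```python
# Toy model (illustration only): a layer is a dict of path -> size in MB,
# and None marks a deletion ("whiteout").

def apply_layers(layers):
    """Replay every diff in order to get the resulting file system."""
    fs = {}
    for layer in layers:
        for path, size in layer.items():
            if size is None:
                fs.pop(path, None)
            else:
                fs[path] = size
    return fs


base = [{"rootfs": 72.8}]  # hypothetical stand-in for the ubuntu base layer
image = base + [{"big_file": 33.6}, {"big_file": None},
                {"big_file": 33.6}, {"big_file": None}]

# docker export dumps only the container's current file system; docker import
# turns that tarball into a new image made of exactly one layer.
imported = [apply_layers(image)]

print(len(image), "->", len(imported))  # 5 -> 1
print(imported)                         # [{'rootfs': 72.8}]
```

Unlike squash, even the base image's layers are merged, which is why the imported image has a single 72.8MB entry and no history.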

Which method is best depends on the situation, but the concept of layers is central to Docker: Docker never forgets anything that was written to a layer unless you drop or merge that layer somehow.

Answer:

What you are looking for is a Docker multistage build, where you first use an image with all your dependencies for the build, and then build a new image with only the relevant artifacts. That way, rather than deleting the files you don't need, you simply never include them.

https://docs.docker.com/develop/develop-images/multistage-build/

Answer:

You can use a multistage build:

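FROM alpine:latest
ADD myfile /tmp

FROM ubuntu:20.04
# Only files copied from stage 0 end up in this image; stage 0's layers are
# discarded, so there is nothing in /tmp to delete here.
COPY --from=0 /tmp/myfile ./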