Why does Docker care about the build context if we do not copy all of it?

Several pages of the official Docker documentation warn about the folder that is sent to the Docker daemon (they call it the build context) when building a new image with docker build. For example, from understand-build-context:

Inadvertently including files that are not necessary for building an image results in a larger build context and larger image size. This can increase the time to build the image, time to pull and push it, and the container runtime size. To see how big your build context is, look for a message like this when building your Dockerfile:

Sending build context to Docker daemon 187.8MB

I do not understand why the context is so important if we do not use all its content.

Let's say that my build context is a 1GB folder, but in the Dockerfile I have only one COPY command of a file of 1KB. Then why do we bother about the rest? How could the rest affect the size of my image?

Similarly, why do we have .dockerignore? If I do not use those files in the Dockerfile, aren't they ignored anyway? If not, what are they used for?

CodePudding user response：

Let's say that my build context is a 1GB folder, but in the Dockerfile...

The Dockerfile is normally transferred as part of the build context. Perhaps the easiest place to see this is in the "build an image" Docker HTTP API: the dockerfile parameter is explicitly a path within the build context, which is expected to be transferred in the HTTP body as a tar file. In that low-level API there's no way to pass the Dockerfile outside of that build-context tar-file HTTP body.

So first you send the build context to the Docker daemon, then the daemon unpacks it, and then it reads the Dockerfile and sees

I have only one COPY command of a file of 1KB.

so only that one file is copied into the resulting image; the rest of the context is just ignored.

Then why do we bother about the rest? How could the rest affect the size of my image? Similarly, why do we have .dockerignore?

Sending the build context is surprisingly slow. Even if you're not using remote Docker, and working directly on a native-Linux host, it can take multiple seconds to send that 1 GB tar-file build context over the Unix socket. So smaller build contexts can result in faster builds, and .dockerignore is a convenient way to cause things you're not going to use to be omitted from the build context.
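To get a feel for why the unused part of the context still costs something, here is a minimal Python sketch. It is not how the Docker client is implemented; it only mimics the idea that the whole directory is packed into a tar archive before any Dockerfile instruction is looked at. The function name context_tar_size and its skip parameter (a crude stand-in for .dockerignore that only handles top-level names) are made up for illustration.

```python
import io
import os
import tarfile
import tempfile


def context_tar_size(root, skip=()):
    """Return the size in bytes of an uncompressed tar archive of `root`.

    `skip` is a hypothetical stand-in for .dockerignore: top-level entries
    whose names are listed there are left out of the archive.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(os.listdir(root)):
            if name in skip:
                continue
            tar.add(os.path.join(root, name), arcname=name)
    return len(buf.getvalue())


with tempfile.TemporaryDirectory() as root:
    # A 1 KB file that the Dockerfile would COPY...
    with open(os.path.join(root, "app.txt"), "wb") as f:
        f.write(b"x" * 1024)
    # ...and 5 MB of unrelated data that no COPY ever touches.
    os.mkdir(os.path.join(root, "data"))
    with open(os.path.join(root, "data", "big.bin"), "wb") as f:
        f.write(b"\0" * (5 * 1024 * 1024))

    full = context_tar_size(root)
    trimmed = context_tar_size(root, skip=("data",))

print(f"whole directory: {full} bytes")
print(f"without data/:   {trimmed} bytes")
```

The archive for the whole directory is a bit over 5 MB even though only 1 KB of it would ever reach the image; leaving data/ out shrinks it to a few kilobytes.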

It is very common to copy the entire build context into an image, though, and in this case it's important to control what goes in there. Let's consider a typical Node application. In day-to-day development I might just use Node, so I'll have a package.json file and a src subdirectory, but Node installs all of its dependencies in a node_modules subdirectory as well. A typical Node Dockerfile will look something like

FROM node:lts
WORKDIR /app
# Copy and install dependencies
COPY package*.json ./
RUN npm ci
# Copy and build the rest of the application
# IMPORTANT: this copies the entire build context
COPY ./ ./
RUN npm run build
# Explain how to run the container
EXPOSE 3000
CMD ["node", "./build/index.js"]

The RUN npm ci line recreates the node_modules directory inside the image. The COPY ./ ./ line then copies the entire build context (my src directory, webpack configuration, TypeScript configuration, static assets, the whole works) into the image; there are enough parts and local files that I'd prefer not to list them out individually.

In that context it's important that COPY ./ ./ not include the host's node_modules directory. The host might be a different OS, or a different C library version, or any of several other things that might cause incompatibilities. That's where putting it in .dockerignore lets me say "copy everything, except this".

Your question hints at a very carefully curated build-context directory. That's a possibility too; in particular it's something that made sense with a compiled language, on a native-Linux host, before Docker multi-stage builds existed. You could consider writing something like a Makefile that copied specific files from your source tree into a dedicated docker directory, and then used that directory as the build context. Then you'd know exactly what was in the build context and therefore exactly what was going into the image. With modern Docker and multi-stage builds, I feel like this setup is a little unusual though.

CodePudding user response：

The documentation was written before BuildKit became standard in Docker, but keeping the context small is still good practice for older build tooling. The reason for this in the classic builder is that Docker is a client/server application. To run a build, the client sends over the entire context, the Dockerfile, and all the parameters for the server to build, and the server runs that build, pulling out the parts of the context that the Dockerfile requests. As much as it looks like everything is happening locally, and often is, the server could be a remote host without direct access to your filesystem, and the build process is a JSON REST API that sends the request and then monitors for the build to complete.

BuildKit, however, changes this. Both the server and the client communicate with each other, and the server has a cache of not only the previous builds, but of the previous build contexts. So when a file changes in the context between builds, it can perform the equivalent of an rsync to send just that one file, and only when the server requests it from the client.
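As a rough picture of that "equivalent of an rsync" idea, here is a small Python sketch. It is an assumption-laden illustration, not BuildKit's actual transfer protocol: it compares a content hash per file between two snapshots of the context and reports only the files that are new or changed. The names snapshot and files_to_send are hypothetical.

```python
import hashlib
import os
import tempfile


def snapshot(root):
    """Map each relative file path under `root` to a SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            digests[rel] = digest
    return digests


def files_to_send(previous, current):
    """Return the sorted paths that are new or changed since `previous`."""
    return sorted(p for p, d in current.items() if previous.get(p) != d)


with tempfile.TemporaryDirectory() as root:
    sources = {
        "Dockerfile": "FROM scratch\n",
        "main.c": "int main(void) { return 0; }\n",
    }
    for name, body in sources.items():
        with open(os.path.join(root, name), "w") as f:
            f.write(body)
    first_build = snapshot(root)

    # Edit one file between builds.
    with open(os.path.join(root, "main.c"), "w") as f:
        f.write("int main(void) { return 1; }\n")
    second_build = snapshot(root)

print(files_to_send({}, first_build))            # ['Dockerfile', 'main.c']
print(files_to_send(first_build, second_build))  # ['main.c']
```

On the first build everything has to be sent; on the second, only the edited file does.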

There is still a need for a .dockerignore since even with BuildKit, you often want to exclude files within the build that would otherwise be copied in a wildcard match. For example, if you have the step:

COPY . /src

Then even with BuildKit caching, you'll include every file in the directory, even if a number of those files aren't needed to build your app (like the .git folder, the Dockerfile itself, the README, the LICENSE, etc.). That not only bloats your image and slows your builds, but also risks a cache miss when the resulting image would otherwise be unchanged.
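The cache-miss risk can be illustrated with a toy cache key in Python. This is only a sketch under the assumption that a COPY step is cached by some digest of the files it copies; the real builder's key computation is more involved, and copy_cache_key and its ignored parameter are invented for this example.

```python
import hashlib


def copy_cache_key(files, ignored=()):
    """Hash the names and contents of every file a `COPY . /src` would take.

    `files` maps relative paths to bytes; `ignored` is a hypothetical set of
    paths dropped by .dockerignore before the step ever sees them.
    """
    h = hashlib.sha256()
    for path in sorted(files):
        if path in ignored:
            continue
        h.update(path.encode())
        h.update(b"\0")
        h.update(files[path])
        h.update(b"\0")
    return h.hexdigest()


before = {"src/main.c": b"int main(void) { return 0; }\n", "README": b"v1\n"}
after = {**before, "README": b"v2\n"}

# Without an ignore rule, a README edit changes the key: cache miss.
print(copy_cache_key(before) == copy_cache_key(after))  # False
# With README ignored, the key is stable and the cached layer can be reused.
print(copy_cache_key(before, {"README"}) == copy_cache_key(after, {"README"}))  # True
```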

Some will make the .dockerignore look similar to their .gitignore with some added files that don't affect the build. I often do the reverse, excluding everything, and then reincluding only the files I need with the ! prefix. E.g. the following would include only the Makefile and the src and static folders:

*
!Makefile
!src/
!static/

If you do that, make sure you remember to update it when adding new files or directories to your builds.
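To see how the "exclude everything, then re-include" patterns above play out, here is a simplified Python sketch of last-match-wins evaluation. It is an approximation, not Docker's matcher: it uses Python's fnmatch (whose wildcard rules differ from the Go matching Docker uses), and the helper name is_excluded is hypothetical.

```python
from fnmatch import fnmatchcase


def is_excluded(path, patterns):
    """Apply ignore patterns in order; the last one that matches decides."""
    parts = path.split("/")
    # A pattern may match the path itself or any of its parent directories.
    candidates = ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    excluded = False
    for raw in patterns:
        negated = raw.startswith("!")
        pattern = (raw[1:] if negated else raw).rstrip("/")
        if any(fnmatchcase(c, pattern) for c in candidates):
            excluded = not negated
    return excluded


patterns = ["*", "!Makefile", "!src/", "!static/"]

for path in ["Makefile", "src/main.c", "static/logo.png", "README.md", ".git/config"]:
    verdict = "excluded" if is_excluded(path, patterns) else "included"
    print(f"{path}: {verdict}")
```

With these patterns, Makefile, src/main.c and static/logo.png come out as included, while README.md and .git/config are excluded. A newly added top-level directory would be excluded too until it gets its own ! line.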