Why do docker containers rely on uploading (large) images rather than building from the spec files?


Having needed several times in the last few days to upload a 1 GB image after some micro change, I can't help but wonder why there isn't a deploy path built into docker and related tech (e.g. k8s) to push just the application files (Dockerfile, docker-compose.yml, and app-related code) and have it build out the infrastructure from within the (live) docker host?

In other words, why do I have to upload an entire linux machine whenever I change my app code?

Isn't the whole point of Docker that the configs describe a purely deterministic infrastructure output? I can't even see why one would need to upload the whole container image unless they make changes to it manually, outside of the Dockerfile, and then wish to upload that modified image. But that seems like bad practice at the very least...

Am I missing something, or is this just a peculiarity of the system?

CodePudding user response:

Good question.

Short answer:

Because storage is cheaper than processing power, and building images "live" would be complex, time-consuming, and unpredictable.

On your Kubernetes cluster, for example, you just want to pull the "cached" layers of an image that you know works and run it, in seconds, instead of compiling binaries and downloading dependencies (as your Dockerfile would specify).

About building images:

You don't have to build these images locally; you can use your CI/CD runners and run docker build and docker push from the pipelines that are triggered when you push your code to a git repository.
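As a rough sketch, such a pipeline step might run something like the following (the registry and image names are placeholders, not anything from the question):

    # Build the image from the Dockerfile in the repo, tag it, and push it to a registry.
    # registry.example.com/myapp is a placeholder; substitute your own registry and repository.
    docker build -t registry.example.com/myapp:latest .
    docker push registry.example.com/myapp:latest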

Also, if the image is too big, look into ways of reducing its size: use multi-stage builds, use lighter/minimal base images, use fewer layers (for example, multiple RUN apt install steps can be grouped into one apt install command listing several packages), and use a .dockerignore file so unnecessary files don't end up in your image. Finally, read up on caching in Docker builds, as it can reduce the size of the layers you push when making changes.
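To illustrate the "fewer layers" point, a hypothetical Dockerfile fragment might group package installation into a single RUN instruction (the base image and package names here are only placeholders):

    FROM debian:bullseye-slim

    # One RUN instruction produces one layer; chaining the commands and cleaning
    # the apt cache in the same step keeps that single layer small.
    RUN apt-get update \
     && apt-get install -y --no-install-recommends curl ca-certificates git \
     && rm -rf /var/lib/apt/lists/*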


Long answer:

Think of the Dockerfile as the source code, and the Image as the final binary. I know it's a classic example.

But just consider how long it would take to build/compile the binary every time you want to use it (either by running it, or by importing it as a library in a different piece of software). Then consider how non-deterministic it would be to download that software's dependencies, or to compile them, on different machines every time you run it.

Take Node.js's Dockerfile, for example: https://github.com/nodejs/docker-node/blob/main/16/alpine3.16/Dockerfile

Which is based on Alpine: https://github.com/alpinelinux/docker-alpine

You don't want your application to perform all the operations specified in these files (and their scripts) at runtime, before actually starting, because that would be unpredictable, time-consuming, and more complex than it needs to be (for example, you'd need firewall exceptions for egress traffic from the cluster to the internet to download dependencies that may not even still be available).

Instead, you ship an image based on the base image you tested and built your code to run on. That image is built once, pushed to the registry, and then k8s runs it as a black box, which is predictable and deterministic.

As for your point about how annoying it is to push huge docker images every time:

You can cut that size down by following some best practices and designing your Dockerfile well, for example:

  • Reduce your layers: wherever possible, pass multiple arguments to a single command instead of re-running it several times.
  • Use multi-stage builds, so you only push the final image, not the intermediate stages you needed to compile and configure your application (see the sketch after this list).
  • Avoid baking data into your images; you can pass it to the containers at runtime.
  • Order your layers so that you don't have to rebuild untouched layers when making changes.
  • Don't include unnecessary files; use a .dockerignore file.
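To tie a few of these together, here is a rough multi-stage sketch for a hypothetical Node.js app (the file names, build script, and entry point are illustrative assumptions, not taken from the question):

    # Build stage: contains the toolchain needed to install and compile dependencies.
    FROM node:16-alpine AS build
    WORKDIR /app
    # Copy the dependency manifests first so this layer is only rebuilt when
    # dependencies change, not on every source edit (layer ordering).
    COPY package*.json ./
    RUN npm ci
    COPY . .
    RUN npm run build

    # Final stage: only the built output is shipped, not the build tooling.
    FROM node:16-alpine
    WORKDIR /app
    COPY --from=build /app/dist ./dist
    COPY --from=build /app/node_modules ./node_modules
    # Configuration is injected at runtime (e.g. docker run -e PORT=8080 ...),
    # not baked into the image.
    CMD ["node", "dist/index.js"]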

And last but not least:

You don't have to push images from your machine; you can do it with CI/CD runners (for example, the build-push GitHub Action), or you can use your cloud provider's build products (like Cloud Build on GCP or AWS CodeBuild).
