I've recently been refactoring a Dockerfile and decided to try ADD over RUN curl to make the file cleaner. To my surprise, this resulted in quite a size difference:
$ docker images | grep test
test   curl   3aa809928665   7 minutes ago   746MB
test   add    da152355bb4d   3 minutes ago   941MB
Even more surprisingly, I tried a few Dockerfiles that do nothing except ADD or curl things, and their sizes are identical. I also tried with and without BuildKit; the result is the same (although without BuildKit the images are slightly smaller).
Here's the actual Dockerfile:
FROM ubuntu:22.04
ENV AWSCLI_VERSION "2.7.31"
ENV HELM_VERSION "3.9.4"
ENV OC_VERSION "4.11.5"
ENV VAULT_VERSION "1.11.3"
ENV YQ_VERSION "4.27.5"
ENV YQ_BINARY "yq_linux_amd64"
ENV DEBIAN_FRONTEND "noninteractive"
ADD "https://awscli.amazonaws.com/awscli-exe-linux-x86_64-${AWSCLI_VERSION}.zip" /extras/awscli.zip
ADD "https://awscli.amazonaws.com/awscli-exe-linux-x86_64-${AWSCLI_VERSION}.zip.sig" /extras/awscli.sig
ADD "https://get.helm.sh/helm-v${HELM_VERSION}-linux-amd64.tar.gz" /extras/helm.tgz
ADD "https://github.com/mikefarah/yq/releases/download/v${YQ_VERSION}/${YQ_BINARY}.tar.gz" /extras/yq.tgz
ADD "https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/${OC_VERSION}/openshift-client-linux.tar.gz" /extras/oc.tgz
ADD "https://releases.hashicorp.com/vault/${VAULT_VERSION}/vault_${VAULT_VERSION}_linux_amd64.zip" /extras/vault.zip
COPY aws-cli.pub /extras/aws-cli.pub
RUN cd /extras && \
    apt update && \
    apt install -y --no-install-recommends \
        ca-certificates \
        curl \
        gawk \
        gettext \
        git \
        gnupg2 \
        jq \
        openssh-client \
        unzip && \
    gpg --import /extras/aws-cli.pub && \
    # curl -L "https://awscli.amazonaws.com/awscli-exe-linux-x86_64-${AWSCLI_VERSION}.zip" -o /extras/awscli.zip && \
    # curl -L "https://awscli.amazonaws.com/awscli-exe-linux-x86_64-${AWSCLI_VERSION}.zip.sig" -o /extras/awscli.sig && \
    gpg --verify awscli.sig awscli.zip && \
    unzip -qq awscli.zip && \
    /extras/aws/install --update && \
    rm -rf /extras/aws* && \
    # curl -L "https://get.helm.sh/helm-v${HELM_VERSION}-linux-amd64.tar.gz" -o /extras/helm.tgz && \
    # curl -L "https://github.com/mikefarah/yq/releases/download/v${YQ_VERSION}/${YQ_BINARY}.tar.gz" -o /extras/yq.tgz && \
    # curl -L "https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/${OC_VERSION}/openshift-client-linux.tar.gz" -o /extras/oc.tgz && \
    # curl -L "https://releases.hashicorp.com/vault/${VAULT_VERSION}/vault_${VAULT_VERSION}_linux_amd64.zip" -o /extras/vault.zip && \
    find . -type f -name '*.tgz' -exec tar -xzf {} \; && \
    find . -type f -name '*.zip' -exec unzip -qq {} \; && \
    find . -type f -perm /101 -exec mv {} /usr/local/bin/ \; && \
    mv /usr/local/bin/${YQ_BINARY} /usr/local/bin/yq && \
    find /extras/ -mindepth 1 -delete && \
    apt clean && rm -rf /var/lib/apt/lists/*
ENTRYPOINT []
I don't understand why this happens with this particular Dockerfile, because essentially I'm doing exactly the same things.
Any ideas?
Answer:
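You see this because files added with ADD do not disappear from earlier image layers, even if you remove them later. Each ADD, COPY, and RUN instruction creates a new layer, and deleting a file in a later layer only hides it; the bytes stay in the layer that introduced them. Consider the following Dockerfiles: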
# a
FROM alpine:latest
RUN apk add --no-cache curl
ADD https://www.python.org/ftp/python/3.10.7/Python-3.10.7.tar.xz Python.tar.xz
RUN rm Python.tar.xz
# b
FROM alpine:latest
RUN apk add --no-cache curl
RUN curl -o Python.tar.xz https://www.python.org/ftp/python/3.10.7/Python-3.10.7.tar.xz
RUN rm Python.tar.xz
# c
FROM alpine:latest
RUN apk add --no-cache curl
RUN curl -o Python.tar.xz https://www.python.org/ftp/python/3.10.7/Python-3.10.7.tar.xz && \
    rm Python.tar.xz
Building each of them in the same context, I got the following results:
REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
<none>       <none>   cc79832a5ffa   9 seconds ago    27.3MB
<none>       <none>   87ea16448764   13 seconds ago   7.68MB
<none>       <none>   7f794f03b960   18 seconds ago   27.3MB
alpine       latest   9c6f07244728   5 weeks ago      5.54MB
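(Guess which file yields the different result. It is c, the only one that downloads and deletes the archive within a single RUN, so the archive never ends up in a committed layer.)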
If a layer is "finished" while it still contains files you don't need in the final image, that space is wasted. So your single RUN command is the most efficient option. To improve readability, you could try a multi-stage build: isolate all the curl/ADD and unzip/tar -x commands in a build stage, then copy only the required binaries from the build stage into the deploy stage. I'm not sure you'll gain much here, though.
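For illustration, here is a rough sketch of what such a multi-stage layout could look like, trimmed to two of the tools from the question. The stage name `fetch` and the `/out` directory are made up for this example, and the paths inside the archives (`linux-amd64/helm`, a top-level `vault`) are assumptions about how those releases are packaged, so check them against the actual downloads.

```dockerfile
# Build stage: the ADDed archives and unpacked leftovers stay in this stage's
# layers, which are not part of the final image.
FROM ubuntu:22.04 AS fetch
ARG HELM_VERSION="3.9.4"
ARG VAULT_VERSION="1.11.3"
# unzip is only needed here, for unpacking
RUN apt-get update && \
    apt-get install -y --no-install-recommends unzip && \
    rm -rf /var/lib/apt/lists/*
ADD "https://get.helm.sh/helm-v${HELM_VERSION}-linux-amd64.tar.gz" /extras/helm.tgz
ADD "https://releases.hashicorp.com/vault/${VAULT_VERSION}/vault_${VAULT_VERSION}_linux_amd64.zip" /extras/vault.zip
# Assumed archive layout: linux-amd64/helm in the tarball, vault at the top of the zip
RUN mkdir /out && \
    tar -xzf /extras/helm.tgz -C /extras && \
    mv /extras/linux-amd64/helm /out/helm && \
    unzip -qq /extras/vault.zip -d /extras && \
    mv /extras/vault /out/vault

# Deploy stage: starts from a clean base and receives only the binaries.
FROM ubuntu:22.04
COPY --from=fetch /out/ /usr/local/bin/
ENTRYPOINT []
```

Only the layers of the last stage end up in the tagged image, so ADD becomes harmless in the first stage. Packages that must exist at runtime (git, jq and so on in the question) would still need their own RUN in the deploy stage.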