Why is my Docker image larger with a multi-stage build compared to a one-stage build?

Time:06-14

I get an image of 292 MB with a multi-stage build, compared to 235 MB with a one-stage build. Can anyone help me understand why?

Here are the Dockerfiles.

One-stage build:

# syntax=docker/dockerfile:1
FROM python:3.9.13-alpine3.16
WORKDIR /project
ENV FLASK_APP=run.py
ENV FLASK_RUN_HOST=0.0.0.0
RUN apk add --no-cache gcc musl-dev linux-headers
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
EXPOSE 5000
COPY . .
CMD ["flask", "run"]

Multi-stage build:

# syntax=docker/dockerfile:1
FROM python:3.9.13-alpine3.16 AS compile-image
RUN apk add --no-cache gcc musl-dev linux-headers
ENV VIRTUAL_ENV=/opt/venv
RUN python -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
COPY requirements.txt .
RUN pip install -r requirements.txt
WORKDIR /project
COPY . .
RUN pip install .

FROM python:3.9.13-alpine3.16
COPY --from=compile-image $VIRTUAL_ENV $VIRTUAL_ENV
ENV FLASK_APP=run.py
ENV FLASK_RUN_HOST=0.0.0.0
EXPOSE 5000
CMD ["flask", "run"]

Note: I used an adapted version of the method suggested in the following article: https://pythonspeed.com/articles/multi-stage-docker-python/.

CodePudding user response:

Each FROM instruction in a Dockerfile starts a new build stage. Nothing from the previous stage carries over: files are only included if you copy them explicitly with COPY --from, and ENV values are not inherited at all.

So in your second stage, VIRTUAL_ENV is not set and expands to an empty string. That leads to your COPY statement effectively being COPY --from=compile-image with no paths. I looked at the docs; it appears to be a valid statement, but they don't describe what happens when you do that. I tested it, and it seems to copy everything from compile-image into your new image. That puts the whole build stage, compiler and headers included, on top of the base image, which is why the result is larger than the one-stage image instead of smaller.
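If you want to see the scoping for yourself, here is a minimal sketch. The stage name first and the alpine:3.16 base are only for illustration and are not part of your project.

```dockerfile
# syntax=docker/dockerfile:1
FROM alpine:3.16 AS first
ENV VIRTUAL_ENV=/opt/venv
# The variable is set in this stage, so the shell expands it here.
RUN echo "first stage: VIRTUAL_ENV='$VIRTUAL_ENV'"

FROM alpine:3.16
# New stage: the ENV above is not inherited, so this expands to an empty string.
RUN echo "second stage: VIRTUAL_ENV='$VIRTUAL_ENV'"
```

Building it with docker build --progress=plain --no-cache . should show the output of both RUN steps, with the second one printing an empty value.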

To fix it, you can replace the environment variable in the second stage with the literal path you want, like this:

# syntax=docker/dockerfile:1
FROM python:3.9.13-alpine3.16 AS compile-image
RUN apk add --no-cache gcc musl-dev linux-headers
ENV VIRTUAL_ENV=/opt/venv
RUN python -m venv $VIRTUAL_ENV
# Keep the venv first on PATH so the pip installs below go into /opt/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
COPY requirements.txt .
RUN pip install -r requirements.txt
WORKDIR /project
COPY . .
RUN pip install .

FROM python:3.9.13-alpine3.16
COPY --from=compile-image /opt/venv /opt/venv
# ENV from the first stage is not inherited, so set PATH again here
ENV PATH="/opt/venv/bin:$PATH"
ENV FLASK_APP=run.py
ENV FLASK_RUN_HOST=0.0.0.0
EXPOSE 5000
CMD ["flask", "run"]

Since you probably want the PATH to have /opt/venv/bin in it in the final image, I've set PATH in the second stage as well. If it's only set in the first stage, it is lost when you hit the second FROM statement. It still has to stay in the first stage too, so that pip installs into the venv rather than into the system Python.

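I'm not a Python expert, but I don't think you need to repeat the python -m venv step in the second stage: the venv is copied to the same path and both stages use the same base image, so the interpreter it points to should be available at runtime.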
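If you would rather not hard-code the path twice, one alternative is a build argument declared before the first FROM. This is a sketch based on your Dockerfile, not something I have built against your project. It relies on the documented behaviour that a global ARG has to be re-declared inside each stage that uses it.

```dockerfile
# syntax=docker/dockerfile:1
# Global build argument: declared before the first FROM.
ARG VIRTUAL_ENV=/opt/venv

FROM python:3.9.13-alpine3.16 AS compile-image
# Re-declare without a value to pick up the global default in this stage.
ARG VIRTUAL_ENV
RUN apk add --no-cache gcc musl-dev linux-headers
RUN python -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
COPY requirements.txt .
RUN pip install -r requirements.txt
WORKDIR /project
COPY . .
RUN pip install .

FROM python:3.9.13-alpine3.16
# Re-declare again: each stage has its own scope.
ARG VIRTUAL_ENV
COPY --from=compile-image $VIRTUAL_ENV $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
ENV FLASK_APP=run.py
ENV FLASK_RUN_HOST=0.0.0.0
EXPOSE 5000
CMD ["flask", "run"]
```

ARG values only exist at build time, but the ENV PATH line stores the expanded path, so the final image still finds flask in the venv.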
