I'm trying to uncompress a file and delete the original compressed archive in my Dockerfile image build instructions. I need to do this because the file in question is larger than the 2GB limit GitHub sets on file sizes (see here). The solution I'm pursuing is to compress the file (bringing it under the 2GB limit) and then decompress it when I build the application. I know it's bad practice to build large images, and I plan to integrate an external database into the project, but I don't have time to do that now.
I've tried various options, but have been unsuccessful.
- Compress the file in .zip format, use apt-get to install unzip, and then decompress the file with unzip:
FROM python:3.8-slim
#install unzip
RUN apt-get update && apt-get install unzip
WORKDIR /app
COPY /data/databases/file.db.zip /data/databases
RUN unzip /data/databases/file.db.zip && rm -f /data/databases/file.db.zip
COPY ./ ./
This fails with unzip: cannot find or open /data/databases/file.db.zip, /data/databases/file.db.zip.zip or /data/databases/file.db.zip.ZIP. I don't understand this, as I thought COPY added files to the image.
- Following this advice, I compressed the large file with gzip and tried to use Docker's native ADD command to decompress it, i.e.:
FROM python:3.8-slim
WORKDIR /app
ADD /data/databases/file.db.gz /data/databases/file.db
COPY ./ ./
While this builds without error, it does not decompress the file, which I can see by using docker exec -t -i clean-dash /bin/bash to explore the image's directory structure (see the check sketched below). Since the large file is a gzip file, my understanding from the docs is that ADD should decompress it.
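For reference, this is roughly how I'm inspecting the result (container name clean-dash and paths as above):
docker exec -t -i clean-dash /bin/bash
# inside the container:
ls -la /data/databases
# the file is there, but it has not been decompressed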
How can I meet these requirements?
CodePudding user response:
ADD only decompresses local tar files, not necessarily compressed single files. It may work to package the contents in a tar file, even if it only contains a single file:
# in the Dockerfile:
ADD ./data/databases/file.tar.gz /data/databases/
# on the host, before building:
(cd data/databases && tar cvzf file.tar.gz file.db)
docker build .
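Folded into your existing Dockerfile, that approach would look roughly like this (base image, WORKDIR, and paths taken from your question; treat it as a sketch):
FROM python:3.8-slim
WORKDIR /app
# ADD auto-extracts local tar archives, so this leaves file.db in /data/databases/
ADD ./data/databases/file.tar.gz /data/databases/
COPY ./ ./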
If you're using the first approach, you must use a multi-stage build here. The problem is that each RUN command generates a new image layer, so the resulting image is always the previous layer plus whatever changes the RUN command makes; RUN rm a-large-file will actually result in an image that's slightly larger than the image that contains the large file.
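To make that concrete with your first approach (paths from the question; illustration only, assuming unzip is already installed):
# this layer holds the full zip and keeps that size permanently
COPY data/databases/file.db.zip /data/databases/
# this layer adds the unpacked database; the rm only hides the zip,
# it cannot shrink the COPY layer above
RUN unzip /data/databases/file.db.zip -d /data/databases \
 && rm -f /data/databases/file.db.zip
docker history on the built image will show the COPY layer at roughly the size of the zip and the RUN layer at roughly the size of the unpacked database.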
The BusyBox tool set includes, among other things, an implementation of unzip(1), so you should be able to split this up into a stage that just unpacks the large file and then a stage that copies the result in:
# first stage: use BusyBox's unzip to unpack the archive
FROM busybox AS unpack
WORKDIR /unpack
COPY data/databases/file.db.zip /
RUN unzip /file.db.zip

# final stage: copy in only the unpacked result, never the zip
FROM python:3.8-slim
COPY --from=unpack /unpack/ /data/databases/
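You can verify that the zip never reaches the final image by building this and listing the layers (the tag is arbitrary):
docker build -t clean-dash .
# one line per layer with its size; only the single COPY --from=unpack layer
# should be large, and no layer contains file.db.zip
docker history clean-dash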
In terms of the Docker image, any of these approaches will create a single very large layer. In the past I've run into operational problems with single layers larger than about 1 GiB, things like docker push hanging up halfway through. With the multi-stage build approach, if you have multiple files you're trying to copy, you could have several COPY steps that break the batch of files into multiple layers. (But if it's a single SQLite file, there's nothing you can really do.)
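In the multi-file case that would look something like this (file1.db and file2.db are hypothetical names):
COPY --from=unpack /unpack/file1.db /data/databases/
COPY --from=unpack /unpack/file2.db /data/databases/
# each COPY becomes its own layer, so no single layer has to hold everything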
CodePudding user response:
Based on @David Maze's answer, the following worked, which I post here for completeness.
# unpack the zipped database
FROM busybox AS unpack
WORKDIR /unpack
COPY data/databases/file.db.zip /
RUN unzip /file.db.zip

FROM python:3.8-slim
COPY --from=unpack /unpack/file.db /
WORKDIR /app
COPY ./ ./
# move the unpacked db and delete the original zip
RUN mv /file.db ./data/databases && rm -f ./data/databases/file.db.zip
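To confirm it worked, I build the image and list the target directory (the tag is just what I use locally):
docker build -t clean-dash .
docker run --rm clean-dash ls -l /app/data/databases
# file.db is present and the zip has been removed from the filesystem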