Create SQL Server docker image with restored backup database using purely a Dockerfile-CodePudding

The following docker file creates a custom SQL server image with a database restored from a backup (rmsdev.bak).

FROM mcr.microsoft.com/mssql/server:2019-latest
ENV MSSQL_PID=Developer
ENV SA_PASSWORD=Password1?
ENV ACCEPT_EULA=Y

USER mssql

COPY rmsdev.bak /var/opt/mssql/backup/

# Launch SQL Server, confirm startup is complete, restore the database, then terminate SQL Server.
RUN ( /opt/mssql/bin/sqlservr & ) | grep -q "Service Broker manager has started" \
    && /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P $SA_PASSWORD -Q 'RESTORE DATABASE rmsdev FROM DISK = "/var/opt/mssql/backup/rmsdev.bak" WITH MOVE "rmsdev" to "/var/opt/mssql/data/rmsdev.mdf", MOVE "rmsdev_Log" to "/var/opt/mssql/data/rmsdev_log.ldf", NOUNLOAD, STATS = 5' \
    && pkill sqlservr

CMD ["/opt/mssql/bin/sqlservr"]

The issue is that, once the restore is complete, the backup file is not required anymore and I would like to remove it from the image.

Unfortunately, due to how docker images are formed (layers) I cannot simply 'rm' the file as I would like to.

Multistage Dockerfile is not easily applicable in this case as in a build scenario. Another way would be to run the container, restore the backup and then commit a new image, but what I am looking to do is to use only docker build with the proper Dockerfile.

Does anyone know a way?

CodePudding user response：

If you know where the data directory is in the image, and the image does not declare that directory as a VOLUME, then you can use a multi-stage build for this. The first stage would set up the data directory as you show. The second stage would copy the populated data directory from the first stage but not the backup file. This trick might depend on the two stages running identical builds of the underlying software.

For SQL Server, the Docker Hub page and GitHub repo are both tricky to find, and surprisingly neither talks to the issue of data storage (as @HansKillian notes in a comment, you would almost always want to store the database data in some sort of volume). The GitHub repo does include a Helm chart built around a Kubernetes StatefulSet and from that we can discover that a data directory would be mounted on /var/opt/mssql.

So I might write a multi-stage build like so:

# Put common setup steps in an initial stage
FROM mcr.microsoft.com/mssql/server:2019-latest AS setup
ENV MSSQL_PID=Developer
ENV SA_PASSWORD=Password1?  # (weak password, easily extracted with `docker inspect`)
ENV ACCEPT_EULA=Y           # (legally probably the end user needs to accept this not the image builder)

# Have a stage specifically to populate the data directory
FROM setup AS data
# (copy-and-pasted from the question)
USER mssql
COPY rmsdev.bak /  # not under /var/opt/mssql
RUN ( /opt/mssql/bin/sqlservr & ) | grep -q "Service Broker manager has started" \
    && /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P $SA_PASSWORD -Q 'RESTORE DATABASE rmsdev FROM DISK = "/rmsdev.bak" WITH MOVE "rmsdev" to "/var/opt/mssql/data/rmsdev.mdf", MOVE "rmsdev_Log" to "/var/opt/mssql/data/rmsdev_log.ldf", NOUNLOAD, STATS = 5' \
    && pkill sqlservr

# Final stage that actually will actually be run.
FROM setup
# Copy the prepopulated data tree, but not the backup file
COPY --from=data /var/opt/mssql /var/opt/mssql
# Use the default USER, CMD, etc. from the base SQL Server image

The standard Docker Hub open-source database images like mysql and postgres generally declare a VOLUME in their Dockerfile for the database data, which forces the data to be stored in a volume. The important thing this means is that you can't set up data in the image like this; you have to populate the data externally, and then copy the data tree outside of the Docker image system.