Is there any cache advantage to using ADD <url> vs RUN wget/curl <url> in a Dockerfile

Is there any advantage, in terms of layer cache invalidation, to using ADD instead of RUN?

Background

I frequently see Dockerfiles that install wget or curl just to RUN wget … or RUN curl … to install some dependency that cannot be found in package management.

I suspect these could be converted to simple ADD <url> <dest> lines, and that would at least obviate the need for adding curl or wget to the image.

Further, it seems like the docker daemon could rely on HTTP cache invalidation to inform its own layer cache invalidation. At a minimum (e.g. in the absence of HTTP cache headers), it could GET the resource, hash it, and calculate invalidation the same way it does for local files.

NOTE: I am familiar with the usage of ADD vs RUN …, but I am looking for a strong reason to choose one over the other. In particular, I want to know if ADD <url> can behave any more intelligently with regard to layer cache invalidation.

CodePudding user response:

Certainly.

A RUN instruction's layer stays cached as long as the command text is unchanged (and no earlier layer has been invalidated). So if the remote file is updated, you won't get the new version; Docker will reuse the cached layer.

The ADD instruction will always download the file, and the cache will be invalidated if the file's checksum no longer matches.
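
To make the difference concrete, here is a minimal sketch of the two patterns side by side. The base image, the example.com URL, and the destination paths are placeholders chosen for illustration, not anything from the question.

```dockerfile
# Placeholder base image; busybox wget is already included in alpine.
FROM alpine:3.19

# RUN: the cache lookup is based on the command text, so this layer is
# reused on later builds even if the file behind the URL has changed.
RUN wget -O /opt/data-run.json https://example.com/data.json

# ADD: the remote file is checked at build time, and this layer (plus
# everything after it) is rebuilt when the file's content no longer matches.
ADD https://example.com/data.json /opt/data-add.json
```

With the RUN form, picking up a new version of the file typically means changing the instruction text or building with `docker build --no-cache`.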

I would recommend using ADD instead of RUN wget ... or RUN curl .... I imagine people use the latter because it's more familiar, but the ADD instruction is quite powerful. It can set ownership, and it can unpack local tar archives (note that archives fetched from a remote URL are not unpacked automatically). It's also considered best practice to avoid installing any packages that are not necessary for your process to run (though there are multiple ways to accomplish this, like using multi-stage builds).
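
As a rough sketch of how these pieces can fit together, the Dockerfile below fetches an archive in a throwaway stage and copies only the result into the final image. The URLs, paths, stage name, and the numeric UID/GID 1000 are all hypothetical. Because ADD does not unpack archives that come from a URL, the tar step is run explicitly.

```dockerfile
# Stage 1: fetch and unpack. Nothing from this stage ships in the final
# image unless it is copied out explicitly.
FROM alpine:3.19 AS fetch
ADD https://example.com/tool.tar.gz /tmp/tool.tar.gz
# Remote archives are not extracted by ADD, so unpack by hand.
RUN mkdir -p /opt/tool && tar -xzf /tmp/tool.tar.gz -C /opt/tool

# Stage 2: final image, containing only the unpacked files.
FROM alpine:3.19
COPY --from=fetch --chown=1000:1000 /opt/tool /opt/tool

# ADD can also set ownership directly on a downloaded file.
ADD --chown=1000:1000 https://example.com/config.json /etc/tool/config.json
```

Numeric IDs are used with `--chown` here so the example does not depend on a particular user existing in the base image.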

Docs on cache invalidation:

https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#leverage-build-cache