Docker not getting cached result from previous Gitlab CI/CD stage


I have a Docker image that I want to build for both AMD64 and ARM64.

The build is done through GitLab CI/CD using my own runners, because I can't emulate the ARM build on AMD and vice versa (it's also painfully slow, so this is the better option).

I have two machines, one ARM64 and one AMD64, both with GitLab runners.
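
The runners are registered with tags matching their architecture so jobs can be routed to the right machine; roughly like this (URL and token are placeholders, the exact registration may differ):

# On the AMD64 machine (placeholders for URL and token)
gitlab-runner register --url https://gitlab.example.com --registration-token <token> \
  --executor docker --docker-image docker:20 --tag-list amd64
# On the ARM64 machine
gitlab-runner register --url https://gitlab.example.com --registration-token <token> \
  --executor docker --docker-image docker:20 --tag-list arm64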

The approach I followed is having 2 stages:

  • A build stage that builds each architecture's image on its own runner and pushes it to a per-architecture registry cache tag
  • A push stage that reuses the cache from the previous stage and pushes both architectures to the final registry tag as a multi-arch image

My .gitlab-ci.yml file looks like this:

image: docker:20

variables:
    GROUP_NAME: "group"
    PROJECT_NAME: "project"
    BRANCH_NAME: "main"

    LATEST_NAME: "$PROJECT_NAME:latest"
    REGISTRY_LATEST_NAME: "$CI_REGISTRY/$GROUP_NAME/$PROJECT_NAME/$LATEST_NAME"
    REGISTRY_CACHE_NAME_PREFIX: "$CI_REGISTRY/$GROUP_NAME/$PROJECT_NAME/$PROJECT_NAME:cache"
    CACHE_AMD64: "$REGISTRY_CACHE_NAME_PREFIX-amd64"
    CACHE_ARM64: "$REGISTRY_CACHE_NAME_PREFIX-arm64"

    BACKEND_URL: "$BACKEND_URL" # This is an environment variable required for my specific build

services:
    - docker:20-dind

before_script:
  - docker context create builder-context
  - docker buildx create --name builderx --driver docker-container --bootstrap --use builder-context
  - docker login $CI_REGISTRY -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD

stages:
  - build
  - push

build_amd64:
  stage: build
  script:
    - docker buildx build --build-arg BACKEND_URL="${BACKEND_URL}" --build-arg BUILDKIT_INLINE_CACHE=1 --cache-from="$CACHE_AMD64" --tag "$CACHE_AMD64" --push --platform=linux/amd64 .
  tags:
    - amd64

build_arm64:
  stage: build
  script:
    - docker buildx build --build-arg BACKEND_URL="${BACKEND_URL}" --build-arg BUILDKIT_INLINE_CACHE=1 --cache-from="$CACHE_ARM64" --tag "$CACHE_ARM64" --push --platform=linux/arm64 .
  tags:
    - arm64

push:
  stage: push
  script:
    - docker buildx build --build-arg BACKEND_URL="${BACKEND_URL}" --build-arg BUILDKIT_INLINE_CACHE=1 --cache-from="$CACHE_AMD64" --cache-from="$CACHE_ARM64" --tag "$REGISTRY_LATEST_NAME" --push --platform=linux/amd64,linux/arm64 .
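
As a sanity check, the per-architecture cache tags can be inspected after the build stage to confirm they were actually pushed:

# Check that the cache images exist in the registry
docker buildx imagetools inspect "$CACHE_AMD64"
docker buildx imagetools inspect "$CACHE_ARM64"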

The problem is that in the push stage, instead of getting everything from the cache, Docker starts building again at some points where, in theory, it shouldn't.

My theory is that Docker is not using the cache because the layer hashes are different in each stage.
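
One thing I'm not sure about is whether the inline cache is even enough here: as far as I know, BUILDKIT_INLINE_CACHE=1 only embeds cache metadata for the layers that end up in the final image, so the intermediate installer and builder stages may not be importable as cache at all. If that's the cause, a registry cache export with mode=max would be the alternative; a rough sketch of what the amd64 build job's command could look like:

# Sketch only: export a full (mode=max) build cache to the registry instead of relying on the inline cache.
# mode=max also records layers from the intermediate stages (installer, builder), not just the final image.
# Note: the cache ref then holds cache metadata rather than a runnable image.
docker buildx build \
  --build-arg BACKEND_URL="${BACKEND_URL}" \
  --cache-from type=registry,ref="$CACHE_AMD64" \
  --cache-to type=registry,ref="$CACHE_AMD64",mode=max \
  --platform=linux/amd64 .

The push stage would then keep both --cache-from type=registry,ref=... flags and build the multi-arch manifest as before.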

My Dockerfile, after a lot of testing, is as follows (it may contain some dumb stuff):

# NOTE: I split this in 3 stages trying to make it as deterministic as possible, but might be dumb

### Dependencies Installer Stage ###
FROM node:18-bullseye as installer

WORKDIR /app/admin
COPY yarn.lock /app/admin/
COPY package.json /app/admin/
RUN yarn global add gatsby-cli # <--- Push stage fails here, depending on runner chosen and architecture (amd runner and arm build fails for example)
RUN yarn install --silent --frozen-lockfile
### --- ###

### Builder stage ###
FROM node:18-bullseye as builder 

WORKDIR /app/admin
COPY . .
RUN rm -rf node_modules

COPY --from=installer /app/admin /app/admin
COPY --from=installer /usr/local/bin/ /usr/local/bin/
COPY --from=installer /usr/local/share/.config/yarn/global /usr/local/share/.config/yarn/global

# This is required because static site generated and served in Nginx can't read environment variables
# once they are generated
ARG BACKEND_URL
ENV GATSBY_BACKEND_URL $BACKEND_URL
RUN gatsby build
### --- ###

### Serve stage ###
FROM nginx:stable-alpine
EXPOSE 80 
COPY ./nginx/nginx.conf /etc/nginx/conf.d/default.conf 
COPY --from=builder /app/admin/public /usr/share/nginx/html
ENTRYPOINT ["nginx", "-g", "daemon off;"]
### --- ###

How can I fix this, debug it, or force the push stage to use the cache from the previous stage?

CodePudding user response:

I don't know if this is the correct fix, but what I ended up doing is adding the gatsby-cli dependency to package.json by running yarn add gatsby-cli locally, and then calling it from node_modules instead of using the global command. This way I can pin it to a fixed version, so the cache for that layer can be reused, and I also don't have to copy /usr/local/bin and the like from the dependencies installer stage.
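
Roughly, the two pieces of that change (the pinned version is simply whatever yarn resolves into yarn.lock):

# Run locally once so gatsby-cli is added to package.json and pinned in yarn.lock
yarn add gatsby-cli
# In the image, gatsby is then invoked from node_modules instead of a global install
./node_modules/.bin/gatsby build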

My final Dockerfile:

# Dependencies Installer Stage
FROM node:18-bullseye as installer

WORKDIR /app/admin
COPY yarn.lock /app/admin
COPY package.json /app/admin
RUN yarn install --silent --frozen-lockfile

# Builder stage
FROM node:18-bullseye as builder 

WORKDIR /app/admin
COPY --from=installer /app/admin /app/admin

# Build arg required for static generated site
ARG BACKEND_URL
ENV GATSBY_MEDUSA_BACKEND_URL $BACKEND_URL
COPY . .
RUN ./node_modules/.bin/gatsby build

# Serve stage
FROM nginx:stable-alpine

# Nginx.conf is the same as the default.conf already present in docker image with minor changes (see nginx.conf)
COPY ./nginx/nginx.conf /etc/nginx/conf.d/default.conf 
COPY --from=builder /app/admin/public /usr/share/nginx/html

I also added a .dockerignore, which I didn't have before; I think it helps filter out some files when doing COPY . .

These are the contents of the .dockerignore:

node_modules
Dockerfile
netlify.toml
.gitlab-ci.yml
.git
.cache
.github
.storybook
public

I'll leave this as the answer until somebody can better explain the behaviour of the cache and why this change fixed it.
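
For completeness, the final multi-arch manifest can be checked to make sure it contains both platforms:

# Should list entries for both linux/amd64 and linux/arm64
docker buildx imagetools inspect "$REGISTRY_LATEST_NAME"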
