Building PyTorch from source fails using the provided Dockerfile

Time:09-27

I'm trying to build a Docker image that I can use as a development environment for modifying PyTorch. There is a Dockerfile provided in the repo, and I'm running the following:

  1. git clone --recursive https://github.com/pytorch/pytorch
  2. cd pytorch
  3. DOCKER_BUILDKIT=1 docker build -t pytorchtest .

But the docker build results in the following error:

...
#20 28.80 Performing C++ SOURCE FILE Test HAS_WERROR_CAST_FUNCTION_TYPE failed with the following output:
#20 28.80 Change Dir: /opt/pytorch/build/CMakeFiles/CMakeTmp
#20 28.80
#20 28.80 Run Build Command(s):/usr/bin/make -f Makefile cmTC_09005/fast && /usr/bin/make  -f CMakeFiles/cmTC_09005.dir/build.make CMakeFiles/cmTC_09005.dir/build
#20 28.80 make[1]: Entering directory '/opt/pytorch/build/CMakeFiles/CMakeTmp'
#20 28.80 Building CXX object CMakeFiles/cmTC_09005.dir/src.cxx.o
#20 28.80 /usr/bin/c++ -DHAS_WERROR_CAST_FUNCTION_TYPE  -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format  -fPIE   -Werror=cast-function-type -o CMakeFiles/cmTC_09005.dir/src.cxx.o -c /opt/pytorch/build/CMakeFiles/CMakeTmp/src.cxx
#20 28.80 cc1plus: error: -Werror=cast-function-type: no option -Wcast-function-type
#20 28.80 CMakeFiles/cmTC_09005.dir/build.make:77: recipe for target 'CMakeFiles/cmTC_09005.dir/src.cxx.o' failed
#20 28.80 make[1]: *** [CMakeFiles/cmTC_09005.dir/src.cxx.o] Error 1
#20 28.80 make[1]: Leaving directory '/opt/pytorch/build/CMakeFiles/CMakeTmp'
#20 28.80 Makefile:127: recipe for target 'cmTC_09005/fast' failed
#20 28.80 make: *** [cmTC_09005/fast] Error 2
#20 28.80
#20 28.80
#20 28.80 Source file was:
#20 28.80 int main() { return 0; }
#20 DONE 29.0s
------
executor failed running [/bin/sh -c TORCH_CUDA_ARCH_LIST="3.5 5.2 6.0 6.1 7.0 PTX 8.0" TORCH_NVCC_FLAGS="-Xfatbin -compress-all"
CMAKE_PREFIX_PATH="$(dirname $(which conda))/../"     python setup.py install]: exit code: 1

I cannot get the error logs because they exist in the temporary filesystem for the image building process.

I find it somewhat strange that building a stable release image fails. Am I doing something wrong?


The Dockerfile:

# syntax = docker/dockerfile:experimental
#
# NOTE: To build this you will need a docker version > 18.06 with
#       experimental enabled and DOCKER_BUILDKIT=1
#
#       If you do not use buildkit you are not going to have a good time
#
#       For reference:
#           https://docs.docker.com/develop/develop-images/build_enhancements/
ARG BASE_IMAGE=ubuntu:18.04
ARG PYTHON_VERSION=3.8

FROM ${BASE_IMAGE} as dev-base
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        ca-certificates \
        ccache \
        # cmake=3.10.2-1ubuntu2.18.04.2 \
        cmake \
        curl \
        git \
        libjpeg-dev \
        libpng-dev && \
    rm -rf /var/lib/apt/lists/*
RUN /usr/sbin/update-ccache-symlinks
RUN mkdir /opt/ccache && ccache --set-config=cache_dir=/opt/ccache
ENV PATH /opt/conda/bin:$PATH

FROM dev-base as conda
ARG PYTHON_VERSION=3.8
# Automatically set by buildx
ARG TARGETPLATFORM
# translating Docker's TARGETPLATFORM into miniconda arches
RUN case ${TARGETPLATFORM} in \
         "linux/arm64")  MINICONDA_ARCH=aarch64  ;; \
         *)              MINICONDA_ARCH=x86_64   ;; \
    esac && \
    curl -fsSL -v -o ~/miniconda.sh -O  "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-${MINICONDA_ARCH}.sh"
COPY requirements.txt .
RUN chmod +x ~/miniconda.sh && \
    ~/miniconda.sh -b -p /opt/conda && \
    rm ~/miniconda.sh && \
    /opt/conda/bin/conda install -y python=${PYTHON_VERSION} cmake conda-build pyyaml numpy ipython && \
    /opt/conda/bin/python -mpip install -r requirements.txt && \
    /opt/conda/bin/conda clean -ya

FROM dev-base as submodule-update
WORKDIR /opt/pytorch
COPY . .
RUN git submodule update --init --recursive --jobs 0

FROM conda as build
WORKDIR /opt/pytorch
COPY --from=conda /opt/conda /opt/conda
COPY --from=submodule-update /opt/pytorch /opt/pytorch
RUN --mount=type=cache,target=/opt/ccache \
    TORCH_CUDA_ARCH_LIST="3.5 5.2 6.0 6.1 7.0 PTX 8.0" TORCH_NVCC_FLAGS="-Xfatbin -compress-all" \
    CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" \
    python setup.py install || cat /opt/pytorch/build/CMakeFiles/CMakeError.log

Answer:

The issue was with the COPY --from=submodule-update /opt/pytorch /opt/pytorch instruction: some .bzl files were not getting copied. More precisely, they were never added to the Docker build context because a .dockerignore file excluded them. I added the following line to the end of the .dockerignore, and now it works:

!*.bzl

As far as I understand, this is a bug. These files are committed to the repo, so they should get copied.
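
For anyone who wants to script the fix, here is a minimal sketch. It assumes the .dockerignore sits in the root of the cloned repo; the helper name add_ignore_exception is hypothetical and not part of the PyTorch repo. It appends the rule at the end of the file because Docker lets the last matching line in .dockerignore decide whether a file is included, and it skips the append if the exact line is already there.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PyTorch repo): append a negation
# rule to a .dockerignore file unless an identical line already exists.
add_ignore_exception() {
    rule="$1"                          # e.g. '!*.bzl'
    ignore_file="${2:-.dockerignore}"  # defaults to ./.dockerignore

    # Nothing to do if the exact rule is already present as a whole line.
    if [ -f "$ignore_file" ] && grep -qxF -e "$rule" "$ignore_file"; then
        return 0
    fi

    # Make sure the existing file ends with a newline before appending,
    # so the new rule does not get glued onto the previous pattern.
    if [ -s "$ignore_file" ] && [ -n "$(tail -c 1 "$ignore_file")" ]; then
        printf '\n' >> "$ignore_file"
    fi

    printf '%s\n' "$rule" >> "$ignore_file"
}
```

Calling add_ignore_exception '!*.bzl' from the repo root and then re-running DOCKER_BUILDKIT=1 docker build -t pytorchtest . should reproduce what I did by hand. As far as I understand Docker's pattern rules, a bare *.bzl pattern only matches files directly under the context root; if nested .bzl files turn out to be needed as well, a pattern like !**/*.bzl may be required, but I have not tested that.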
