Import an already downloaded SpaCy language model to docker container without new download


I'd like to run multiple spaCy language models in separate Docker containers. I don't want the Docker image to contain the line RUN python -m spacy download en_core_web_lg, since other processes might need different language models.

My question is: is it possible to download multiple spaCy language models onto the local machine (i.e. en_core_web_lg, en_core_web_md, ...) and then load one of these models into the Python spaCy environment when a Docker container spawns?

This process might have the following steps:

  1. Spawn docker container and bind a volume "language_models/" to the container which contains a number of spacy models.
  2. Run some spaCy command such as python -m spacy download --local ./language_models/en_core_web_lg (a hypothetical flag) that points at the language model you want the environment to use.

The hope is that, since the language model already exists on the shared volume, the download/import time is significantly reduced for each new container. Each container also carries no unnecessary language models, and the Docker image is not tied to any particular language model.

CodePudding user response:

There are two ways to do this.

The easier one is to mount a volume in Docker containing the model directory and load the model by path. spaCy lets you call spacy.load("some/path"), so no pip install is required.
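For example (a minimal sketch: the mount point /language_models is an assumption, and the directory must be a serialized pipeline containing config.cfg and meta.json, such as one produced by nlp.to_disk()):

import spacy

# Load the pipeline from a directory on the mounted volume rather than
# from an installed package; nothing is downloaded inside the container.
nlp = spacy.load("/language_models/en_core_web_lg")
doc = nlp("This model was never downloaded inside the container.")
print(doc[0].pos_)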

If you really need to use pip to install something, you can also download the zipped models and pip install that file. However, by default that involves making a copy of the package, which reduces the benefit. If you unzip the model download and mount that directory, you can use pip install -e (editable mode), which is usually used for development. I wouldn't recommend this, but if you are using import en_core_web_sm or similar and refactoring is difficult, it might be what you want.
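For completeness, a sketch of that import-style usage (it assumes the package was installed first, e.g. with pip install on the downloaded .tar.gz or pip install -e on the unzipped directory):

import en_core_web_sm

# Only works once the model package is pip-installed; installed model
# packages expose a load() helper that returns the pipeline.
nlp = en_core_web_sm.load()
print(nlp.pipe_names)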

CodePudding user response:

Thanks for the comment @polm23! I had an additional layer of complexity, since the spaCy model was ultimately used to train a Rasa model. The solution I've opted for is to save models locally using:

import spacy

nlp = spacy.load(model)                    # load the installed pipeline once
nlp.to_disk(f'language_models/{model}')    # serialize it for mounting
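Since the point is to support several models, the export step can be looped (a sketch; the model names come from the question and must already be installed locally):

import spacy

# Export each installed pipeline to its own directory under
# language_models/, ready to be mounted into containers.
for model in ('en_core_web_md', 'en_core_web_lg'):
    nlp = spacy.load(model)
    nlp.to_disk(f'language_models/{model}')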

And then make the specific model directory visible to the Docker container using a mounted volume. Then, in Rasa at least, you can reference the language model using a local path:

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: "../../language_models/MODEL_NAME"
recipe: default.v1