Python version: 3.6.9
I've used pickle to dump a machine learning model into a file, and when I try to run a prediction on it using Flask, it fails with ModuleNotFoundError: No module named 'predictors'. How can I fix this error so that it recognizes my model, whether I try to run a prediction via Flask or via the Python command (e.g. python predict_edu.py)?
Here is my file structure:
- video_discovery
  - __init__.py
  - data_science
    - model
    - __init__.py
    - predict_edu.py
    - predictors.py
    - train_model.py
Here's my predict_edu.py file:
import pickle

with open('model', 'rb') as f:
    bow_model = pickle.load(f)
Here's my predictors.py file:
from sklearn.base import TransformerMixin


# Basic function to clean the text
def clean_text(text):
    # Removing spaces and converting text into lowercase
    return text.strip().lower()


# Custom transformer using spaCy
class predictor_transformer(TransformerMixin):
    def transform(self, X, **transform_params):
        # Cleaning Text
        return [clean_text(text) for text in X]

    def fit(self, X, y=None, **fit_params):
        return self

    def get_params(self, deep=True):
        return {}
Here's how I train my model:
python data_science/train_model.py
Here's my train_model.py file:
import pickle

from sklearn.pipeline import Pipeline

from predictors import predictor_transformer

# pipeline = Pipeline([("cleaner", predictor_transformer()), ('vectorizer', bow_vector), ('classifier', classifier_18p)])
pipeline = Pipeline([("cleaner", predictor_transformer())])

with open('model', 'wb') as f:
    pickle.dump(pipeline, f)
My Flask app is in: video_discovery/__init__.py
Here's how I run my Flask app:
FLASK_ENV=development FLASK_APP=video_discovery flask run
I believe the issue may be occurring because I'm training the model by running the Python script directly instead of using Flask, so there might be some namespace issues, but I'm not sure how to fix this. It takes a while to train my model, so I can't exactly wait on an HTTP request.
What am I missing that might fix this issue?
CodePudding user response:
It seems a bit strange that you get that error when executing predict_edu.py, as it is in the same directory as predictors.py, and thus, using an absolute import such as from predictors import predictor_transformer (without the dot . operator) should normally work as expected. However, below are a few options that you could try out, if the error persists.
Option 1
You could add the directory containing predictors.py to sys.path (Python's module search path, not the system PATH variable) before attempting to import the module.
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent))
from predictors import predictor_transformer
Option 2
Use relative imports, e.g., from .predictors import ..., and make sure you run the script from outside the package directory (i.e. from the directory that contains video_discovery), as shown below. With the -m option, Python searches sys.path for the named module and executes its contents as the __main__ module, so the file runs as part of its package and not as a standalone top-level script.
python -m video_discovery.data_science.predict_edu
However, the PEP 8 style guide recommends using absolute imports in general:
"Absolute imports are recommended, as they are usually more readable and tend to be better behaved (or at least give better error messages) if the import system is incorrectly configured (such as when a directory inside a package ends up on sys.path)."
In certain cases, however, absolute imports can get quite verbose, depending on the complexity of the directory structure, as shown below. On the other hand, "relative imports can be messy, particularly for shared projects where directory structure is likely to change". They are also "not as readable as absolute ones, and it is hard to tell the location of the imported resources". Read more about Python Import and Absolute vs Relative Imports.
from package1.subpackage2.subpackage3.subpackage4.module5 import function6
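To make the effect of the -m switch in Option 2 concrete, here is a small sketch that builds a throwaway package in a temporary directory and runs the same file both ways. The names demo_pkg, helpers and script are made up for the demonstration and are not part of the project above.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Throwaway package; demo_pkg, helpers and script are made-up names.
    pkg = Path(tmp, "demo_pkg")
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "helpers.py").write_text("VALUE = 42\n")
    (pkg / "script.py").write_text(
        "print(__package__)\n"
        "from .helpers import VALUE\n"
        "print(VALUE)\n"
    )

    def run(args):
        # Start a fresh interpreter from the directory that contains demo_pkg.
        return subprocess.run(
            [sys.executable] + args,
            cwd=tmp,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )

    as_module = run(["-m", "demo_pkg.script"])   # run as part of the package
    as_script = run([str(pkg / "script.py")])    # run as a standalone file

print(as_module.stdout.split())
print(as_script.returncode != 0, "relative import" in as_script.stderr)
```

Run with -m, the file knows its parent package, so the relative import should resolve. Run as a plain script, it has no parent package and the relative import is expected to fail with an ImportError.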
Option 3
Include the directory containing your package directory in PYTHONPATH and use absolute imports instead. PYTHONPATH is used to set the path for user-defined modules, so that they can be directly imported into a Python script. The PYTHONPATH variable is a string with a list of directories that Python adds to the sys.path directory list. The primary use of this variable is to allow users to import modules that have not yet been made into an installable Python package.
For instance, let's say you wanted to add the directory /Users/my_user/code to the PYTHONPATH:
On Mac
- Open Terminal.app.
- Open the file ~/.bash_profile in your text editor, e.g. atom ~/.bash_profile
- Add the following line to the end:
  export PYTHONPATH="/Users/my_user/code"
- Save the file.
- Close Terminal.app.
- Start Terminal.app again, to read in the new settings, and type echo $PYTHONPATH. It should show something like /Users/my_user/code.
On Linux
- Open your favorite terminal program.
- Open the file ~/.bashrc in your text editor, e.g. atom ~/.bashrc
- Add the following line to the end:
  export PYTHONPATH=/home/my_user/code
- Save the file.
- Close your terminal application.
- Start your terminal application again, to read in the new settings, and type echo $PYTHONPATH. It should show something like /home/my_user/code.
On Windows
- Open This PC (or Computer), right-click inside and select Properties.
- From the computer properties dialog, select Advanced system settings on the left.
- From the advanced system settings dialog, choose the Environment variables button.
- In the Environment variables dialog, click the New button in the top half of the dialog, to make a new user variable.
- Give the variable name as PYTHONPATH and in value add the path to your module directory. Choose OK and OK again to save this variable.
- Now open a cmd window and type echo %PYTHONPATH% to confirm the environment variable is correctly set. Remember to open a new cmd window to run your Python program, so that it picks up the new settings in PYTHONPATH.
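As a quick cross-platform check of Option 3, the sketch below starts a fresh interpreter with PYTHONPATH set and confirms that the directory shows up in that interpreter's sys.path. The temporary directory only stands in for a real code directory such as /Users/my_user/code, and the variable names are illustrative.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as code_dir:
    # Copy the current environment and set PYTHONPATH for the child only.
    env = dict(os.environ, PYTHONPATH=code_dir)

    # Ask a fresh interpreter to print its sys.path, one entry per line.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    child_paths = [p for p in result.stdout.splitlines() if p]

    # Compare resolved paths so symlinked temp directories still match.
    on_path = any(
        os.path.realpath(p) == os.path.realpath(code_dir) for p in child_paths
    )

print(on_path)
```

Setting the variable through the env argument affects only the child process, which makes it a convenient way to test a value before adding it to a shell profile.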
Option 4
Another solution would be to install the package in editable mode, e.g. with pip install -e . (edits made to the .py files are then picked up by the installed package automatically). However, the amount of work required to get this to work might make Option 3 a better choice for you.
CodePudding user response:
From https://docs.python.org/3/library/pickle.html:
"pickle can save and restore class instances transparently, however the class definition must be importable and live in the same module as when the object was stored."
When you run python data_science/train_model.py and import from predictors, Python imports predictors as a top-level module and predictor_transformer is in that module, so the pickle records the class as predictors.predictor_transformer.
However, when you run a prediction via Flask from the parent folder of video_discovery, predictor_transformer is in the video_discovery.data_science.predictors module, and there is no top-level predictors module for pickle to import.
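The behaviour described above can be reproduced in isolation. The following sketch writes a stand-in module to a temporary directory (predictors_demo is a made-up name), pickles an instance while the module is importable as a top-level module, and then tries to load the same bytes after the module has been made unimportable.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for predictors.py.
MODULE_SOURCE = (
    "class predictor_transformer:\n"
    "    def transform(self, X):\n"
    "        return [text.strip().lower() for text in X]\n"
)

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "predictors_demo.py").write_text(MODULE_SOURCE)

    # "Training": the module is importable as a top-level module.
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    predictors_demo = importlib.import_module("predictors_demo")
    payload = pickle.dumps(predictors_demo.predictor_transformer())

    # The pickle stores the import path of the class, not the class body.
    stores_module_name = b"predictors_demo" in payload

    # "Prediction": same bytes, but the top-level module is gone.
    sys.path.remove(tmp)
    del sys.modules["predictors_demo"]
    importlib.invalidate_caches()
    try:
        pickle.loads(payload)
        load_error = None
    except ModuleNotFoundError as exc:
        load_error = str(exc)

print(stores_module_name)
print(load_error)
```

The load step should fail with the same kind of ModuleNotFoundError as in the question, because unpickling has to import the module under the exact name that was recorded at dump time.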
Use relative imports and run from a consistent path (the parent folder of video_discovery)

train_model.py: Use relative import
# from predictors import predictor_transformer   # before
from .predictors import predictor_transformer    # after

Train model: Run train_model with video_discovery as top-level module
# python data_science/train_model.py                  # before
python -m video_discovery.data_science.train_model    # after

Run a prediction via a Python command: Run predict_edu with video_discovery as top-level module
# python predict_edu.py                               # before
python -m video_discovery.data_science.predict_edu    # after

Run a prediction via Flask: (no change, already run with video_discovery as top-level module)
FLASK_ENV=development FLASK_APP=video_discovery flask run
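Since retraining is slow, one possible alternative (not part of the answers above) is to keep the existing pickle and redirect the old module name at load time by overriding pickle.Unpickler.find_class, which is a documented extension point. The sketch below is only a demonstration on a throwaway copy of the package layout. RemappingUnpickler, MODULE_ALIASES and load_with_aliases are hypothetical names, and the mapping assumes the layout shown in the question.

```python
import importlib
import io
import pickle
import sys
import tempfile
from pathlib import Path

# Assumed mapping: module name recorded in the old pickle -> current import path.
MODULE_ALIASES = {"predictors": "video_discovery.data_science.predictors"}


class RemappingUnpickler(pickle.Unpickler):
    """Unpickler that redirects renamed modules before importing them."""

    def find_class(self, module, name):
        return super().find_class(MODULE_ALIASES.get(module, module), name)


def load_with_aliases(data):
    """Unpickle bytes, applying MODULE_ALIASES to recorded module names."""
    return RemappingUnpickler(io.BytesIO(data)).load()


# Demo on a throwaway copy of the package layout from the question.
SOURCE = "class predictor_transformer:\n    pass\n"

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "video_discovery", "data_science")
    pkg.mkdir(parents=True)
    Path(tmp, "video_discovery", "__init__.py").write_text("")
    (pkg / "__init__.py").write_text("")
    (pkg / "predictors.py").write_text(SOURCE)

    # Pickle as train_model.py did: predictors imported as a top-level module.
    sys.path.insert(0, str(pkg))
    importlib.invalidate_caches()
    old_module = importlib.import_module("predictors")
    data = pickle.dumps(old_module.predictor_transformer())
    sys.path.remove(str(pkg))
    del sys.modules["predictors"]

    # Load as Flask would: only the package-qualified module is importable.
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    restored = load_with_aliases(data)
    restored_module = type(restored).__module__
    sys.path.remove(tmp)

print(restored_module)
```

In a real project, only the class and the helper function would be needed, with the model file opened in binary mode and passed to RemappingUnpickler directly. Names that are not listed in the mapping are looked up unchanged, so the rest of a pipeline would load as usual.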