I want to create some unit tests to make sure the correct NotImplementedError exceptions are raised in a module I am developing. Is there a way I could use some dummy data to create and fit multiple sklearn models to feed into the unit tests?
I'm looking for a parametric solution, where I can somehow iterate over many models instead of manually defining each one. I do not need well-defined models, or models that make sense, just models that can be constructed with their default parameters.
CodePudding user response:
pytest fixtures and test parametrization are one way to approach this. Here's an example using pytest.mark.parametrize to assert that two ensemble methods can perfectly predict their training data:
# File: `test_ml_models.py`
import pytest
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier


@pytest.mark.parametrize("model", [RandomForestClassifier, GradientBoostingClassifier])
@pytest.mark.parametrize("X", [np.array([[0, 1], [1, 0]])])
@pytest.mark.parametrize("y", [np.array([0, 1])])
def test_models(model, X, y):
    clf = model().fit(X, y)
    assert np.all(clf.predict(X) == y)
Then running pytest shows that our test_models test was expanded into two tests:
$ pytest test_ml_models.py
=================================== test session starts ==============
platform linux -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /home/hayesall
plugins: anyio-3.3.0, dash-1.20.0, cov-2.12.1
collected 2 items
test_ml_models.py .. [100%]
=================================== 2 passed in 0.59s ================
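To address the original goal of checking that NotImplementedError is raised, the same parametrization idea can be combined with sklearn.utils.all_estimators to enumerate models instead of listing them by hand. Below is a minimal sketch; the module and function under test (my_module, my_function) are hypothetical placeholders for whatever your code exposes:

# File: `test_not_implemented.py` (sketch)
import pytest
from sklearn.utils import all_estimators

from my_module import my_function  # hypothetical function under test


# all_estimators returns (name, class) pairs; restrict to classifiers here.
@pytest.mark.parametrize("name,Estimator", all_estimators(type_filter="classifier"))
def test_raises_not_implemented(name, Estimator):
    try:
        model = Estimator()  # default parameters only
    except TypeError:
        # Some meta-estimators (e.g. VotingClassifier) require arguments.
        pytest.skip(f"{name} cannot be constructed with default parameters")
    with pytest.raises(NotImplementedError):
        my_function(model)

Since checking the exception does not require trained models, the estimators above are only constructed; if your code path needs fitted models, you can call fit with the same dummy X and y as in the first example before invoking the function under test.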
For further reading, you might skim how scikit-learn or imbalanced-learn approach unit testing.