I want to create some unit tests to make sure the correct NotImplementedError exceptions are raised in a module I am developing. Is there a way I could use some dummy data to create and fit multiple sklearn models to feed into the unit tests?
I'm looking for a parametric solution, where I can somehow iterate over many models instead of manually defining each one. I do not need well-defined models, or models that make sense, just models that can be constructed with their default parameters.
CodePudding user response:
pytest fixtures and test parametrization are one way to approach this. Here's an example using pytest.mark.parametrize to assert that two ensemble methods can perfectly predict their training data:
# File: `test_ml_models.py`
import pytest
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier


@pytest.mark.parametrize("model", [RandomForestClassifier, GradientBoostingClassifier])
@pytest.mark.parametrize("X", [np.array([[0, 1], [1, 0]])])
@pytest.mark.parametrize("y", [np.array([0, 1])])
def test_models(model, X, y):
    clf = model().fit(X, y)
    assert np.all(clf.predict(X) == y)
Then running pytest shows that our test_models test was expanded into two tests:
$ pytest test_ml_models.py
=================================== test session starts ==============
platform linux -- Python 3.8.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /home/hayesall
plugins: anyio-3.3.0, dash-1.20.0, cov-2.12.1
collected 2 items
test_ml_models.py .. [100%]
=================================== 2 passed in 0.59s ================
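To address the original goal of checking that NotImplementedError is raised, the same parametrization idea can be combined with sklearn.utils.all_estimators to enumerate models instead of listing them by hand. Below is a minimal sketch; the module and function under test (my_module, my_function) are hypothetical placeholders for whatever your code exposes:

# File: `test_not_implemented.py` (sketch)
import pytest
from sklearn.utils import all_estimators

from my_module import my_function  # hypothetical function under test


# all_estimators returns (name, class) pairs; restrict to classifiers here.
@pytest.mark.parametrize("name,Estimator", all_estimators(type_filter="classifier"))
def test_raises_not_implemented(name, Estimator):
    try:
        model = Estimator()  # default parameters only
    except TypeError:
        # Some meta-estimators (e.g. VotingClassifier) require arguments.
        pytest.skip(f"{name} cannot be constructed with default parameters")
    with pytest.raises(NotImplementedError):
        my_function(model)

Since checking the exception does not require trained models, the estimators above are only constructed; if your code path needs fitted models, you can call fit with the same dummy X and y as in the first example before invoking the function under test.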
For further reading, you might skim how scikit-learn or imbalanced-learn approach unit testing.