According to the docs, the .fit function:
fit(X, y=None) : Do nothing and return the estimator unchanged.
This method is just there to implement the usual API and hence work in pipelines.
However, when I use the fit function, the Normalizer "fits" to the number of features of the X passed in, and expects the same number of features when the transform function is used thereafter.
For example:
import numpy as np

A = np.random.rand(1, 7)
B = np.random.rand(1, 5)
print("A :", A, "\n", "B :", B)
>>> A : [[0.56973872 0.74769087 0.81626309 0.03873601 0.71216399 0.31807755 0.96527768]]
B : [[0.49805279 0.73939067 0.85949423 0.79824846 0.52750957]]
from sklearn.preprocessing import Normalizer
normalizer = Normalizer()
#normalizer.fit(A)
a = normalizer.transform(A)
b = normalizer.transform(B)
print("a :",a,"\n","b :",b)
>>> a : [[0.32403221 0.42524041 0.46424006 0.02203065 0.40503491 0.18090287
0.54899035]]
b : [[0.3182623 0.47248039 0.54922815 0.5100913 0.33708558]]
However, when the fit function is called, this ValueError is raised:
from sklearn.preprocessing import Normalizer
normalizer = Normalizer()
normalizer.fit(A)
a = normalizer.transform(A)
b = normalizer.transform(B)
print("a :",a,"\n","b :",b)
>>> ---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [28], in <module>
4 normalizer.fit(A)
5 a = normalizer.transform(A)
----> 6 b = normalizer.transform(B)
7 print("a :",a,"\n","b :",b)
File ~\PycharmProjects\notebookWith_LSP\venv\lib\site-packages\sklearn\preprocessing\_data.py:1948, in Normalizer.transform(self, X, copy)
1931 """Scale each non zero row of X to unit norm.
1932
1933 Parameters
(...)
1945 Transformed array.
1946 """
1947 copy = copy if copy is not None else self.copy
-> 1948 X = self._validate_data(X, accept_sparse="csr", reset=False)
1949 return normalize(X, norm=self.norm, axis=1, copy=copy)
File ~\PycharmProjects\notebookWith_LSP\venv\lib\site-packages\sklearn\base.py:600, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)
597 out = X, y
599 if not no_val_X and check_params.get("ensure_2d", True):
--> 600 self._check_n_features(X, reset=reset)
602 return out
File ~\PycharmProjects\notebookWith_LSP\venv\lib\site-packages\sklearn\base.py:400, in BaseEstimator._check_n_features(self, X, reset)
397 return
399 if n_features != self.n_features_in_:
--> 400 raise ValueError(
401 f"X has {n_features} features, but {self.__class__.__name__} "
402 f"is expecting {self.n_features_in_} features as input."
403 )
ValueError: X has 5 features, but Normalizer is expecting 7 features as input.
What exactly am I missing here?
CodePudding user response:
The fit method doesn't learn a function, but it still validates the data by calling self._validate_data(X). By default, that validation locks in the input feature size for later consistency checks, unless it is called with reset=False.
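A minimal sketch of that behaviour, reusing the same A and B shapes as in the question (n_features_in_ is the attribute the validation sets):
import numpy as np
from sklearn.preprocessing import Normalizer

A = np.random.rand(1, 7)
B = np.random.rand(1, 5)

normalizer = Normalizer()
normalizer.fit(A)

# fit() learned no scaling parameters, but validation recorded the feature count
print(normalizer.n_features_in_)  # 7

# transform() re-validates with reset=False, so B's 5 features no longer match
# normalizer.transform(B)  # raises the ValueError shown above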
In the _validate_data function:
reset : bool, default=True
Whether to reset the `n_features_in_` attribute.
If False, the input will be checked for consistency with data
provided when reset was last True
Unfortunately, Normalizer.fit doesn't seem to forward keyword arguments to _validate_data, so there is no way to pass reset=False through fit.
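Since Normalizer is stateless, a simple workaround is to skip fit altogether (as in the first snippet in the question), use a fresh Normalizer per array, or call the sklearn.preprocessing.normalize function directly. A minimal sketch with normalize, reusing the same A and B:
import numpy as np
from sklearn.preprocessing import normalize

A = np.random.rand(1, 7)
B = np.random.rand(1, 5)

# normalize() is stateless: each call scales rows to unit norm independently,
# so arrays with different numbers of features are fine
a = normalize(A, norm="l2", axis=1)
b = normalize(B, norm="l2", axis=1)
print("a :", a, "\n", "b :", b)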