selectKBest with chi2 throws ValueError: could not convert string to float: 'Self_emp_not_inc'

Time:07-18

I am trying to select the best categorical features for a classification problem with chi2 and SelectKBest. First, I sorted out the categorical columns: [image: categorical-cols]. Then I separated the features and the target like this and passed them to SelectKBest:

from sklearn.feature_selection import chi2, SelectKBest

X, y = df_cat_kbest.iloc[:, :-1], df_cat_kbest.iloc[:, -1]
selector = SelectKBest(score_func=chi2, k=3).fit_transform(X, y)

When I run it, I am getting the error:

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_13272\2211654466.py in <module>
----> 1 selector = SelectKBest(score_func=chi2, k=3).fit_transform(X, y)

E:\Anaconda\lib\site-packages\sklearn\base.py in fit_transform(self, X, y, **fit_params)
    853         else:
    854             # fit method of arity 2 (supervised transformation)
--> 855             return self.fit(X, y, **fit_params).transform(X)
    856 
    857 

...
...

E:\Anaconda\lib\site-packages\pandas\core\generic.py in __array__(self, dtype)
   1991 
   1992     def __array__(self, dtype: NpDtype | None = None) -> np.ndarray:
-> 1993         return np.asarray(self._values, dtype=dtype)
   1994 
   1995     def __array_wrap__(

ValueError: could not convert string to float: 'Self_emp_not_inc'

As far as I know, chi-square can be applied to categorical columns. Here all the features are categorical, and so is the target. So why does it say it could not convert string to float?

CodePudding user response:

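Encoding the features will do the job. scikit-learn's chi2 works on a numeric array, so string categories have to be converted to numbers first. For example: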

from sklearn.preprocessing import OneHotEncoder
from sklearn.feature_selection import chi2, SelectKBest
from sklearn.pipeline import make_pipeline

X, y = df_cat_kbest.iloc[:, :-1], df_cat_kbest.iloc[:, -1]

selector = make_pipeline(OneHotEncoder(drop='first'), SelectKBest(score_func=chi2, k=3)).fit_transform(X, y)

This adds a pre-processing step, one-hot encoding, in front of the selector. You can choose another encoding. The bottom line is that you need to transform your object (string) columns into non-negative numerical data before chi2 can score them ;)

There are also other contributed encoders in the category_encoders package (from scikit-learn-contrib) that might suit your needs.
