ValueError: could not convert string to float: 'Mme'

Time:12-20

When I run the following code in Jupyter Lab:

import numpy as np
from sklearn.feature_selection import SelectKBest,f_classif
import matplotlib.pyplot as plt

predictors = ["Pclass","Sex","Age","SibSp","Parch","Fare","Embarked","FamilySize","Title","NameLength"]
selector = SelectKBest(f_classif,k=5)
selector.fit(titanic[predictors],titanic["Survived"])

It fails with ValueError: could not convert string to float: 'Mme'. The traceback is:

  ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    C:\Users\ADMINI~1\AppData\Local\Temp/ipykernel_17760/1637555559.py in <module>
          5 predictors = ["Pclass","Sex","Age","SibSp","Parch","Fare","Embarked","FamilySize","Title","NameLength"]
          6 selector = SelectKBest(f_classif,k=5)
    ----> 7 selector.fit(titanic[predictors],titanic["Survived"])
     ......
    
    ValueError: could not convert string to float: 'Mme'

I printed titanic[predictors] and titanic["Survived"]; the output is as follows:

     Pclass  Sex   Age  SibSp  Parch     Fare  Embarked  FamilySize  Title  NameLength
0         3    0  22.0      1      0   7.2500         0           1      1          23
1         1    1  38.0      1      0  71.2833         1           1      3          51
2         3    1  26.0      0      0   7.9250         0           0      2          22
3         1    1  35.0      1      0  53.1000         0           1      3          44
4         3    0  35.0      0      0   8.0500         0           0      1          24
...     ...  ...   ...    ...    ...      ...       ...         ...    ...         ...
886       2    0  27.0      0      0  13.0000         0           0      6          21
887       1    1  19.0      0      0  30.0000         0           0      2          28
888       3    1  28.0      1      2  23.4500         0           3      2          40
889       1    0  26.0      0      0  30.0000         1           0      1          21
890       3    0  32.0      0      0   7.7500         2           0      1          19
891 rows × 10 columns

0      0
1      1
2      1
3      1
4      0
      ..
886    0
887    1
888    0
889    1
890    0
Name: Survived, Length: 891, dtype: int64

How can I solve this problem?

CodePudding user response:

Is it printing the column labels in the first row? If so, the header row has ended up in your data, so assign the array starting from the second row, array[1:,:]. Otherwise, find out where the string "Mme" is located in your data so you can see how the code is picking it up.
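One way to track down where "Mme" is hiding is to try converting every column to numbers and list whatever fails. This is only a sketch: the small `titanic` frame below is a made-up stand-in for the asker's data, and `find_non_numeric` is a hypothetical helper, not a pandas or scikit-learn function. The printed preview shows numeric codes in Title, so the stray string is presumably in one of the elided rows.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `titanic` frame: "Title" was meant to
# hold numeric codes, but one raw string is still present.
titanic = pd.DataFrame({
    "Pclass": [3, 1, 3, 1],
    "Title": [1, 3, "Mme", 2],
})

def find_non_numeric(df):
    """Return {column: [values that cannot be parsed as numbers]}."""
    bad = {}
    for col in df.columns:
        converted = pd.to_numeric(df[col], errors="coerce")
        # A value is an offender if it was present but became NaN on conversion.
        mask = converted.isna() & df[col].notna()
        if mask.any():
            bad[col] = df.loc[mask, col].unique().tolist()
    return bad

offenders = find_non_numeric(titanic)
print(offenders)
```

Running the same helper on `titanic[predictors]` should name the column (and the values) that SelectKBest cannot convert.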

CodePudding user response:

When you fit an algorithm (in your case SelectKBest), you need to know your data, and almost always you need to preprocess it first.

Take a look at your data:

  • Are your features categorical, numerical, or a mix?
  • Do you have NaN values?
  • ...

Most algorithms don't accept categorical features, so you will need to transform them into numerical ones (consider using OneHotEncoder).
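As a rough illustration of that transformation, here is a minimal sketch that one-hot encodes a string column with scikit-learn's OneHotEncoder. The toy frame and the `Title_` column prefix are assumptions made for the example; the asker's real data will have different values.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy frame; "Title" holds raw strings here.
df = pd.DataFrame({
    "Fare": [7.25, 71.2833, 7.925, 53.1],
    "Title": ["Mr", "Mrs", "Miss", "Mme"],
})

# handle_unknown="ignore" encodes categories not seen during fit as all zeros.
encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(df[["Title"]]).toarray()

# Build readable column names from the categories learned during fit.
title_columns = ["Title_" + str(c) for c in encoder.categories_[0]]
features = pd.concat(
    [df[["Fare"]], pd.DataFrame(encoded, columns=title_columns, index=df.index)],
    axis=1,
)
print(features)
```

The resulting `features` frame contains only numbers, so it can be passed to `selector.fit` in place of the raw columns.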

You will run into the same problem with NaN values.
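A quick check for missing values might look like the sketch below. The toy frame is invented, and filling with the column median is just one possible choice (an assumption here), not necessarily the right one for every feature.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing value per column.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Fare": [7.25, 71.2833, np.nan, 53.1],
})

# Count missing values per column before fitting anything.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Assumption: replace each NaN with that column's median.
filled = df.fillna(df.median())
print(filled)
```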

In conclusion, you have to preprocess your data before you start fitting.
