How to avoid ValueError: could not convert string to float: '?'

Time:04-27

This is ML code and I am a beginner. X is the feature matrix and y is the class vector.

print(X.shape)
X.dtypes

output:

Age                  int64
Sex                  int64
chest pain type      int64
Trestbps             int64
chol                 int64
fbs                  int64
restecg              int64
thalach              int64
exang                int64
oldpeak            float64
slope                int64
ca                  object
thal                object
dtype: object
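The two object columns (ca and thal) are the likely culprits: pandas leaves a column as object when at least one entry cannot be read as a number. Below is a minimal sketch for listing the offending values, assuming X is a pandas DataFrame like the one above. The helper name non_numeric_values and the demo frame are hypothetical, made up for illustration.

```python
import pandas as pd

def non_numeric_values(df: pd.DataFrame) -> dict:
    """Return {column: sorted list of values that cannot be parsed as numbers}."""
    result = {}
    for col in df.columns:
        # errors="coerce" turns every unparseable entry into NaN
        parsed = pd.to_numeric(df[col], errors="coerce")
        # Entries that became NaN but were not missing to begin with
        bad = df.loc[parsed.isna() & df[col].notna(), col]
        if not bad.empty:
            result[col] = sorted(bad.astype(str).unique())
    return result

# Hypothetical demo data mimicking the 'ca' and 'thal' columns
demo = pd.DataFrame({
    "Age": [63, 67, 41],
    "ca": ["0.0", "?", "2.0"],
    "thal": ["6.0", "3.0", "?"],
})
print(non_numeric_values(demo))  # {'ca': ['?'], 'thal': ['?']}
```

Running the same helper on the real X should show which columns hold the '?' and whether any other non-numeric markers are present.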
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.feature_selection import SelectKBest, f_classif

#Using ANOVA to create the new dataset with only best three selected features 
X_new_anova = SelectKBest(f_classif, k=3).fit_transform(X,y)    #<-------- get error
X_new_anova = pd.DataFrame(X_new_anova, columns = ["Age", "Trestbps","chol"])
print("The dataset with best three selected features after using ANOVA:")
print(X_new_anova.head())
kmeans_anova = KMeans(n_clusters = 3).fit(X_new_anova)
labels_anova = kmeans_anova.labels_

#Counting the number of the labels in each cluster and saving the data into clustering_classes
clustering_classes_anova = {
 0: [0,0,0,0,0],
 1: [0,0,0,0,0],
 2: [0,0,0,0,0]
}
for i in range(len(y)):
    clustering_classes_anova[labels_anova[i]][y[i]] += 1

# Find the most frequent label in each cluster and compute the purity score
purity_score_anova = (max(clustering_classes_anova[0]) + max(clustering_classes_anova[1]) + max(clustering_classes_anova[2])) / len(y)
print(f"Purity score of the new data after using ANOVA {round(purity_score_anova*100, 2)}%")
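The purity score computed above is the sum, over all clusters, of the count of the most frequent class in each cluster, divided by the number of samples. A minimal sketch of the same calculation using pd.crosstab, which avoids hard-coding three clusters and five classes. The function name purity_score and the label lists are hypothetical.

```python
import numpy as np
import pandas as pd

def purity_score(cluster_labels, class_labels) -> float:
    """Fraction of samples that belong to the majority class of their cluster."""
    # Rows: clusters, columns: classes, cells: number of samples.
    # np.asarray avoids index-alignment surprises if a pandas Series is passed.
    table = pd.crosstab(np.asarray(cluster_labels), np.asarray(class_labels))
    return float(table.max(axis=1).sum() / len(class_labels))

clusters = [0, 0, 0, 1, 1, 2, 2, 2]
classes = [1, 1, 0, 2, 2, 0, 0, 1]
print(round(purity_score(clusters, classes) * 100, 2))  # 75.0
```

With the question's variables this would be called as purity_score(labels_anova, y).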

This is the error I got:

#Using ANOVA to create the new dataset with only best three selected features
----> 4 X_new_anova = SelectKBest(f_classif, k=3).fit_transform(X,y)
      5 X_new_anova = pd.DataFrame(X_new_anova, columns = ["Age", "Trestbps","chol"])
      6 print("The dataset with best three selected features after using ANOVA:")

ValueError: could not convert string to float: '?'
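Before scoring, scikit-learn converts the input to a numeric (float) array, and a single '?' string is enough to make that conversion fail. A minimal reproduction of the same message with NumPy alone. The values are made up for illustration.

```python
import numpy as np

# Hypothetical values like those in the 'ca' column: numbers stored as strings, plus a '?'
values = np.array(["0.0", "3.0", "?"], dtype=object)

message = ""
try:
    values.astype(float)  # same kind of conversion the estimator needs
except ValueError as err:
    message = str(err)
print(message)  # could not convert string to float: '?'
```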

I don't know what the "?" means. Could you please tell me how to avoid this error?

Answer:

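The '?' is a literal string somewhere in your data file that could not be converted to a number. That is also why the ca and thal columns have dtype object instead of a numeric type: pandas keeps a column as object when it contains values it cannot read as numbers. I would check your data file to make sure everything is in order. My guess is that whoever made it used '?' to mark values that could not be found, i.e. missing data.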
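To work out which rows contain a '?', and therefore which rows to delete, you can compare the whole DataFrame against that string. A minimal sketch, assuming the data is in a pandas DataFrame. The demo frame is hypothetical.

```python
import pandas as pd

demo = pd.DataFrame({
    "Age": [63, 67, 41, 56],
    "ca": ["0.0", "?", "2.0", "0.0"],
    "thal": ["6.0", "3.0", "?", "?"],
})

# True for every cell that holds the literal string '?'
is_question_mark = demo.eq("?")

print(is_question_mark.sum())  # number of '?' cells per column
rows_with_qm = demo.index[is_question_mark.any(axis=1)].tolist()
print(rows_with_qm)  # [1, 2, 3]
```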

You can delete a row using:

# 3 is a placeholder for the index label of the row that holds the '?'.
# For example, if row 40 contains the '?', use labels=40 instead.
# If X and y are separate objects, drop the same label from y so they stay aligned.
df = df.drop(labels=3, axis=0)
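Dropping rows one index label at a time gets tedious when several rows contain '?'. A minimal alternative sketch, assuming '?' always marks a missing value: turn it into NaN, drop every incomplete row (dropping the same rows from y), and cast the features to float. If the data is loaded with pd.read_csv, passing na_values="?" has the same effect at load time. The function name drop_question_marks and the demo data are hypothetical.

```python
import numpy as np
import pandas as pd

def drop_question_marks(X: pd.DataFrame, y: pd.Series):
    """Drop rows of X that contain '?', returning (X as float, y aligned with X)."""
    X_clean = X.replace("?", np.nan)       # treat '?' as a missing value
    keep = X_clean.notna().all(axis=1)     # rows with no missing cell
    X_clean = X_clean[keep].astype(float)
    y_clean = y[keep]
    # Reset the index so positional lookups like y[i] keep working afterwards
    return X_clean.reset_index(drop=True), y_clean.reset_index(drop=True)

X = pd.DataFrame({"Age": [63, 67, 41],
                  "ca": ["0.0", "?", "2.0"],
                  "thal": ["6.0", "3.0", "7.0"]})
y = pd.Series([0, 2, 0])
X_clean, y_clean = drop_question_marks(X, y)
print(len(X_clean), y_clean.tolist())  # 2 [0, 0]
```

Dropping rows is the simplest choice; with a small dataset, imputing the missing values instead may be worth considering so that no samples are lost.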