How to avoid ValueError: could not convert string to float: '?'

This is ML code and I am a beginner. X is the feature matrix and y holds the class labels.

print(X.shape)
X.dtypes

output:

Age                  int64
Sex                  int64
chest pain type      int64
Trestbps             int64
chol                 int64
fbs                  int64
restecg              int64
thalach              int64
exang                int64
oldpeak            float64
slope                int64
ca                  object
thal                object
dtype: object

from sklearn.feature_selection import SelectKBest, f_classif

#Using ANOVA to create the new dataset with only best three selected features 
X_new_anova = SelectKBest(f_classif, k=3).fit_transform(X,y)    #<-------- get error
X_new_anova = pd.DataFrame(X_new_anova, columns = ["Age", "Trestbps","chol"])
print("The dataset with best three selected features after using ANOVA:")
print(X_new_anova.head())
kmeans_anova = KMeans(n_clusters = 3).fit(X_new_anova)
labels_anova = kmeans_anova.labels_

#Counting the number of the labels in each cluster and saving the data into clustering_classes
clustering_classes_anova = {
 0: [0,0,0,0,0],
 1: [0,0,0,0,0],
 2: [0,0,0,0,0]
}
for i in range(len(y)):
     clustering_classes_anova[labels_anova[i]][y[i]]  = 1
        

#Finding the most frequent label in each cluster and computing the purity score
purity_score_anova = (max(clustering_classes_anova[0]) + max(clustering_classes_anova[1]) + max(clustering_classes_anova[2]))/len(y)
print(f"Purity score of the new data after using ANOVA {round(purity_score_anova*100, 2)}%")

This is the error I got:

#Using ANOVA to create the new dataset with only best three selected features
----> 4 X_new_anova = SelectKBest(f_classif, k=3).fit_transform(X,y)
      5 X_new_anova = pd.DataFrame(X_new_anova, columns = ["Age", "Trestbps","chol"])
      6 print("The dataset with best three selected features after using ANOVA:")

ValueError: could not convert string to float: '?'

I don't know what the "?" means. Could you please tell me how to avoid this error?

CodePudding user response:

The '?' is a literal string somewhere in your data file that cannot be converted to a float. Your dtypes output shows that only ca and thal are object columns, so those are the ones to check; the other columns are already numeric. My guess is that whoever created the dataset put a '?' wherever a value was missing.
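To find out where the '?' sits, something like the sketch below may help. It uses a small made-up DataFrame in place of your X (the values are invented for illustration; only the column names Age, ca, and thal come from your output) and marks every cell that equals the string '?'.

```python
import pandas as pd

# Made-up stand-in for X: 'ca' and 'thal' hold strings, as in the dtypes output
X = pd.DataFrame({
    "Age": [63, 67, 41, 56],
    "ca": ["0.0", "3.0", "?", "1.0"],
    "thal": ["6.0", "?", "3.0", "7.0"],
})

# Boolean mask: True wherever a cell is exactly the string '?'
mask = X.eq("?")

# Number of '?' entries per column
question_counts = mask.sum()
print(question_counts)

# Index labels of the rows that contain at least one '?'
bad_rows = X.index[mask.any(axis=1)]
print(list(bad_rows))
```

On your real data you would skip the made-up DataFrame and run the mask lines on your own X.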

You can delete a row using

df = df.drop(labels=3, axis=0)
'''
Here df stands for your DataFrame and 3 is a placeholder for the index label
of the row that holds the '?'. If row 40 holds the '?', use labels=40.
'''
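If several rows contain '?', dropping them one by one gets tedious. A possible alternative, sketched here on made-up data, is to convert the object columns with pd.to_numeric(..., errors="coerce"), which turns anything non-numeric such as '?' into NaN, and then drop the incomplete rows from X and y together. The toy values and the names X_clean, y_clean, and keep are my own assumptions, not something from your code.

```python
import pandas as pd

# Made-up stand-ins for X and y; replace them with your own data
X = pd.DataFrame({
    "Age": [63, 67, 41, 56],
    "ca": ["0.0", "3.0", "?", "1.0"],
    "thal": ["6.0", "?", "3.0", "7.0"],
})
y = pd.Series([0, 1, 0, 1])

# Convert the two object columns; '?' cannot be parsed and becomes NaN
X_clean = X.copy()
for col in ["ca", "thal"]:
    X_clean[col] = pd.to_numeric(X_clean[col], errors="coerce")

# Keep only rows without missing values, in X and y alike
keep = X_clean.notna().all(axis=1)
X_clean = X_clean[keep].reset_index(drop=True)
y_clean = y[keep].reset_index(drop=True)

print(X_clean.dtypes)
print(len(X_clean), len(y_clean))
```

After this, X_clean has only numeric columns, so passing X_clean and y_clean to SelectKBest(f_classif, k=3).fit_transform should no longer hit this particular ValueError. Resetting the index keeps lookups such as y[i] in your loop working. Whether dropping rows or filling in the missing values is the better choice depends on how many rows are affected.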