This is ML code and I am beginner. X and y are class and feature matrix
print(X.shape)
X.dtypes
output:
Age int64
Sex int64
chest pain type int64
Trestbps int64
chol int64
fbs int64
restecg int64
thalach int64
exang int64
oldpeak float64
slope int64
ca object
thal object
dtype: object
from sklearn.feature_selection import SelectKBest, f_classif
#Using ANOVA to create the new dataset with only best three selected features
X_new_anova = SelectKBest(f_classif, k=3).fit_transform(X,y) #<-------- get error
X_new_anova = pd.DataFrame(X_new_anova, columns = ["Age", "Trestbps","chol"])
print("The dataset with best three selected features after using ANOVA:")
print(X_new_anova.head())
kmeans_anova = KMeans(n_clusters = 3).fit(X_new_anova)
labels_anova = kmeans_anova.labels_
#Counting the number of the labels in each cluster and saving the data into clustering_classes
clustering_classes_anova = {
0: [0,0,0,0,0],
1: [0,0,0,0,0],
2: [0,0,0,0,0]
}
for i in range(len(y)):
clustering_classes_anova[labels_anova[i]][y[i]] = 1
###Finding the most appeared label in each cluster and computing the purity score
purity_score_anova = (max(clustering_classes_anova[0]) max(clustering_classes_anova[1]) max(clustering_classes_anova[2]))/len(y)
print(f"Purity score of the new data after using ANOVA {round(purity_score_anova*100, 2)}%")
This is the error I got:
#Using ANOVA to create the new dataset with only best three selected features
----> 4 X_new_anova = SelectKBest(f_classif, k=3).fit_transform(X,y)
5 X_new_anova = pd.DataFrame(X_new_anova, columns = ["Age", "Trestbps","chol"])
6 print("The dataset with best three selected features after using ANOVA:")
ValueError: could not convert string to float: '?'
I don't know what is the meaning of "?" could you please tell me how to avoid this error?
CodePudding user response:
The meaning of the '?' is that there is this string (?) somewhere within your datafile that it cannot convert. I would just check your datafile to make sure that everything checks out. I would guess whoever made it put a ? somewhere that data could not be found.
can Delete a row using
DataFrame=Dataframe.drop(labels=3,axis=0)
'''
With 3 being used as a placeholder for whatever
row holds the ? so if row 40 has the empty ?, you would do # 40
'''