I'm trying to plot KMeans clusters, but I'm stuck because one column contains dates, and it's causing a lot of problems (the data is shown in the screenshot).
I've already used to_datetime, so what should I do now?
How can I get past this problem and plot it?
Thank you in advance!
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
AAPL= pd.read_csv('AAPL.csv', header=0, squeeze=True)
#sd=store_data.head(100)
x = pd.to_datetime(AAPL.iloc[:, [0,1]],dayfirst=True)
print(x)
kmeans4 = KMeans(n_clusters=4)
y_kmeans4 = kmeans4.fit_predict(x)
print(y_kmeans4)
print(kmeans4.cluster_centers_)
plt.scatter(x[:,0],x[:,1],c=y_kmeans4,cmap='rainbow')
plt.scatter(kmeans4.cluster_centers_[:,0] ,kmeans4.cluster_centers_[:,1],color='black')
CodePudding user response:
You need to select only the first column:
x = pd.to_datetime(AAPL.iloc[:, 0],dayfirst=True)
If you use:
x = pd.to_datetime(AAPL.iloc[:, [0,1]],dayfirst=True)
it selects the first and second columns and raises an error, because pd.to_datetime
accepts multiple columns only when they are named year, month, and day,
like in this solution.
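To make the difference concrete, here is a minimal sketch of both cases, plus one possible way to get the dates into KMeans. The small DataFrame is a made-up stand-in for AAPL.csv, and the column names Date and Close are assumptions, not taken from the screenshot. Turning the dates into "days since the first date" is just one option I'm assuming here, since KMeans needs numeric input.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Hypothetical stand-in for AAPL.csv (column names are assumptions).
df = pd.DataFrame({
    "Date": ["01/02/2021", "02/02/2021", "03/02/2021",
             "01/06/2021", "02/06/2021", "03/06/2021"],
    "Close": [134.1, 135.0, 133.9, 124.3, 125.1, 125.9],
})

# Case 1: a single column of date strings converts directly.
dates = pd.to_datetime(df["Date"], dayfirst=True)

# Case 2: several columns work only when they are named year, month, day.
parts = pd.DataFrame({"year": [2021], "month": [2], "day": [1]})
assembled = pd.to_datetime(parts)

# KMeans needs numbers, so express each date as days since the first date.
days = (dates - dates.min()).dt.days
features = pd.DataFrame({"days": days, "Close": df["Close"]})

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(features.to_numpy())

# Plot against the real dates; map the centers back from days to dates.
centers = kmeans.cluster_centers_
center_dates = dates.min() + pd.to_timedelta(centers[:, 0], unit="D")

fig, ax = plt.subplots()
ax.scatter(dates, df["Close"], c=labels, cmap="rainbow")
ax.scatter(center_dates, centers[:, 1], color="black")
```

Call plt.show() afterwards to display the figure. Because the day counts and the prices are on different scales, you may want to standardize the features before clustering real data.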