I have data for 505 patients (rows), each with 17 columns, and every cell holds a 3D [x, y, z] array.
In : data.iloc[0][0]
Out: array([ -23.47808471, -9.92158009, 1447.74107884])
Each patient's row is a collection of 3D points marking the centers of the vertebrae, with 17 vertebrae marked per patient. I am trying to use k-means clustering to find how many different types of spines there are in the dataset. However, when I try to fit the model, I get errors such as "ValueError: setting an array element with a sequence." I am not sure how to reshape my DataFrame so that each patient's data becomes a single, separate sample.
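A quick way to see why this layout is awkward for k-means: stacking one patient's row gives a 17 x 3 matrix, and stacking every row gives a 3-D block (patients x vertebrae x coordinates). This is a minimal sketch on a hypothetical toy frame with random coordinates and made-up column names (vertebra_0 to vertebra_16) standing in for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 3 patients x 17 vertebrae,
# each cell holding one [x, y, z] NumPy array (random values, made-up names).
rng = np.random.default_rng(0)
cols = [f"vertebra_{i}" for i in range(17)]
toy = pd.DataFrame([[rng.normal(size=3) for _ in cols] for _ in range(3)], columns=cols)

# One patient's row: 17 cells of 3 numbers each, i.e. a 17 x 3 matrix.
patient0 = np.stack(toy.iloc[0].to_numpy())
print(patient0.shape)  # (17, 3)

# The whole table is therefore a 3-D block, not the 2-D matrix KMeans wants.
block = np.stack([np.stack(row) for row in toy.to_numpy()])
print(block.shape)  # (3, 17, 3)
```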
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=4, n_init=10, max_iter=300)
kmeans.fit(data)
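The error can be reproduced on a small hypothetical frame of the same shape. Every column ends up with object dtype, and when scikit-learn tries to convert the frame to a float array it raises the same kind of ValueError. The toy data and column names here are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in: 6 patients x 17 vertebrae, each cell an [x, y, z] array.
rng = np.random.default_rng(0)
cols = [f"vertebra_{i}" for i in range(17)]
toy = pd.DataFrame([[rng.normal(size=3) for _ in cols] for _ in range(6)], columns=cols)

print(toy.dtypes.unique())  # only object dtype: each cell is an array, not a number

try:
    KMeans(n_clusters=2, n_init=10).fit(toy)
    fit_error = None
except ValueError as err:
    # Converting cells that are themselves arrays to float fails here.
    fit_error = err
    print("fit failed:", err)
```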
Thank you!
Answer:
KMeans.fit expects a 2-D array of shape (n_samples, n_features) as input, whereas each cell of your data holds a 3-element array, so your data is effectively 3-D (505 x 17 x 3). One thing you can do is unravel the points and turn each coordinate into its own feature, like this:
# Do this for all positions
data['Spine_L1_Center_x'] = data['Spine_L1_Center'].apply(lambda x: x[0])
data['Spine_L1_Center_y'] = data['Spine_L1_Center'].apply(lambda x: x[1])
data['Spine_L1_Center_z'] = data['Spine_L1_Center'].apply(lambda x: x[2])
data.drop(columns=['Spine_L1_Center', ... ], inplace=True)
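Rather than writing three lines per vertebra by hand, the same expansion can be applied to every column in a loop. This is a minimal sketch, assuming every cell holds a 3-element array with no missing values; the helper name expand_xyz_columns and the column Spine_L2_Center are hypothetical (only Spine_L1_Center appears in the answer).

```python
import numpy as np
import pandas as pd

def expand_xyz_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: replace every column of [x, y, z] arrays with
    three float columns named <column>_x, <column>_y and <column>_z."""
    expanded = {}
    for col in df.columns:
        # Stack the column's cells into one (n_rows, 3) float array.
        xyz = np.stack(df[col].to_numpy()).astype(float)
        for i, axis in enumerate("xyz"):
            expanded[f"{col}_{axis}"] = xyz[:, i]
    return pd.DataFrame(expanded, index=df.index)

# Usage on a toy frame (Spine_L2_Center is a made-up second column).
rng = np.random.default_rng(0)
cols = ["Spine_L1_Center", "Spine_L2_Center"]
toy = pd.DataFrame([[rng.normal(size=3) for _ in cols] for _ in range(4)], columns=cols)
flat = expand_xyz_columns(toy)
print(list(flat.columns))
# ['Spine_L1_Center_x', 'Spine_L1_Center_y', 'Spine_L1_Center_z',
#  'Spine_L2_Center_x', 'Spine_L2_Center_y', 'Spine_L2_Center_z']
```

Building a new frame, instead of adding columns and then calling drop(..., inplace=True), leaves the original data untouched and avoids listing every old column name by hand.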
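Then fit KMeans on the expanded DataFrame: it now has 51 numeric columns (17 vertebrae x 3 coordinates), which is the 2-D input KMeans.fit expects.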
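A more compact route to the same 2-D matrix is to stack all the cells with NumPy and reshape to one row per patient. Because the question is how many spine types there are, the sketch also compares a few values of k with the silhouette score; that is one common heuristic, swapped in here as an assumption, not something the answer above prescribes. The toy frame, its column names, and the helper to_feature_matrix are hypothetical; with random data the chosen k carries no meaning.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def to_feature_matrix(df: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: (n_patients, n_vertebrae) frame of [x, y, z]
    arrays -> (n_patients, n_vertebrae * 3) float matrix."""
    # ravel() walks the cells row by row, so each patient's vertebrae stay together.
    points = np.stack(df.to_numpy().ravel()).astype(float)  # (n_patients * n_vertebrae, 3)
    return points.reshape(len(df), -1)

# Hypothetical stand-in for the real data: 40 patients x 17 vertebrae.
rng = np.random.default_rng(0)
cols = [f"vertebra_{i}" for i in range(17)]
toy = pd.DataFrame([[rng.normal(size=3) for _ in cols] for _ in range(40)], columns=cols)

X = to_feature_matrix(toy)  # shape (40, 51)

# Try several cluster counts; a higher silhouette score suggests better-separated clusters.
scores = {}
for k in range(2, 7):
    labels_k = KMeans(n_clusters=k, n_init=10, max_iter=300, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels_k)

best_k = max(scores, key=scores.get)
labels = KMeans(n_clusters=best_k, n_init=10, max_iter=300, random_state=0).fit_predict(X)
```

On the real data, X would have shape (505, 51) and labels would assign each patient to one of best_k clusters.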