I have a large pandas dataframe that I clustered and the cluster id is stored in a column of a dataframe. I would like to display clusters in such a way that each cluster has a different color. I tried doing this with a colormap but the problem is that I have too many points and clusters so nearby clusters get assign only slightly different colors, so when I plot all of them I just get a big mashup that looks like this:
Note that this is image contains about 4000 clusters, but because colors of clusters are just assigned top to bottom, nearby clusters blend together.
I would like nearby clusters to be painted in different colors so I tried making a random color for each cluster and then assign each point a color based on its cluster label like this:
# creating a color for each distinct cluster label
colors = [(random.random(), random.random(), random.random())
for _ in range(len(set(data['labels'])))]
# assigning color to a point based on its cluster label
for index, row in data.iterrows():
plt.scatter(row['x'], row['y'], color=colors[int(row['labels'])])
Now this program works but it is much slower that vectorized version above.
Is there a way to do color each cluster in clearly different colors without writing a for loop?
CodePudding user response:
This creates a random colormap of 256 colors that you can then pass to scatter :
def segmentation_cmap():
vals = np.linspace(0, 1, 256)
np.random.shuffle(vals)
return plt.cm.colors.ListedColormap(plt.cm.CMRmap(vals))
ax.scatter(row['x'],row['y'],c=row['labels'],s=1,cmap=segmentation_cmap())
You may add colors, but you would have trouble seeing the differences anyways at some point !