Random colors in pyplot.scatter based on value

I have a large pandas dataframe that I clustered, with the cluster id stored in one of its columns. I would like to display the clusters so that each one has a different color. I tried doing this with a colormap, but I have so many points and clusters that nearby clusters get assigned only slightly different colors. When I plot all of them I just get a big mashup that looks like this:

Note that this image contains about 4000 clusters, but because the colors are simply assigned from top to bottom, nearby clusters blend together.

I would like nearby clusters to be painted in clearly different colors, so I tried generating a random color for each cluster and then assigning each point a color based on its cluster label, like this:

import random
import matplotlib.pyplot as plt

# creating a color for each distinct cluster label
colors = [(random.random(), random.random(), random.random())
          for _ in range(len(set(data['labels'])))]
# assigning color to a point based on its cluster label
# (labels are used as list indices, so they must be integers 0..n_clusters-1)
for index, row in data.iterrows():
    plt.scatter(row['x'], row['y'], color=colors[int(row['labels'])])

This works, but it is much slower than the vectorized colormap version, because it calls plt.scatter once per point.

Is there a way to color each cluster in a clearly different color without writing a for loop?

CodePudding user response：

This creates a colormap of 256 colors in random order that you can then pass to scatter:

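import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

def segmentation_cmap():
    # 256 evenly spaced samples of CMRmap, returned in shuffled order
    vals = np.linspace(0, 1, 256)
    np.random.shuffle(vals)
    return ListedColormap(plt.cm.CMRmap(vals))

# one vectorized call on the whole dataframe instead of one call per row
fig, ax = plt.subplots()
ax.scatter(data['x'], data['y'], c=data['labels'], s=1, cmap=segmentation_cmap())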

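You can add more colors, but at some point you would have trouble telling them apart anyway! Keep in mind that with the default linear normalization scatter spreads the label range over the 256 entries, so with about 4000 clusters roughly 16 consecutive labels end up sharing each color.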
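If one independent color per cluster is wanted rather than 256 shared ones, a possible alternative is to skip the colormap and pass scatter an array of RGB rows, built by indexing a random palette with the cluster labels. The following is only a sketch: the helper names random_cluster_colors and plot_clusters and the seed parameter are made up for illustration, and the column names x, y and labels are taken from the question.

```python
import numpy as np


def random_cluster_colors(labels, seed=0):
    """Return an (n_points, 3) RGB array with one random color per distinct label."""
    labels = np.asarray(labels).ravel()
    # map arbitrary label values to dense codes 0..n_clusters-1
    uniques, codes = np.unique(labels, return_inverse=True)
    rng = np.random.default_rng(seed)  # seeded so colors repeat across runs
    palette = rng.random((len(uniques), 3))  # one RGB triple per cluster
    return palette[codes]  # fancy indexing: one color row per point


def plot_clusters(data, seed=0):
    """Hypothetical wrapper: draw all points with a single scatter call."""
    import matplotlib.pyplot as plt  # imported lazily, only needed for plotting

    fig, ax = plt.subplots()
    ax.scatter(data['x'], data['y'],
               c=random_cluster_colors(data['labels'], seed), s=1)
    return fig, ax
```

Calling plot_clusters(data) on the dataframe from the question should draw everything in one vectorized call. Random colors can still land close to each other by chance, so trying a different seed may help if two neighboring clusters look alike.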