Use first N colors from qualitative cmap to plot cluster scatter

I want to plot my clusters against their first two principal components, but using only the first N colors from matplotlib's 'Set1' cmap (where N depends on the number of clusters).

I understand I can access the color list and slice it to get the number of colors I want; however, when attempting this I get the error:

ValueError: array([[0.89411765, 0.10196078, 0.10980392, 1. ], [1. , 0.49803922, 0. , 1. ], [0.6 , 0.6 , 0.6 , 1. ]]) is not a valid value for name; supported values are 'Accent'...

...suggesting to me I have got the RGBA values of the colors but not the names themselves?

This is the code I am attempting it with (where k is the number of clusters):

cmap = cm.get_cmap('Set1')
cmap = cmap(np.linspace(0, 1, k)) 

points = ax.scatter(data['PC1'], data['PC2'],c=data['cluster'], cmap=cmap ,alpha=0.7)
ax.legend(*points.legend_elements(), title='test')

CodePudding user response：

One possible solution is to loop over the unique cluster values:

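import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Sample data: 10 random points, each assigned to a cluster from 1 to 4
x = np.random.uniform(size=10)
y = np.random.uniform(size=10)
color_val = np.random.randint(1, 5, 10)
df = pd.DataFrame({"PC1": x, "PC2": y, "cluster": color_val})

# Take the first N colors of 'Set1', one per cluster actually present
unique_color_val = df["cluster"].unique()
colors = plt.get_cmap('Set1').colors[:len(unique_color_val)]

plt.figure()
# One scatter call per cluster, so each gets its own color and legend entry
for i, ucv in enumerate(unique_color_val):
    sub_df = df[df["cluster"] == ucv]
    plt.scatter(sub_df["PC1"], sub_df["PC2"], color=colors[i], label="color val = %s" % ucv)
plt.legend()
plt.show()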
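
The error in the question arises because the cmap argument expects a Colormap object or a registered colormap name, while calling a colormap on an array returns a plain array of RGBA values. If a single scatter call (and legend_elements()) is preferred over the loop, one possible sketch is to wrap the sliced colors in a ListedColormap. The helper name first_n_colors_cmap and the sample data are hypothetical, and the sketch assumes the cluster labels are the integers 0 to k-1:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap


def first_n_colors_cmap(name, n):
    """Build a colormap from the first n colors of a qualitative colormap.

    Hypothetical helper for illustration. Assumes `name` refers to a
    listed (qualitative) colormap such as 'Set1', which exposes `.colors`.
    """
    base = plt.get_cmap(name)
    return ListedColormap(base.colors[:n], name=f"{name}_first{n}")


# Hypothetical sample data standing in for the real PCA output.
# Assumption: cluster labels are the integers 0 .. k-1.
rng = np.random.default_rng(0)
k = 3
data = pd.DataFrame({
    "PC1": rng.uniform(size=30),
    "PC2": rng.uniform(size=30),
    "cluster": rng.integers(0, k, size=30),
})

cmap = first_n_colors_cmap("Set1", k)

fig, ax = plt.subplots()
# vmin/vmax are set so that label i falls into color bin i,
# even if some label happens to be absent from the data.
points = ax.scatter(
    data["PC1"], data["PC2"],
    c=data["cluster"], cmap=cmap,
    vmin=-0.5, vmax=k - 0.5, alpha=0.7,
)
ax.legend(*points.legend_elements(), title="cluster")
# Call plt.show() or fig.savefig(...) to display or save the figure.
```

Note that 'Set1' only contains 9 colors, so this approach cannot give more than 9 distinct clusters their own color.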