I am writing this script to create a color-mapped plot from the attached data frame.
Here is the code:
biaxial_plot_ICOS_PD1 = sorted_df.plot.scatter(x="ICOS - costimulator:Cyc_14_ch_4",
                                               y="PD-1 - checkpoint:Cyc_12_ch_4",
                                               c="ClusterName", colormap='viridis', s=50)
But I get this error:
ValueError: 'c' argument must be a color, a sequence of colors, or a sequence of numbers, not ['CD4 T cells' 'CD4 T cells' 'CD4 T cells' ... 'CD4 T cells CD45RO ' 'CD4 T cells CD45RO ' 'CD4 T cells GATA3 ']
sorted_df:
CodePudding user response:
When you provide a column label to the c parameter, the values of that column must be numbers, which are then mapped to colors according to the colormap provided. From the docs of DataFrame.plot.scatter:
c : str, int or array-like, optional
The color of each point. Possible values are:
- A single color string referred to by name, RGB or RGBA code, for instance ‘red’ or ‘#a98d19’.
- A sequence of color strings referred to by name, RGB or RGBA code, which will be used for each point’s color recursively. For instance [‘green’,’yellow’] all points will be filled in green or yellow, alternatively.
- A column name or position whose values will be used to color the marker points according to a colormap.
The c parameter is not directly interpreted as "color by this column". If you want something like that, use seaborn.
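As a rough sketch of the seaborn route mentioned above: seaborn.scatterplot takes a hue argument that accepts a categorical (string) column directly and builds a legend from it. The data frame below is made up for illustration, since the real sorted_df is not shown, and the short column names "ICOS" and "PD-1" are stand-ins for the full column labels in the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import pandas as pd
import seaborn as sns

# Made-up stand-in for sorted_df; values and short column names are assumptions.
demo_df = pd.DataFrame({
    "ICOS": [0.1, 0.4, 0.8, 1.2],
    "PD-1": [0.3, 0.2, 0.9, 1.1],
    "ClusterName": [
        "CD4 T cells",
        "CD4 T cells",
        "CD4 T cells CD45RO",
        "CD4 T cells GATA3",
    ],
})

# hue colors each point by its ClusterName and adds one legend entry per cluster.
ax = sns.scatterplot(data=demo_df, x="ICOS", y="PD-1", hue="ClusterName", s=50)
```

With the real data, the x and y arguments would be the full column labels from the question.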
In your case, it seems that you want to color based on ClusterName, so you can use groupby with ngroup, so that each ClusterName is mapped to a distinct integer and therefore to a different color.
This should work:
cluster_colors = sorted_df.groupby('ClusterName').ngroup()
biaxial_plot_ICOS_PD1 = sorted_df.plot.scatter(x="ICOS - costimulator:Cyc_14_ch_4",
y="PD-1 - checkpoint:Cyc_12_ch_4",
c=cluster_colors, colormap='viridis', s=50)
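To make the groupby/ngroup step more concrete, here is a small sketch on made-up data (the real sorted_df is not shown, so demo_df and its contents are assumptions). It shows the integer code each row receives and a lookup from code back to cluster name, which may help when reading the colorbar.

```python
import pandas as pd

# Made-up stand-in for the ClusterName column of sorted_df.
demo_df = pd.DataFrame({
    "ClusterName": [
        "CD4 T cells",
        "CD4 T cells GATA3",
        "CD4 T cells",
        "CD4 T cells CD45RO",
    ],
})

# ngroup() numbers the groups in sorted key order (groupby sorts keys by default),
# so every row gets the integer code of its cluster.
cluster_codes = demo_df.groupby("ClusterName").ngroup()
print(cluster_codes.tolist())  # [0, 2, 0, 1]

# Lookup from integer code back to the cluster name.
code_to_name = dict(zip(cluster_codes, demo_df["ClusterName"]))
print(code_to_name)
```

The resulting integer Series is what gets passed as c in the scatter call above.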