Create a scatterplot where color corresponds to a variable with multiple values

Time:12-02

I have the following dataframe df, where the column types holds up to 3 comma-separated types for each ID (the dataset has approximately 3000 rows):

ID   types  grade  num 
a01  a,b,c   7.1    1 
a02  c,d     7.7    3   
a03  c       7.3    4   
a04  a,c,f   7.9    5   
a05  a,c,e   6.7    3

I want to create a scatterplot where the x axis corresponds to the num column, the y axis corresponds to grade, and the color of each point corresponds to its type, similar to this: (image not available)

CodePudding user response:

Jon Spring's answer is a good, fast way to get your data visualized.

I would start with what you actually want to see in your plot (I wasn't sure when I read your question). Do you want points that share the same combination of types (like all "a,c" points) to be colored the same? Or do you want each point duplicated, so that it shows up once in the 'a' color and once in the 'c' color at the same position? Jittering or lowering the alpha level are good ways to make overlapping points actually show up in the plot.
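To make the two options concrete, here is a minimal base R sketch using only the five sample rows from the question. The names combo, type_list and df_long are made up for illustration, and the real data may need extra cleaning (for example, stray spaces after the commas, which trimws() handles here).

```r
# Sample rows copied from the question
df <- data.frame(
  ID    = c("a01", "a02", "a03", "a04", "a05"),
  types = c("a,b,c", "c,d", "c", "a,c,f", "a,c,e"),
  grade = c(7.1, 7.7, 7.3, 7.9, 6.7),
  num   = c(1, 3, 4, 5, 3),
  stringsAsFactors = FALSE
)

# Option 1: keep one row per ID, so each distinct combination
# of types would get its own color (combo is a hypothetical name)
df$combo <- factor(df$types)

# Option 2: one row per (ID, type) pair, so a point with three
# types is repeated three times, once per type
type_list <- strsplit(df$types, ",", fixed = TRUE)
df_long <- df[rep(seq_len(nrow(df)), lengths(type_list)),
              c("ID", "grade", "num")]
df_long$type <- trimws(unlist(type_list))
rownames(df_long) <- NULL
```

With option 2, mapping color to df_long$type gives one color per individual type, at the cost of several points sitting on the same coordinates, which is where jitter and alpha come in.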

Because your y values are 0.1 apart but your x values are 1 apart, you might change the jitter settings so that points are only jittered along the x axis. By default geom_jitter() also moves points vertically, so set height = 0 to switch that off. Then set alpha to 0.5, or even 0.35, to cope with overplotting:

geom_jitter(width = 0.1, height = 0, alpha = 0.5)

It seems like there's a lot of data to fit into one plot, so you might try faceting instead to see if it makes the graph more readable. With the types column as it is, each distinct combination (such as "a,c,e") gets its own panel:

facet_wrap(~types)
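Putting the pieces together, a full plot might look roughly like the sketch below. It assumes you want one point per individual type, so it first splits the comma-separated column into one row per type; df_long and type_list are illustrative names, and only the sample rows from the question are used.

```r
library(ggplot2)

# Sample rows copied from the question
df <- data.frame(
  ID    = c("a01", "a02", "a03", "a04", "a05"),
  types = c("a,b,c", "c,d", "c", "a,c,f", "a,c,e"),
  grade = c(7.1, 7.7, 7.3, 7.9, 6.7),
  num   = c(1, 3, 4, 5, 3),
  stringsAsFactors = FALSE
)

# One row per (ID, type) pair
type_list <- strsplit(df$types, ",", fixed = TRUE)
n_types <- lengths(type_list)
df_long <- data.frame(
  ID    = rep(df$ID, n_types),
  grade = rep(df$grade, n_types),
  num   = rep(df$num, n_types),
  type  = trimws(unlist(type_list)),
  stringsAsFactors = FALSE
)

# Color by individual type, jitter horizontally only,
# and give each type its own panel
p <- ggplot(df_long, aes(x = num, y = grade, color = type)) +
  geom_jitter(width = 0.1, height = 0, alpha = 0.5) +
  facet_wrap(~type)

print(p)
```

If the panels make the color redundant, you could drop color = type, or drop facet_wrap() and keep everything in one panel; which reads better will depend on how the roughly 3000 real rows are distributed.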
