I have the following dataframe df, where the variable types holds up to 3 types for each ID (the dataset has approximately 3000 rows):
ID types grade num
a01 a,b,c 7.1 1
a02 c,d 7.7 3
a03 c 7.3 4
a04 a,c,f 7.9 5
a05 a,c,e 6.7 3
I want to create a scatterplot where the x axis corresponds to the num column, the y axis corresponds to the grade column, and the color of each point corresponds to its type, similar to this:
Answer:
Jon Spring's answer is a good, fast way to get your data visualized.
I would start from what you actually want to see in the plot (that was not clear to me from the question). Do you want points that share the same combination of types (for example, all "a,c" points) to get the same color? Or do you want each point duplicated, so that it shows up once in the 'a' color and once in the 'c' color at the same position? Jitter or a lower alpha level are good ways to keep overlapping points visible in the plot.
Because your y values are about 0.1 apart but your x values are 1 apart, you might change the jitter settings so that points are only jittered along the x axis, and then set alpha to 0.5 or even 0.35 to cope with overplotting:
geom_jitter(width = 0.1, height = 0, alpha = 0.5)
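For the "duplicated points" reading of the question, here is a minimal sketch of how the jitter layer could fit into a full plot. It assumes the tidyr package is available and uses separate_rows() to get one row per ID and type; that reshaping step is my assumption and not necessarily what the other answer does. The data frame is just the five sample rows from the question.

```r
library(ggplot2)
library(tidyr)

# Sample rows copied from the question
df <- data.frame(
  ID    = c("a01", "a02", "a03", "a04", "a05"),
  types = c("a,b,c", "c,d", "c", "a,c,f", "a,c,e"),
  grade = c(7.1, 7.7, 7.3, 7.9, 6.7),
  num   = c(1, 3, 4, 5, 3)
)

# Assumption: split the comma-separated types so each ID gets one row per type
df_long <- separate_rows(df, types, sep = ",")

# height = 0 keeps the jitter horizontal only; alpha softens overplotting
p <- ggplot(df_long, aes(x = num, y = grade, color = types)) +
  geom_jitter(width = 0.1, height = 0, alpha = 0.5)

p
```

If you would rather give each combination of types its own color, skip the separate_rows() step and map color = types on the original df.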
It seems like there's a lot of data to fit into one plot, so you might try a facet wrap instead and see whether it makes the graphs more readable:
facet_wrap(~types)
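As a rough sketch of the faceted version, again using only the five sample rows from the question: this draws one panel per distinct value of types, so each combination such as "a,c,f" gets its own panel. With the full 3000 rows the number of panels depends on how many distinct combinations you have, which I can't tell from the sample.

```r
library(ggplot2)

# Sample rows copied from the question
df <- data.frame(
  ID    = c("a01", "a02", "a03", "a04", "a05"),
  types = c("a,b,c", "c,d", "c", "a,c,f", "a,c,e"),
  grade = c(7.1, 7.7, 7.3, 7.9, 6.7),
  num   = c(1, 3, 4, 5, 3)
)

# One panel per distinct combination in the types column
p_facets <- ggplot(df, aes(x = num, y = grade)) +
  geom_jitter(width = 0.1, height = 0, alpha = 0.5) +
  facet_wrap(~types)

p_facets
```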