Shapes in my legend appear in a different order than in the plot (ggplot2)

Time:11-10

I have a data frame in R that holds PCA data and looks roughly like this:

obsnames PC1 PC2 PC3
one 2.46 2.57 1.366962e-15
two -3.47 0.84 3.053113e-16
three 1.01 -3.40 7.077672e-16

You can recreate the exact object with this:

pca_data <- structure(list(obsnames = c("one", "two", "three"), PC1 = c(2.46310908247957, 
-3.46877162330214, 1.00566254082257), PC2 = c(2.56831624877025, 
0.836571395923965, -3.40488764469422), PC3 = c(1.36696209906972e-15, 
3.05311331771918e-16, 7.07767178198537e-16), `Sample Size` = c(48L, 
74L, 52L)), row.names = c("one", "two", "three"), class = "data.frame")

I'm trying to plot this PCA with ggplot2's geom_point(), using only the shapes that support the fill aesthetic (21 to 25). However, I'm having trouble getting the legend to match both the shape and the color displayed in the plot. I gave up trying to figure it out myself, and I find it very strange given that I'm specifying pretty much everything manually. This is my plotting code:

library(ggplot2)

# x, y, label_x, label_y, main and datapc are defined elsewhere in my script
len <- length(pca_data$obsnames)
ggplot(pca_data, aes_string(x = x, y = y)) +
  geom_point(shape = rep_len(c(21, 22, 23, 24, 25), length.out = len),
             color = "black", size = 3, aes(fill = obsnames)) +
  theme_bw() +
  theme(legend.position = "right") +
  xlab(label_x) +
  ylab(label_y) +
  ggtitle(main) +
  theme(plot.title = element_text(hjust = 0, face = "bold")) +
  geom_hline(yintercept = 0, size = .2) +
  geom_vline(xintercept = 0, size = .2) +
  coord_equal() +
  geom_text(data = datapc, aes(x = v1, y = v2, label = varnames), size = 3, vjust = 0.3, color = "grey", fontface = "bold") +
  geom_segment(data = datapc, aes(x = 0, y = 0, xend = v1, yend = v2), color = "grey", linetype = "dotted") +
  scale_fill_manual(values = rep_len(c("red", "blue", "green", "orange", "yellow", "purple", "pink", "light blue", "white", "black", "gold"), length.out = len)) +
  guides(fill = guide_legend(override.aes = list(shape = rep_len(c(21, 22, 23, 24, 25), length.out = len))))

This produces the following plot: [image: PCA plot]

As you can see, the legend shows "two" as a green diamond, when it should be the green square. Also, when the number of points (obsnames) equals the number of shapes in my shape vector c(21, 22, 23, 24, 25), that is, 5, the problem doesn't appear. But I really don't see what I'm doing wrong...
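A minimal way to see where the mismatch comes from, without plotting (a sketch that reuses only the obsnames values from the question): a constant shape vector passed to geom_point() is applied in row order, whereas legend keys, and unnamed values in scale_fill_manual(), follow factor level order, which for a character column is alphabetical. The variable names below are illustrative.

```r
obsnames <- c("one", "two", "three")
shapes <- rep_len(c(21, 22, 23, 24, 25), length.out = length(obsnames))

# Shape each point actually gets: the constant vector is applied by row position
row_shapes <- setNames(shapes, obsnames)

# Order the legend keys use: factor levels, alphabetical for a character column
legend_levels <- levels(factor(obsnames))
legend_shapes <- setNames(shapes, legend_levels)

print(row_shapes)     # one = 21, two = 22, three = 23
print(legend_shapes)  # one = 21, three = 22, two = 23
```

So "two" is drawn with shape 22 (a square) but listed in the legend with shape 23 (a diamond); the same level ordering is why its fill is the third color, green.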

Answer:

This is one of those things that works better if you let ggplot2 handle it, that is, if you put both the shape and the fill specifications inside aes(). I've pared down the features of your plot a little for this demonstration; it shouldn't be too hard to add them back in. More importantly, notice that I create named vectors to pass as the values argument of scale_*_manual(); this ensures that each value is matched to its level by name rather than by position, so the values and labels line up correctly:

library(ggplot2)

len <- length(pca_data$obsnames)
shapes <- rep_len(x = c(21, 22, 23, 24, 25), length.out = len)
ptcols <- rep_len(x = c(
    "red", "blue", "green", "orange", "yellow", "purple", "pink",
    "light blue", "white", "black", "gold"
), length.out = len)
# Name the values by observation so each scale matches them by name, not position
names(shapes) <- pca_data$obsnames
names(ptcols) <- pca_data$obsnames
ggplot(data = pca_data, mapping = aes(x = PC1, y = PC2)) +
    geom_point(aes(shape = obsnames, fill = obsnames), color = "black") +
    scale_fill_manual(values  = ptcols) +
    scale_shape_manual(values = shapes) +
    theme_bw()
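If you'd rather keep a constant shape vector plus override.aes as in the question, one alternative (a sketch of a different technique, not what this answer does) is to set the factor levels to the row order, so that per-row constants and legend keys follow the same order. The data frame below is a trimmed, rounded copy of the question's data, and the five-color palette is shortened for brevity; both are assumptions.

```r
library(ggplot2)

pca_data <- data.frame(
  obsnames = c("one", "two", "three"),
  PC1 = c(2.46, -3.47, 1.01),
  PC2 = c(2.57, 0.84, -3.40)
)

# Make the level order equal the row order, so constants applied per row
# and legend keys (which follow level order) line up
pca_data$obsnames <- factor(pca_data$obsnames, levels = pca_data$obsnames)

len <- nrow(pca_data)
shapes <- rep_len(c(21, 22, 23, 24, 25), length.out = len)
fills  <- rep_len(c("red", "blue", "green", "orange", "yellow"), length.out = len)

p <- ggplot(pca_data, aes(x = PC1, y = PC2)) +
  geom_point(aes(fill = obsnames), shape = shapes, color = "black", size = 3) +
  scale_fill_manual(values = fills) +
  guides(fill = guide_legend(override.aes = list(shape = shapes))) +
  theme_bw()

print(p)
```

The named-vector approach is more robust because it does not depend on row order at all; this variant mainly helps when the rows are already in the order you want the legend to show.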
