I have a data frame in R that holds PCA data and looks roughly like this:
obsnames | PC1 | PC2 | PC3 |
---|---|---|---|
one | 2.46 | 2.57 | 1.366962e-15 |
two | -3.47 | 0.84 | 3.053113e-16 |
three | 1.01 | -3.40 | 7.077672e-16 |
You can recreate the exact variable with this:
pca_data <- structure(list(obsnames = c("one", "two", "three"), PC1 = c(2.46310908247957,
-3.46877162330214, 1.00566254082257), PC2 = c(2.56831624877025,
0.836571395923965, -3.40488764469422), PC3 = c(1.36696209906972e-15,
3.05311331771918e-16, 7.07767178198537e-16), `Sample Size` = c(48L,
74L, 52L)), row.names = c("one", "two", "three"), class = "data.frame")
I'm trying to plot this PCA with ggplot2's geom_point, using only the shapes that support the "fill" aesthetic (21-25, IIRC). However, I'm having trouble getting the legend to match both the shape and the color displayed in the plot. I gave up trying to figure it out myself, which I find strange given that I'm specifying pretty much everything manually. This is my plotting code:
len <- length(pca_data$obsnames)
ggplot(pca_data, aes_string(x=x, y=y)) +
  geom_point(shape = rep_len(c(21, 22, 23, 24, 25), length.out = len),
             color = "black", size = 3, aes(fill=obsnames)) +
  theme_bw() +
  theme(legend.position="right") +
  xlab(label_x) +
  ylab(label_y) +
  ggtitle(main) +
  theme(plot.title = element_text(hjust = 0, face="bold")) +
  geom_hline(aes(0), size=.2, yintercept=0) +
  geom_vline(aes(0), size=.2, xintercept=0) +
  coord_equal() +
  geom_text(data=datapc, aes(x=v1, y=v2, label=varnames), size = 3, vjust=0.3, color="grey", fontface="bold") +
  geom_segment(data=datapc, aes(x=0, y=0, xend=v1, yend=v2), color="grey", linetype="dotted") +
  scale_fill_manual(values = rep_len(c("red", "blue", "green", "orange", "yellow", "purple", "pink", "light blue", "white", "black", "gold"), length.out = len)) +
  guides(fill=guide_legend(override.aes=list(shape=rep_len(c(21, 22, 23, 24, 25), length.out = len))))
Which outputs the following plot:
As you can see, the legend shows "two" as a green diamond, when in reality it should be a green square. Also, when the number of points (obsnames) happens to equal the number of shapes in my vector c(21, 22, 23, 24, 25), that is, 5, the problem doesn't appear. But I really don't see what I'm doing wrong...
Answer:
This is one of those things that works better if you just let ggplot handle it; that is, by making sure you put the shape/fill specification in aes(). I've pared down the features of your plot a little for this demonstration; it shouldn't be too hard to add them back in. Even more importantly, notice that I create a named vector to pass as the values argument of scale_*_manual(); this ensures that the values and labels match up the right way:
library(ggplot2)

len <- length(pca_data$obsnames)
shapes <- rep_len(x = c(21, 22, 23, 24, 25), length.out = len)
ptcols <- rep_len(x = c(
  "red", "blue", "green", "orange", "yellow", "purple", "pink",
  "light blue", "white", "black", "gold"
), length.out = len)
# Name both vectors so each value is matched to its observation by name
names(shapes) <- pca_data$obsnames
names(ptcols) <- pca_data$obsnames

# Map shape and fill inside aes() so the scales build a matching legend
ggplot(data = pca_data, mapping = aes(x = PC1, y = PC2)) +
  geom_point(aes(shape = obsnames, fill = obsnames), color = "black") +
  scale_fill_manual(values = ptcols) +
  scale_shape_manual(values = shapes) +
  theme_bw()
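
To see why the named vectors matter, here is a small base R sketch of what appears to go wrong in the original code. It assumes obsnames is a plain character column (not a factor with custom levels), in which case a discrete scale orders the legend keys alphabetically, while shapes set outside aes() are applied to the points in row order. The names row_shapes, legend_order, legend_shapes and comparison are made up for this illustration.

```r
# Observation names in row order, as in pca_data$obsnames
obsnames <- c("one", "two", "three")
shape_pool <- c(21, 22, 23, 24, 25)

# Shapes set outside aes() are assigned to the points in row order
row_shapes <- rep_len(shape_pool, length.out = length(obsnames))
names(row_shapes) <- obsnames

# Assumption: a character column is ordered alphabetically by the scale,
# so the legend keys come out as "one", "three", "two"
legend_order <- sort(unique(obsnames))

# Unnamed shapes passed via override.aes are applied to the keys by position
legend_shapes <- rep_len(shape_pool, length.out = length(legend_order))
names(legend_shapes) <- legend_order

# Compare the shape each observation gets in the plot and in the legend
comparison <- data.frame(
  obsname = obsnames,
  plot_shape = unname(row_shapes[obsnames]),
  legend_shape = unname(legend_shapes[obsnames])
)
print(comparison)
```

Under these assumptions "two" gets shape 22 (square) in the plot but shape 23 (diamond) in the legend, which matches the mismatch described in the question. Naming the vectors by obsnames removes the dependence on position, so the order of rows or legend keys no longer matters.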