Understanding aesthetics in ggplot for PCA

Time:11-02

I want to draw a PCA plot for the data below:

Samples  X            Y             condition
ABC0     -4902321.6   -1166806.4    abc0
A        1964182.5    -182574.88    abc
B        3230051.7    -169413.85    abc
C        4348087.3    -1412510.48   abc
EGH0     -4895769.1   -1279998.8    egh0
E        -623590.7    24402.79      egh
G        -396252.4    -515215.13    egh
H        151838.9     857007        egh
O        -4168497.7   659968.17     o
P        4099128.1    -366815.68    prs
R        3180839.9    -37730.98     prs
S        3713295.7    -374523.83    prs
XYZ0     -4768219.8   -540444.8     xyz0
X        188488.4     559643.8      xyz
Y        -599848.9    1506850.89    xyz
Z        -521412.4    2438162.17    xyz

Below is my code using ggplot:

ggplot(data = pca.data, aes(x = X, y = Y, color = condition)) +
    geom_point(size = 3) +
    geom_text(aes(label = Samples, size = 4))

I want to understand the aesthetics part. If I put label = pca.data$Samples, size = 4 outside of aes, my graph looks different. What is the correct way? Should label go inside aes in geom_text?

CodePudding user response:

ggplot2 uses aesthetics to map columns in your data frame to visual properties of individual data points. When something is included in a call to aes, the package assumes it will vary over the course of your data set, like x-values or y-values, for which each data point has its own value. When you pass an argument to a geom outside of aes, the package assumes that the value is fixed across the data and doesn't need to be mapped from a column.
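A minimal sketch of that distinction, assuming ggplot2 is installed. The toy data frame demo and its columns are made up for illustration and are not part of the question. layer_data() is used here only to look at the values each layer actually ends up drawing with.

```r
library(ggplot2)

# Hypothetical toy data, for illustration only
demo <- data.frame(
  x = c(1, 2, 3, 4),
  y = c(2, 1, 4, 3),
  group = c("a", "a", "b", "b")
)

# Mapped: colour is taken from the `group` column, so it varies by row
# and ggplot2 adds a legend for it
p_mapped <- ggplot(demo, aes(x = x, y = y)) +
  geom_point(aes(color = group), size = 3)

# Fixed: colour is a constant for the whole layer, so no legend is needed
p_fixed <- ggplot(demo, aes(x = x, y = y)) +
  geom_point(color = "blue", size = 3)

# Inspect the per-row values computed for the first (and only) layer
mapped_colours <- layer_data(p_mapped, 1)$colour
fixed_colours <- layer_data(p_fixed, 1)$colour

unique(mapped_colours)  # one colour per group, chosen by the colour scale
unique(fixed_colours)   # just the literal value that was passed in
```

Printing p_mapped and p_fixed shows the same points, but only the mapped version carries a colour legend.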

In your code, you've told ggplot that the aesthetics shared by all geoms are x, y, and color. You then tell it that for the point geom, you'd like all points to have size 3 (a fixed value, because it isn't specified within an aes call). Then, for the text geom, you map the label to the Samples column (which makes sense, well done) and ALSO map size to the constant 4. Because that 4 sits inside aes, ggplot treats it as data rather than as a literal size: every label gets the data value 4, and the default size scale decides how large that value is drawn. So you haven't actually set the text size to 4. This is also why a legend shows up for "size": it has a single entry showing how a data value of 4 is drawn.

What I suspect you want is to move size= outside of aes, because you'd like it to be fixed across the data. ggplot will then treat it as an instruction for how large to make the text labels rather than as a value to map through a scale.

ggplot(data = pca.data, aes(x = X, y = Y, color = condition)) +
    geom_point(size = 3) +
    geom_text(aes(label = Samples), size = 4)
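
One way to check the difference yourself is sketched below. It assumes ggplot2 is installed and rebuilds pca.data from the table in the question, since the original object isn't shown. The names p_in_aes, p_fixed, text_in_aes and text_fixed are only illustrative.

```r
library(ggplot2)

# Rebuild the data from the table in the question
pca.data <- data.frame(
  Samples = c("ABC0", "A", "B", "C", "EGH0", "E", "G", "H",
              "O", "P", "R", "S", "XYZ0", "X", "Y", "Z"),
  X = c(-4902321.6, 1964182.5, 3230051.7, 4348087.3,
        -4895769.1, -623590.7, -396252.4, 151838.9,
        -4168497.7, 4099128.1, 3180839.9, 3713295.7,
        -4768219.8, 188488.4, -599848.9, -521412.4),
  Y = c(-1166806.4, -182574.88, -169413.85, -1412510.48,
        -1279998.8, 24402.79, -515215.13, 857007,
        659968.17, -366815.68, -37730.98, -374523.83,
        -540444.8, 559643.8, 1506850.89, 2438162.17),
  condition = c("abc0", "abc", "abc", "abc", "egh0", "egh", "egh", "egh",
                "o", "prs", "prs", "prs", "xyz0", "xyz", "xyz", "xyz")
)

# Version from the question: size = 4 inside aes(), so it goes through a scale
p_in_aes <- ggplot(pca.data, aes(x = X, y = Y, color = condition)) +
  geom_point(size = 3) +
  geom_text(aes(label = Samples, size = 4))

# Suggested version: size = 4 outside aes(), so it is a fixed setting
p_fixed <- ggplot(pca.data, aes(x = X, y = Y, color = condition)) +
  geom_point(size = 3) +
  geom_text(aes(label = Samples), size = 4)

# Layer 2 is geom_text in both plots; look at the sizes it will draw with
text_in_aes <- layer_data(p_in_aes, 2)
text_fixed <- layer_data(p_fixed, 2)

unique(text_fixed$size)   # the literal value passed to geom_text()
unique(text_in_aes$size)  # whatever the size scale assigned to the data value 4
```

Printing the two plots side by side should also show that only p_in_aes has the extra "size" legend.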