adding a line to a ggplot boxplot

Time:10-15

I'm struggling with ggplot2 and have been looking for a solution online for several hours. Maybe one of you can help? I have a data set that looks like this (several hundred observations):

Y-AXIS X-AXIS SUBJECT
2.2796598 F1 1
0.9118639 F1 2
2.7111228 F3 3
2.7111228 F2 4
2.2796598 F4 5
2.3876401 F10 6
.... ... ...

The Y-AXIS is a continuous value larger than 0 (the upper limit can vary from data set to data set, but is typically < 100). X-AXIS is a categorical variable with 10 levels. SUBJECT refers to an individual and, across the entire data set, each individual has exactly 10 observations, exactly 1 for each level of the categorical variable.

To generate a box plot, I used ggplot like this:

plot1 <- ggplot(longdata,
         aes(x = X_axis, y = Y_axis)) +
         geom_boxplot() +
         ylim(0, 12.5) +
         stat_summary(fun = "mean", geom = "point", shape = 2, size = 3, color = "purple")

That results in the boxplot I have in mind. You can check out the result here if you like: boxplot

So far so good. What I want to do next, and hopefully someone can help me, is this: for one specific SUBJECT, I want to plot a line through their 10 scores in the same figure, on top of the boxplot. An example of what I have in mind can be found here: boxplot with data of one subject as a line. In this case, I simply assumed that the outliers belong to the same case. This is just an assumption. The data of an individual case can also look like this: boxplot with data of a second subject as a line

Additional tips on how to customize that line (colour, thickness, etc.) would also be appreciated. Many thanks!

CodePudding user response:

library(ggplot2)

It is always a good idea to add a reproducible example of your data; you can always simulate what you need.

set.seed(123)
simulated_data <- data.frame(
  subject = rep(1:10, each = 10),
  xaxis = rep(paste0('F', 1:10), times = 10),
  yaxis = runif(100, 0, 100)
)

In ggplot2, each geom can take its own data argument. For your line, just use a subset of your original data, limited to the desired subject.

Colors and other visual elements for the line are simple to set; take a look here

ggplot() +
  geom_boxplot(data = simulated_data, aes(xaxis, yaxis)) +
  geom_line(
    data = simulated_data[simulated_data$subject == 1, ],
    aes(xaxis, yaxis),
    color = 'red',
    linetype = 2,
    size = 1,
    group = 1
  )

Created on 2022-10-14 with reprex v2.0.2
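
For the line styling asked about in the question, here is a minimal sketch under a few assumptions. It reuses the simulated_data layout from above, but converts xaxis to a factor with explicit levels so that F10 comes after F9 (a character column is ordered alphabetically on a discrete axis, which puts F10 right after F1). It also uses `linewidth`, which ggplot2 3.4.0 and later prefer over `size` for lines; on an older ggplot2, swap it back to `size`. The name `one_subject`, the colours, and the added points are illustrative choices only.

```r
library(ggplot2)

set.seed(123)
simulated_data <- data.frame(
  subject = rep(1:10, each = 10),
  xaxis = rep(paste0("F", 1:10), times = 10),
  yaxis = runif(100, 0, 100)
)

# Fix the category order explicitly; otherwise "F10" sorts right after "F1"
simulated_data$xaxis <- factor(simulated_data$xaxis, levels = paste0("F", 1:10))

# Rows for the one subject to highlight (subject 1 chosen arbitrarily)
one_subject <- simulated_data[simulated_data$subject == 1, ]

p <- ggplot(simulated_data, aes(xaxis, yaxis)) +
  geom_boxplot() +
  geom_line(
    data = one_subject,
    aes(group = 1),       # one group, so the points are joined across the discrete x values
    colour = "red",
    linetype = "dashed",
    linewidth = 1,        # assumes ggplot2 >= 3.4.0; use size = 1 on older versions
    alpha = 0.8
  ) +
  geom_point(data = one_subject, colour = "red", size = 2)

p
```

Constants such as colour, linetype, and linewidth go outside `aes()` because they are fixed values, not mappings from the data.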

CodePudding user response:

library(ggplot2)
library(dplyr)

# Simulate some data absent a reproducible example
testData <- data.frame(
  y = runif(300,0,100),
  x = as.factor(paste0("F",rep(1:10,times=30))),
  SUBJECT = as.factor(rep(1:30, each = 10))
)

# Copy your plot with my own data + y limits
plot1 <- ggplot(testData,
                aes(x = x, y = y)) +
  geom_boxplot() +
  ylim(0, 100) +
  stat_summary(fun = "mean", geom = "point", shape = 2, size = 3, color = "purple")

# add the geom_line for subject 1
plot1 +
  geom_line(data = filter(testData, SUBJECT == 1),
            mapping = aes(x = x, y = y, group = SUBJECT))
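
If the line is needed for several different subjects in turn, the subsetting step can be wrapped in a small helper. This is only a sketch: `add_subject_line()` is a hypothetical name, not part of ggplot2, and it assumes a data frame with columns `x`, `y`, and `SUBJECT` laid out like `testData` above. It uses base subsetting, so dplyr is not required, and it sets the factor levels of `x` explicitly so the categories plot in numeric order.

```r
library(ggplot2)

set.seed(42)
testData <- data.frame(
  y = runif(300, 0, 100),
  x = factor(paste0("F", rep(1:10, times = 30)), levels = paste0("F", 1:10)),
  SUBJECT = as.factor(rep(1:30, each = 10))
)

# Hypothetical helper: builds a geom_line layer from one subject's rows
add_subject_line <- function(data, subject_id, colour = "blue") {
  one <- data[data$SUBJECT == subject_id, ]
  stopifnot(nrow(one) > 0)  # fail early if the subject is not in the data
  geom_line(
    data = one,
    mapping = aes(x = x, y = y, group = SUBJECT),
    colour = colour
  )
}

base_plot <- ggplot(testData, aes(x = x, y = y)) +
  geom_boxplot() +
  ylim(0, 100)

# The same base plot with different subjects highlighted
plot_s1 <- base_plot + add_subject_line(testData, 1)
plot_s7 <- base_plot + add_subject_line(testData, 7, colour = "darkgreen")
```

Because the helper returns an ordinary layer, it can be added to any existing plot with `+`, and calling it twice on the same plot overlays two subjects.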

My answer is very similar to Johan Rosa's but his doesn't use additional packages and makes the aesthetic options for the geom_line much more apparent - I'd follow his example if I were you!
