I'm struggling with ggplot2 and I've been looking for a solution online for several hours. Maybe one of you can give me a help? I have a data set that looks like this (several 100's of observations):
Y-AXIS | X-AXIS | SUBJECT |
---|---|---|
2.2796598 | F1 | 1 |
0.9118639 | F1 | 2 |
2.7111228 | F3 | 3 |
2.7111228 | F2 | 4 |
2.2796598 | F4 | 5 |
2.3876401 | F10 | 6 |
.... | ... | ... |
The X-AXIS is a continuous value larger than 0 (the upper limit can vary from data set to data set, but is typically < 100). Y-AXIS is a categorical variable with 10 levels. SUBJECT refers to an individual and, across the entire data set, each individual has exactly 10 observations, exactly 1 for each level of the categorical variable.
To generate a box plot, I used ggplot like this:
plot1 <- ggplot(longdata,
aes(x = X_axis, y = Y_axis))
geom_boxplot()
ylim(0, 12.5)
stat_summary(fun = "mean", geom = "point", shape = 2, size = 3, color = "purple")
That results in the boxplot I have in mind. You can check out the result here if you like: boxplot
So far so good. What I want to do next, and hopefully someone can help me, is this: for one specific SUBJECT, I want to plot a line for their 10 scores in the same figure. So on top of the boxplot. An example of what I have in mind can be found here: boxplot with data of one subject as a line. In this case, I simply assumed that the outliers belong to the same case. This is just an assumption. The data of an individual case can also look like this: boxplot with data of a second subject as a line
Additional tips on how to customize that line (colour, thikness, etc.) would also be appreciated. Many thanks!
CodePudding user response:
library(ggplot2)
It is always a good idea to add a reproducible example of your data, you can always simulate what you need
set.seed(123)
simulated_data <- data.frame(
subject = rep(1:10, each = 10),
xaxis = rep(paste0('F', 1:10), times = 10),
yaxis = runif(100, 0, 100)
)
In ggplot each geom can take a data
argument, for your line just use
a subset of your original data, limited to the subject desired.
Colors and other visula elements for the line are simple, take a look here
ggplot()
geom_boxplot(data = simulated_data, aes(xaxis, yaxis))
geom_line(
data = simulated_data[simulated_data$subject == 1,],
aes(xaxis, yaxis),
color = 'red',
linetype = 2,
size = 1,
group = 1
)
Created on 2022-10-14 with reprex v2.0.2
CodePudding user response:
library(ggplot2)
library(dplyr)
# Simulate some data absent a reproducible example
testData <- data.frame(
y = runif(300,0,100),
x = as.factor(paste0("F",rep(1:10,times=30))),
SUBJECT = as.factor(rep(1:30, each = 10))
)
# Copy your plot with my own data ylimits
plot1 <- ggplot(testData,
aes(x = x, y = y))
geom_boxplot()
ylim(0, 100)
stat_summary(fun = "mean", geom = "point", shape = 2, size = 3, color = "purple")
# add the geom_line for subject 1
plot1
geom_line(data = filter(testData, SUBJECT == 1),
mapping = aes(x=x, y=y, group = SUBJECT))
My answer is very similar to Johan Rosa's but his doesn't use additional packages and makes the aesthetic options for the geom_line
much more apparent - I'd follow his example if I were you!