I have a dataset with two groups, Experimental and Control. Each participant contributes two responses within their group, which represent different learning styles. These are shown in the box plots with jitter below. I would like to connect each participant's two responses with a line using ggplot (so each red point in the Control group would be joined to the corresponding turquoise point in the Control group), but I can't figure out how to do this within the conditions. Can someone please help? I am new to R and really need guidance.
Then, I need to change the color of the lines within the conditions to black if Increase = TRUE and red if Increase = FALSE.
Ideally, I need it to look like Jon's example here, but with black or red lines based on True or False: Connecting grouped points with lines in ggplot
The data and ggplot code look like this:
library(tidyverse)

d <- data.frame(
  Subject = c("1", "2", "3", "4"),
  Group = c("Exp", "Exp", "Control", "Control"),
  Tr = c("14", "11", "4", "23"),
  Sr = c("56", "78", "12", "10"),
  Increase = c("TRUE", "TRUE", "TRUE", "FALSE")
)
# put the data in long format
d <- d %>%
  gather(key = "Strategy", value = "raw", Tr, Sr)
d %>%
  ggplot(aes(x = Group, y = raw, color = Strategy)) +
  geom_boxplot(width = 0.5, lwd = 0.5) +
  geom_jitter(width = 0.15) +
  geom_line(aes(group = raw),
            color = "grey",
            arrow = arrow(type = "closed",
                          length = unit(0.075, "inches")))
CodePudding user response:
Inspired by the answer you linked to (@Jon's answer).
There are a few key things to understand about the solution:
- Since the points and lines need to be connected, both must receive exactly the same random jitter. The simplest way is to jitter the data before it goes into plotting, which is what I did.
- The variable to jitter along, Group, is not a number, but it is plotted as a factor whose levels sit at the x positions 1, 2, 3, ... in level order. Hence we create a numeric vector group_jit with values around 1 and 2, with offsets based on the colouring variable Strategy to shift slightly left and right of 1 and 2.
- Since you have two independent colour scales going on, it is best to map Strategy to fill (for the points and boxes) and Increase to colour (for the lines), to avoid a single legend with 4 things on it.
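The second point can be sketched in base R without any plotting. This is only an illustration: the vectors below are made up to mirror the example data, and the factor levels are set by hand to match the order of appearance that as_factor produces (base factor() would sort them alphabetically by default). The random jitter is left out so that the positions are easy to read.

```r
# Hypothetical stand-alone illustration; only base R is used here.
group    <- factor(c("Exp", "Exp", "Control", "Control"),
                   levels = c("Exp", "Control"))
strategy <- factor(c("Tr", "Sr", "Tr", "Sr"), levels = c("Tr", "Sr"))

width_jitter <- 0.2

# Factor levels become the integers 1, 2, ... in level order
group_pos <- as.numeric(group)        # 1 1 2 2

# Centre the two strategies on zero: Tr -> -0.5, Sr -> +0.5
offset <- as.numeric(strategy) - 1.5

# Deterministic part of the x position (no random noise yet)
x_pos <- group_pos + offset * width_jitter * 2
x_pos                                 # 0.8 1.2 1.8 2.2
```

Wrapping offset in jitter(), as the answer's code does, adds a small random amount to each of these positions once, so the point layer and the line layer can then share the same stored x values.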
Here's the code:
library(tidyverse)
# Load data
d <- data.frame(
  Subject = c("1", "2", "3", "4"),
  Group = c("Exp", "Exp", "Control", "Control"),
  Tr = c("14", "11", "4", "23"),
  Sr = c("56", "78", "12", "10"),
  Increase = c("TRUE", "TRUE", "TRUE", "FALSE")
)
width_jitter <- 0.2 # 1 means full width between points
# put the data in long format
d_jit <- d %>%
  gather(key = "Strategy", value = "raw", Tr, Sr) %>%
  # type conversions
  mutate(across(c(Group, Strategy), as_factor)) %>% # convert to factors
  mutate(raw = as.numeric(raw)) %>% # make raw as numbers
  # position on x axis is based on combination of Group and jittered Strategy. Mix to taste.
  mutate(group_jit = as.numeric(Group) + jitter(as.numeric(Strategy) - 1.5) * width_jitter * 2,
         grouping = interaction(Subject, Strategy))
# plotting
d_jit %>%
  ggplot(aes(x = Group, y = raw, fill = Strategy)) +
  geom_boxplot(width = 0.5, lwd = 0.5, alpha = 0.05, show.legend = FALSE) +
  geom_point(aes(x = group_jit), size = 3, shape = 21) +
  geom_line(aes(x = group_jit,
                group = Subject,
                colour = Increase),
            alpha = 0.5,
            arrow = arrow(type = "closed",
                          length = unit(0.075, "inches"))
  ) +
  # named values make the mapping explicit: FALSE -> red, TRUE -> black
  scale_colour_manual(values = c("FALSE" = "red", "TRUE" = "black"))
Created on 2022-05-14 by the reprex package (v2.0.1)
For completeness' sake, a different and more elegant way to do the jitter is to give the position argument of the geom_point and geom_line commands a function which adds the random jitter, like this (source: @erocoar's answer):
position = ggplot2::position_jitterdodge(dodge.width = 0.75, jitter.width = 0.3, seed = 1)
- This way the data itself is not changed and the plotting takes care of the jittering details.
- jitterdodge does both the dodge (a sideways shift within each x axis category, so the coloured groups sit next to each other) and the jitter (small random noise on each point).
- The seed argument here is key, since it ensures that the same random values are returned for the point and the line functions, which call it independently.
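The role of the seed can be illustrated in base R, independently of ggplot2. This is a minimal sketch of the general idea, not of what position_jitterdodge does internally: the vector values is made up, and set.seed() stands in for the seed argument. Two separate calls to jitter() return the same noise only when the random number generator is reset to the same seed before each call.

```r
# Hypothetical stand-alone illustration; only base R is used here.
values <- c(1, 1, 2, 2)

set.seed(1)
first <- jitter(values)    # noise drawn after seeding with 1

set.seed(1)
second <- jitter(values)   # same seed again, so the same noise

third <- jitter(values)    # no reset, so fresh noise is drawn

identical(first, second)   # TRUE
identical(first, third)
```

Without a shared seed, the points and the lines would each get their own random offsets and the line ends would no longer land on the points.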