I am using the 'flights' data set from the 'nycflights13' package together with the 'ggplot2' package, and I want to convert code that uses stat_summary() into code that uses geom_ribbon(), geom_line(), and geom_point(). Here is the original code:
flights %>% select(hour, dep_delay, arr_delay) %>% filter(hour > 4) %>%
  pivot_longer(!hour) %>%
  ggplot() +
  stat_summary(aes(hour, value, color = name),
               fun = mean,
               geom = "point",
               size = 3) +
  stat_summary(aes(hour, value, color = name),
               fun = mean,
               geom = "line",
               size = 1.1) +
  stat_summary(aes(hour, value, color = name),
               fun.data = "mean_sdl",
               fun.args = list(mult = 0.2),
               geom = "ribbon",
               alpha = 0.3) +
  theme_bw()
Below is my code:
df = flights %>%
  select(hour, dep_delay, arr_delay) %>% filter(hour > 4) %>%
  pivot_longer(!hour) %>% group_by(hour, name) %>%
  summarise(value = mean(value, na.rm = T))
df %>% mutate(low = value - sd(value)*(0.2), high = value + sd(value)*(0.2)) %>% ggplot() +
  geom_point(aes(hour, value, color = name), size = 3) +
  geom_line(aes(hour, value, color = name), size = 1.1) +
  geom_ribbon(aes(x = hour, ymax = high, ymin = low), alpha = 0.3) +
  theme_bw()
However, the plot I made does not look like the original one. I know the problem lies in the geom_ribbon() part, but I don't know how to fix it. Could anyone help me? Thank you so much!
CodePudding user response:
library(nycflights13)
library(tidyverse)
f <- flights %>%
select(hour, dep_delay, arr_delay) %>%
filter(hour > 4) %>%
pivot_longer(!hour)
Replicate the calculation that stat_summary() does internally, applying the mean_sdl function to each hour/name combination:
fs <- (f
## partition data
%>% group_by(hour, name)
## convert value to a list-column
%>% nest()
## summarise each entry
%>% mutate(across(data, map, \(x) mean_sdl(x, mult = 0.2)))
## collapse back to a vector
%>% unnest(cols = c(data))
)
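To make it clearer what each group is reduced to, here is a minimal sketch of what mean_sdl() returns for a single vector. The vector `x` is made up for illustration, and the sketch assumes the Hmisc package is installed, since ggplot2's mean_sdl() wraps a function from it.

```r
library(ggplot2)  # mean_sdl() lives here; it needs the Hmisc package installed

# Hypothetical values standing in for the delays of one hour/name group
x <- c(2, 4, 4, 6, 9)

# One-row data frame with columns y, ymin and ymax
res <- mean_sdl(x, mult = 0.2)

# The same three numbers computed by hand
by_hand <- data.frame(
  y    = mean(x),
  ymin = mean(x) - 0.2 * sd(x),
  ymax = mean(x) + 0.2 * sd(x)
)
```

The columns y, ymin and ymax are the ones that the plotting code maps to the point/line position and to the ribbon limits.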
Now create the plot:
ggplot(fs) +
  aes(hour, y = y, ymin = ymin, ymax = ymax, color = name) +
  geom_point(size = 3) +
  geom_line(size = 1.1) +
  geom_ribbon(alpha = 0.3) +
  theme_bw()
The order of the layers affects the colours of the lines: if geom_ribbon() is last, it covers the lines with one or two layers of "black/alpha=0.3" (depending on whether the lines are overlapped by one or both confidence regions). I might recommend drawing the lines and points after you draw the ribbon, so that the colours are closer to the originally specified values and more predictable (but there's no need to do that if you like the way your plot looks).
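As a rough sketch of that reordering, the ribbon layer can simply be added before the line and point layers. The data frame `fs_toy` below is a made-up stand-in with the same column names as `fs`, used only so that the snippet runs on its own; its numbers are not taken from the flights data.

```r
library(ggplot2)

# Toy stand-in for `fs` (hypothetical numbers, same column names)
fs_toy <- data.frame(
  hour = rep(5:8, times = 2),
  name = rep(c("arr_delay", "dep_delay"), each = 4),
  y    = c(-4, -3, 1, 4, 1, 2, 3, 5)
)
fs_toy$ymin <- fs_toy$y - 1
fs_toy$ymax <- fs_toy$y + 1

p_order <- ggplot(fs_toy) +
  aes(hour, y = y, ymin = ymin, ymax = ymax, color = name) +
  geom_ribbon(alpha = 0.3) +  # drawn first, so it sits underneath
  geom_line(size = 1.1) +     # lines and points are drawn on top
  geom_point(size = 3) +
  theme_bw()
```

Layers are drawn in the order they are added, so the lines and points keep their mapped colours instead of being shaded by the ribbon.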
CodePudding user response:
You need to add name as a grouping variable. The natural way to do this is to map it to the color aesthetic:
df %>%
  mutate(low = value - sd(value)*(0.2), high = value + sd(value)*(0.2)) %>%
  ggplot() +
  geom_point(aes(hour, value, color = name), size = 3) +
  geom_line(aes(hour, value, color = name), size = 1.1) +
  geom_ribbon(aes(x = hour, ymax = high, ymin = low, color = name),
              alpha = 0.3) +
  theme_bw()
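One thing to keep in mind: in the code above, sd(value) is computed on the hourly means stored in df, not on the individual flights, so the ribbon width may still differ from what mean_sdl produces. Below is a minimal sketch, assuming the goal is the mean plus or minus 0.2 standard deviations within each hour/name group. The names `df_sd`, `avg` and `spread` are made up for this example.

```r
library(nycflights13)
library(tidyverse)

# Summarise the raw delays within each hour/name group, so that sd()
# sees the individual flights rather than the per-hour means.
df_sd <- flights %>%
  select(hour, dep_delay, arr_delay) %>%
  filter(hour > 4) %>%
  pivot_longer(!hour) %>%
  group_by(hour, name) %>%
  summarise(
    avg    = mean(value, na.rm = TRUE),
    spread = sd(value, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(low = avg - 0.2 * spread, high = avg + 0.2 * spread)

p_sd <- ggplot(df_sd, aes(hour, avg, color = name)) +
  geom_ribbon(aes(ymin = low, ymax = high), alpha = 0.3) +
  geom_line(size = 1.1) +
  geom_point(size = 3) +
  theme_bw()
```

Because color = name is set in the top-level aes(), all three layers are grouped by name, and the ribbon is drawn first so the lines and points stay on top.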