I'm using ggplot geom_vline in combination with a custom function to plot certain values on top of a histogram.
The example function below e.g. returns a vector of three values (the mean and x sds below or above the mean). I can now plot these values in geom_vline(xintercept) and see them in my graph.
#example function
sds_around_the_mean <- function(x, multiplier = 1) {
mean <- mean(x, na.rm = TRUE)
sd <- sd(x, na.rm = TRUE)
tibble(low = mean - multiplier * sd,
mean = mean,
high = mean multiplier * sd) %>%
pivot_longer(cols = everything()) %>%
pull(value)
}
Reproducible data
#data
set.seed(123)
normal <- tibble(data = rnorm(1000, mean = 100, sd = 5))
outliers <- tibble(data = runif(5, min = 150, max = 200))
df <- bind_rows(lst(normal, outliers), .id = "type")
df %>%
ggplot(aes(x = data))
geom_histogram(bins = 100)
geom_vline(xintercept = sds_around_the_mean(df$data, multiplier = 3),
linetype = "dashed", color = "red")
geom_vline(xintercept = sds_around_the_mean(df$data, multiplier = 2),
linetype = "dashed")
The problem is, that as you can see I would have to define data$df at various places. This becomes more error-prone when I apply any change to the original df that I pipe into ggplot, e.g. filtering out outliers before plotting. I would have to apply the same changes again at multiple places.
E.g.
df %>% filter(type == "normal")
#also requires
df$data
#to be changed to
df$data[df$type == "normal"]
#in geom_vline to obtain the correct input values for the xintercept.
So instead, how could I replace the df$data argument with the respective column of whatever has been piped into ggplot() in the first place? Something similar to the "." operator, I assume. I've also tried stat_summary with geom = "vline" to achieve this, but without the desired effect.
CodePudding user response:
You can enclose the ggplot part in curly brackets and reference the incoming dataset with the .
symbol both in the ggplot command and when calculating the sds_around_the_mean. This will make it dynamic.
df %>%
{ggplot(data = ., aes(x = data))
geom_histogram(bins = 100)
geom_vline(xintercept = sds_around_the_mean(.$data, multiplier = 3),
linetype = "dashed", color = "red")
geom_vline(xintercept = sds_around_the_mean(.$data, multiplier = 2),
linetype = "dashed")}