How to show in the same legend data points (geom_point) and lines (from stat_ecdf)?

Time:02-01

After estimating the parameters of a distribution through bootstrapping, starting from "known" data (i.e., coming from the literature, experiments, and so on) for which the theoretical distribution is known, I want to display on the same plot the original data (with their associated cumulative probabilities) and the CDFs of both the "true" distribution and the estimated one.

However, I am failing to display data and curves in the same legend.

Minimal reproducible example:

  library(ggplot2)

  n.studies <- 50
  cumulative.prob <- seq(1 / n.studies, 1 - 1 / n.studies, length.out = n.studies)
  known.values <- qnorm(cumulative.prob)
  group <- rep(c("Source 1", "Source 2", "Source 3"), length.out = n.studies)
  data <- data.frame(x = known.values, y = cumulative.prob, group = group)
  # Simulate the bootstrap procedure
  sampler <- function(x) {
    bootstrap <- sample(x = known.values, size = length(known.values), replace = T)
    return(c(mean = mean(bootstrap), sd = sd(bootstrap)))
  }
  simulated.statistics <- as.data.frame(do.call(rbind, lapply(rep(1, 10000), sampler)))
  simulated.dist <- data.frame("Simulated" = rnorm(10000, mean = mean(simulated.statistics$mean), sd = mean(simulated.statistics$sd)),
                               "LCL" = rnorm(10000, mean = quantile(simulated.statistics$mean, .025, names = F), sd = mean(simulated.statistics$sd)),
                               "UCL" = rnorm(10000, mean = quantile(simulated.statistics$mean, .975, names = F), sd = mean(simulated.statistics$sd)),
                               "Original" = rnorm(10000))
  # Plot data and CDFs
  g <- ggplot() +
    stat_ecdf(data = simulated.dist, aes(x = Simulated), colour = "darkgreen", linetype = "solid", geom = "step", lwd = 1) +
    stat_ecdf(data = simulated.dist, aes(x = LCL), colour = "lightgreen", linetype = "dashed", geom = "step", lwd = 1) +
    stat_ecdf(data = simulated.dist, aes(x = UCL), colour = "lightgreen", linetype = "dashed", geom = "step", lwd = 1) +
    stat_ecdf(data = simulated.dist, aes(x = Original), colour = "red", linetype = "dotted", geom = "step", lwd = 1) +
    geom_point(data = data, aes(x = x, y = y, colour = group, shape = group)) +
    xlab("Value") +
    ylab("CDF") +
    theme_bw() +
    theme(legend.position = "bottom",
          legend.title = element_blank())
  print(g)

[Image: resulting chart]

What I want is exactly this graph, but with a legend that shows, in addition to the data points (exactly as they are), the four curves: two dashed lightgreen curves labeled "LCL" and "UCL", a solid darkgreen curve labeled "Simulated", and a dotted red curve labeled "Original". So far I have tried to set the legends for colors, shapes and linetypes manually, but I either get more than one legend or an incorrect graph (e.g., points or curves disappear). I suppose this happens because stat_ecdf and geom_point do not share the same aesthetics.

Thank you in advance for your precious help

CodePudding user response:

I couldn't find an elegant solution focused on [...]
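For reference, here is a minimal sketch of one way to get points and curves into a single legend. It is not necessarily the approach this response had in mind. The idea is to reshape the simulated draws into long format, map colour to the series name in both layers so they share one colour scale, hide the separate shape and linetype legends, and then use `override.aes` to tell the colour legend which keys are points and which are lines. The names `points.df`, `curves.df`, `n.draws` and `legend.keys` are made up for illustration, the seed and the reduced number of bootstrap replicates are assumptions to keep the example quick, and `linewidth` assumes ggplot2 3.4.0 or later (the question's `lwd = 1` could be used instead).

```r
library(ggplot2)

set.seed(42)  # assumption: seed added only to make the sketch reproducible

# Data points, built as in the question
n.studies <- 50
cumulative.prob <- seq(1 / n.studies, 1 - 1 / n.studies, length.out = n.studies)
known.values <- qnorm(cumulative.prob)
points.df <- data.frame(
  x = known.values,
  y = cumulative.prob,
  group = rep(c("Source 1", "Source 2", "Source 3"), length.out = n.studies)
)

# Bootstrap as in the question (fewer replicates here, an assumption for speed)
boot <- t(replicate(2000, {
  b <- sample(known.values, replace = TRUE)
  c(mean = mean(b), sd = sd(b))
}))

# Long format: one row per draw, with the curve name stored in a column
n.draws <- 2000
avg.sd <- mean(boot[, "sd"])
curves.df <- data.frame(
  value = c(
    rnorm(n.draws, mean(boot[, "mean"]), avg.sd),
    rnorm(n.draws, quantile(boot[, "mean"], 0.025, names = FALSE), avg.sd),
    rnorm(n.draws, quantile(boot[, "mean"], 0.975, names = FALSE), avg.sd),
    rnorm(n.draws)
  ),
  curve = rep(c("Simulated", "LCL", "UCL", "Original"), each = n.draws)
)

# One colour per legend entry; the first three are ggplot2's default hues
legend.keys <- c("Source 1", "Source 2", "Source 3",
                 "Simulated", "LCL", "UCL", "Original")
cols <- setNames(
  c(scales::hue_pal()(3), "darkgreen", "lightgreen", "lightgreen", "red"),
  legend.keys
)

g <- ggplot() +
  stat_ecdf(data = curves.df,
            aes(x = value, colour = curve, linetype = curve),
            geom = "step", linewidth = 1) +
  geom_point(data = points.df,
             aes(x = x, y = y, colour = group, shape = group)) +
  scale_colour_manual(values = cols, breaks = legend.keys) +
  scale_linetype_manual(values = c(Simulated = "solid", LCL = "dashed",
                                   UCL = "dashed", Original = "dotted"),
                        guide = "none") +
  guides(
    shape = "none",
    # Keys follow the order of legend.keys: three point keys, then four line keys
    colour = guide_legend(override.aes = list(
      shape = c(16, 17, 15, NA, NA, NA, NA),
      linetype = c("blank", "blank", "blank",
                   "solid", "dashed", "dashed", "dotted")
    ))
  ) +
  xlab("Value") +
  ylab("CDF") +
  theme_bw() +
  theme(legend.position = "bottom",
        legend.title = element_blank(),
        legend.key.width = unit(1, "cm"))  # wider keys so dash patterns are visible
print(g)
```

The shape values 16, 17 and 15 are the first three entries of ggplot2's default shape palette, so the legend keys should match the plotted points. If the groups or curves change, the vectors passed to `override.aes` have to be updated by hand to stay in the same order as `legend.keys`.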
