How can I overlay a boxplot with a reference line

Time:11-28

This is a question about ggplot. The context is data from bootstrapped resamples to be compared with a hypothetical distribution. After box-plotting the bootstrapped data, I would like to overlay a line of expected proportions. The ggplot code below produces:

Error: Aesthetics must be either length 1 or the same as the data (20): y

library(data.table)
library(ggplot2)

boot1 <- data.table(digit = 1, prop = runif(10, 0.25, 0.35))
boot2 <- data.table(digit = 2, prop = runif(10, 0.12, 0.25))
boots <- rbindlist(list(boot1, boot2))

# y = c(0.3, 0.17) has 2 values, but boots has 20 rows, hence the error
ggplot(boots, aes(x = as.factor(digit), y = prop)) +
  geom_boxplot() +
  geom_line(aes(x = as.factor(digit), y = c(0.3, 0.17)))
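The error occurs because every aesthetic inside aes() is evaluated against the layer's data, here the 20 rows of `boots`, while y = c(0.3, 0.17) has only 2 values. A minimal sketch of one workaround (not taken from the answers below): put the reference values in their own data frame and hand it to the line layer through `data =`. The name `expected` is hypothetical, and `group = 1` is needed because on a discrete x-axis each level forms its own group, so without it geom_line() would have one point per group and draw nothing.

```r
library(data.table)
library(ggplot2)

set.seed(1)  # reproducible example data
boot1 <- data.table(digit = 1, prop = runif(10, 0.25, 0.35))
boot2 <- data.table(digit = 2, prop = runif(10, 0.12, 0.25))
boots <- rbindlist(list(boot1, boot2))

# Hypothetical reference values: one row per category, so their length
# no longer has to match the 20 rows of `boots`.
expected <- data.frame(digit = c(1, 2), prop = c(0.3, 0.17))

p <- ggplot(boots, aes(x = as.factor(digit), y = prop)) +
  geom_boxplot() +
  # `data =` replaces the plot-level data for this layer only;
  # group = 1 joins the points across the discrete x levels into one line
  geom_line(data = expected, aes(x = as.factor(digit), y = prop, group = 1),
            colour = "blue") +
  geom_point(data = expected, aes(x = as.factor(digit), y = prop),
             colour = "blue")
print(p)
```

The same pattern works for any number of categories, as long as `expected` has one row per digit.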

In a realistic example, the y values of the line would be computed by a non-linear function.
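The question does not say which function is meant. The hard-coded values 0.3 and 0.17 happen to be close to the first-digit proportions of Benford's law, P(d) = log10(1 + 1/d), so the sketch below assumes that function purely for illustration; `benford_prop` is a hypothetical helper name. It shows how the reference values can be generated as a data frame rather than typed in.

```r
# Hypothetical helper: expected share of each leading digit under
# Benford's law, P(d) = log10(1 + 1/d). Benford is an assumed choice
# of non-linear function, not one stated in the question.
benford_prop <- function(d) log10(1 + 1 / d)

expected <- data.frame(digit = 1:9)
expected$prop <- benford_prop(expected$digit)

# Digits 1 and 2 come out near 0.301 and 0.176, close to the
# hard-coded c(0.3, 0.17) in the question.
print(round(expected$prop, 3))
```

Because the result is an ordinary data frame with one row per digit, it can be passed to a line, point or segment layer through that layer's `data =` argument.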

Thank you for your attention.

Answer:

For your example you can try geom_segment(), because you don't have a continuous line but rather separate segments. A discrete x-axis places the factor levels at positions 1, 2, 3, ..., so if you had 3 categories you would create a data frame with digit = 1:3. With the two categories here:

mean_data = data.frame(digit = 1:2, prop = c(0.3, 0.17))

# each segment spans 0.3 either side of its category's x position
ggplot(boots, aes(x = factor(digit), y = prop)) +
  geom_boxplot() +
  geom_segment(data = mean_data,
               aes(x = digit - 0.3, xend = digit + 0.3, y = prop, yend = prop), col = "blue")
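The answer's digit = 1:2 works because the digits happen to be 1 and 2, which coincide with the level positions. A sketch of a more general variant, assuming categories that are not consecutive integers: look up each category's level index with match() and build the segments from that. The digits 1, 3 and 7, their reference values and the column name `x` are made up for illustration.

```r
library(data.table)
library(ggplot2)

set.seed(1)
# Hypothetical data with non-consecutive categories (digits 1, 3 and 7)
boots <- rbindlist(lapply(c(1, 3, 7), function(d)
  data.table(digit = d, prop = runif(10, 0.05, 0.35))))
expected <- data.frame(digit = c(1, 3, 7), prop = c(0.3, 0.125, 0.058))

# A discrete axis draws level i at x = i, so use the level index,
# not the digit value, as the segment's centre.
lv <- levels(factor(boots$digit))
expected$x <- match(as.character(expected$digit), lv)

p <- ggplot(boots, aes(x = factor(digit), y = prop)) +
  geom_boxplot() +
  geom_segment(data = expected,
               aes(x = x - 0.3, xend = x + 0.3, y = prop, yend = prop),
               colour = "blue")
print(p)
```

Using digit directly here would place the segment for digit 7 at x = 7, far to the right of the third box.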


Answer:

As another spin on the segment approach, I tried geom_curve() with its endpoints at my x-axis categories:

# added with + to the ggplot(boots, ...) + geom_boxplot() call from the question
  geom_curve(x = 1, y = 0.3, xend = 2, yend = 0.17, curvature = 0.1, color = 2)

This draws a slightly curved line from the expected value at the first category to the expected value at the second.

It's not elegant, particularly with multiple categories. Thank you @StupidWolf for the assistance.
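For many categories, one option is to describe every curve in a small data frame of neighbouring pairs and draw them all with a single geom_curve() call. This is a sketch, not the poster's code: the four categories and their reference values are hypothetical, and it assumes the categories are the consecutive levels 1 to k. It also sets `inherit.aes = FALSE`, because a geom_curve() given only constant x/y arguments still inherits the 20 rows of `boots` and typically draws one identical curve per row; for a single curve, annotate("curve", ...) avoids that.

```r
library(ggplot2)

set.seed(1)
# Hypothetical data: four categories, ten resampled proportions each
boots <- data.frame(digit = rep(1:4, each = 10),
                    prop = runif(40, 0.05, 0.35))
expected <- data.frame(digit = 1:4, prop = c(0.3, 0.17, 0.12, 0.1))

# One row per pair of neighbouring categories: curve i runs from
# category i to category i + 1 at the expected proportions.
k <- nrow(expected)
pairs <- data.frame(x = 1:(k - 1), xend = 2:k,
                    y = expected$prop[-k], yend = expected$prop[-1])

p <- ggplot(boots, aes(x = factor(digit), y = prop)) +
  geom_boxplot() +
  # inherit.aes = FALSE: the curves use only the columns of `pairs`
  geom_curve(data = pairs, aes(x = x, y = y, xend = xend, yend = yend),
             curvature = 0.1, colour = 2, inherit.aes = FALSE)
print(p)
```

Adding a category then only means adding a row to `expected`; the pairs and curves follow automatically.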
