Overlay a boxplot obtained by grouping a continuous variable with a scatterplot of the original variables

Time:02-12

I have two continuous variables, one of which I have categorized into two groups. I want a scatterplot of the original variables overlaid on a boxplot of the categorized variable:

library(ggplot2)
data(cancer, package="survival")
lung$age2 <- lung$age >= 75

## Boxplot
ggplot(lung, aes(x=age2, y=wt.loss)) +
  geom_boxplot()
#> Warning: Removed 14 rows containing non-finite values (stat_boxplot).

## Scatterplot
ggplot(lung, aes(x=age, y=wt.loss)) +
  geom_point()
#> Warning: Removed 14 rows containing missing values (geom_point).

Created on 2022-02-11 by the reprex package

Answer:

I found this post that shows how to create a boxplot with a numeric x-axis. That is an acceptable solution to my problem, although not perfect:

library(ggplot2)
data(cancer, package="survival")
lung$age2 <- ifelse(lung$age < 75, 50, 70)
ggplot(lung) +
  geom_boxplot(aes(x=age, y=wt.loss, group=age2)) +
  geom_point(aes(x=age, y=wt.loss))
#> Warning: Removed 14 rows containing non-finite values (stat_boxplot).
#> Warning: Removed 14 rows containing missing values (geom_point).

Created on 2022-02-12 by the reprex package (v2.0.1)
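One way to check where ggplot2 puts these boxes is to inspect the data computed for the boxplot layer with layer_data(). The sketch below rebuilds the plot above (adding width = 2 only to see whether it changes anything) and compares each box's centre and width with the midpoint and 0.9 times the range of age in its group. The 0.9 factor is my reading of the stat_boxplot() source rather than documented behaviour, so treat it as an assumption that may differ between ggplot2 versions.

```r
library(ggplot2)
data(cancer, package = "survival")

# Same grouping as in the answer: 50 and 70 only serve as group labels
lung$age2 <- ifelse(lung$age < 75, 50, 70)

p <- ggplot(lung) +
  geom_boxplot(aes(x = age, y = wt.loss, group = age2), width = 2) +  # width = 2 is passed on purpose
  geom_point(aes(x = age, y = wt.loss))

# Data computed for the first layer (the boxplots): one row per box
box_geom <- layer_data(p, 1)
box_geom$box_width <- box_geom$xmax - box_geom$xmin
print(box_geom[, c("group", "x", "box_width")])

# Midpoint and 0.9 * range of age per group, using only rows where
# wt.loss is not missing (stat_boxplot() drops the others first)
cc <- lung[!is.na(lung$wt.loss), ]
expected <- do.call(rbind, lapply(split(cc$age, cc$age2), function(a)
  data.frame(x = mean(range(a)), box_width = 0.9 * diff(range(a)))))
print(expected)
```

If the two tables agree, the box width is being derived from the spread of age within each group, which would explain why the width argument appears to be ignored.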

Compared to the optimal solution, I couldn't get the boxes at fixed positions and with the same width (the width seems to follow the range of ages in each group, so the two boxes end up very different). The position of the boxes makes sense, however, because it is sensible to put them around the middle value of the corresponding group. I'd like to reduce the box width, but the width option seems to have no effect.
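A possible workaround, sketched under one assumption: from my reading of the ggplot2 source, when a group contains several distinct x values, stat_boxplot() replaces the width argument with 0.9 times the range of those values. If that holds, giving every row of a group the same x position in the boxplot layer (while the points keep the real age) should make width apply, so both boxes get the same, smaller width. box_x is a hypothetical helper column holding the midpoint of each group's age range, matching the positions in the plot above; a group median or any fixed number per group would work too. outlier.shape = NA hides the boxplot outliers, which would otherwise be drawn twice because every observation is already shown as a point.

```r
library(ggplot2)
data(cancer, package = "survival")
lung$age2 <- lung$age >= 75

# Hypothetical helper column: one x position per group, here the
# midpoint of the group's age range (any fixed value per group works)
lung$box_x <- ave(lung$age, lung$age2, FUN = function(a) mean(range(a)))

p <- ggplot(lung, aes(y = wt.loss)) +
  # Constant x within each group, so the width argument (in years) is used
  geom_boxplot(aes(x = box_x, group = age2), width = 4, outlier.shape = NA) +
  # Points keep the original, ungrouped ages
  geom_point(aes(x = age))
print(p)
```

Because box_x is an ordinary numeric column, the boxes can be moved anywhere on the age axis by changing the function passed to ave(), and width is expressed in the same units as age.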
