Visualising the effect of a continuous predictor on a dichotomous outcome using ggplot2

My dataset has two columns. Column 1: a dichotomous variable, 'Yes' or 'No'. Column 2: a continuous predictor, which ranges from 3 to 6.

In base R, I can easily visualise the effect of this continuous predictor on the probability of achieving said dichotomous outcome, by simply using plot(outcome~predictor). If I do so, I get a graph that looks something like this:

I am unable to replicate this type of plot using ggplot2, nor can I find any examples of other people using what seems to me a simple way to visualise the data. If anyone could explain how to produce this plot using ggplot2, I'd be most grateful.

CodePudding user response：

You could approach this using geom_rect as follows:

First, some toy data:

x <- runif(1000)
y <- rbinom(1000,1,0.2)
df <- data.frame(x,y)

Now make a new data frame that includes the coordinates of each rectangle. You'll need to decide how to break up the x axis: you could do it evenly, use quantiles, or anything else that suits your data. I've chosen some arbitrary values:

limits <- c(0,.3,.9,1)
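
If you would rather not pick the breaks by hand, the two alternatives mentioned above (even spacing and quantiles) could be sketched roughly as follows. The names limits_even and limits_quant, the seed, and the choice of four bins are assumptions made for illustration and are not part of the original answer.

```r
# Toy predictor, as in the answer (seed added here only for reproducibility)
set.seed(1)
x <- runif(1000)

# Option 1: four equal-width bins spanning the observed range
limits_even <- seq(min(x), max(x), length.out = 5)

# Option 2: four bins holding roughly equal numbers of observations
limits_quant <- quantile(x, probs = seq(0, 1, by = 0.25), names = FALSE)

# include.lowest = TRUE keeps the minimum value inside the first bin
table(cut(x, limits_even, include.lowest = TRUE))
table(cut(x, limits_quant, include.lowest = TRUE))
```

Either vector can stand in for limits in the steps that follow. Because these breaks start at the observed minimum, cut() needs include.lowest = TRUE, otherwise the smallest value falls outside every bin.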

Now I can find the proportion I want for each group:

df$xcut <- cut(df$x, limits)
df2 <- aggregate(data=df, y~xcut, mean)
df2$max <- limits[-1]
df2$min <- limits[-(length(limits))]
df2

       xcut         y max min
1   (0,0.3] 0.2052980 0.3 0.0
2 (0.3,0.9] 0.2128378 0.9 0.3
3   (0.9,1] 0.2358491 1.0 0.9

Now you have everything you need for geom_rect:

library(ggplot2)

ggplot(df2) +
  geom_rect(aes(xmin = min, xmax = max, ymin = 0, ymax = y), fill = "white", colour = "black") +
  labs(y = "Proportion", x = "x") +
  scale_x_continuous(breaks = limits)

You can tweak the y-axis scale and add the 'No' boxes to get the effect you want, although that seems a bit redundant.
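
One way the 'No' boxes could be added is to stack a second rectangle in each bin, running from the 'Yes' proportion up to 1. The following is a minimal sketch under assumptions: y == 1 is taken to mean 'Yes', the seed is added for reproducibility, and the names df_yes, df_no and df_both are made up for illustration.

```r
library(ggplot2)

# Toy data and bins as in the answer (seed added for reproducibility)
set.seed(1)
df <- data.frame(x = runif(1000), y = rbinom(1000, 1, 0.2))
limits <- c(0, .3, .9, 1)

# Proportion of y == 1 (assumed to mean "Yes") in each bin
df$xcut <- cut(df$x, limits)
df2 <- aggregate(y ~ xcut, data = df, FUN = mean)
df2$min <- limits[-length(limits)]
df2$max <- limits[-1]

# One rectangle per bin and outcome: "Yes" from 0 to p, "No" from p to 1
df_yes <- data.frame(xmin = df2$min, xmax = df2$max,
                     ymin = 0, ymax = df2$y, outcome = "Yes")
df_no <- data.frame(xmin = df2$min, xmax = df2$max,
                    ymin = df2$y, ymax = 1, outcome = "No")
df_both <- rbind(df_yes, df_no)

p <- ggplot(df_both) +
  geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = outcome),
            colour = "black") +
  labs(x = "x", y = "Proportion") +
  scale_x_continuous(breaks = limits)
print(p)
```

Each bin now fills the full 0 to 1 range, which is closer to the stacked layout of the base R plot.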

CodePudding user response：

Here is a solution that uses base R to prepare the data and ggplot2 to draw the plot. First we create some data:

set.seed(1)
df <- data.frame(Predictor = round(rnorm(10000, 5, 2), 0),
                 Dichotomous_outcome = factor(sample(c("Yes", "No"), 10000, replace = TRUE)))

Then we tabulate the binary outcome for each predictor value and calculate the fractions:

df_table <- aggregate(Dichotomous_outcome ~ Predictor, df, table)

df_table$Yes_fraction <- df_table$Dichotomous_outcome[, "Yes"]/ rowSums(df_table$Dichotomous_outcome)
df_table$No_fraction <- df_table$Dichotomous_outcome[, "No"]/ rowSums(df_table$Dichotomous_outcome)
df_table <- df_table[order(df_table$Predictor), ]
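
As a cross-check on these fractions, the same per-value proportions can also be obtained with table() and prop.table(). This is an alternative sketch rather than the method used in this answer, and the name frac is illustrative.

```r
# Same toy data as in this answer
set.seed(1)
df <- data.frame(Predictor = round(rnorm(10000, 5, 2), 0),
                 Dichotomous_outcome = factor(sample(c("Yes", "No"), 10000, replace = TRUE)))

# Rows are predictor values, columns are outcome levels;
# margin = 1 rescales each row so that it sums to 1
frac <- prop.table(table(df$Predictor, df$Dichotomous_outcome), margin = 1)
head(frac)
```

The "Yes" column of frac should correspond to Yes_fraction, with rows already ordered by predictor value.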

Now we transform the data frame into rectangle coordinates, with one 'Yes' and one 'No' rectangle for each interval between consecutive predictor values:

df_rect <- data.frame(x_min= rep(df_table$Predictor[1:(nrow(df_table)-1)], 2),
                      x_max= rep(df_table$Predictor[2:(nrow(df_table))], 2),
                      y_min= c(rep(0, nrow(df_table)-1), df_table$Yes_fraction[-1]),
                      y_max= c(df_table$Yes_fraction[-1], rep(1, nrow(df_table)-1)),
                      col= rep(c("Yes", "No"), each= nrow(df_table)-1))
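
To see what this construction produces, it may help to run it on a tiny table. The three-row toy data frame below is hypothetical and stands in for df_table; the values are made up purely for illustration.

```r
# Hypothetical three-row summary standing in for df_table
toy <- data.frame(Predictor = c(1, 2, 3), Yes_fraction = c(0.2, 0.5, 0.8))
n <- nrow(toy)

# Same construction as above: n - 1 intervals, each split into a "Yes"
# rectangle (0 up to the fraction) and a "No" rectangle (fraction up to 1).
# The fraction used is the one at the right-hand edge of each interval.
toy_rect <- data.frame(
  x_min = rep(toy$Predictor[1:(n - 1)], 2),
  x_max = rep(toy$Predictor[2:n], 2),
  y_min = c(rep(0, n - 1), toy$Yes_fraction[-1]),
  y_max = c(toy$Yes_fraction[-1], rep(1, n - 1)),
  col = rep(c("Yes", "No"), each = n - 1)
)
print(toy_rect)
```

Three predictor values give two intervals and therefore four rectangles. The first predictor value's own fraction (0.2 here) is not drawn, because each interval takes its height from its right-hand endpoint.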

Now we can plot it

library(ggplot2)

ggplot(df_rect) +
  geom_rect(aes(xmin = x_min, xmax = x_max, ymin = y_min, ymax = y_max, fill = col), colour = "black") +
  labs(x = "Predictor", y = "Dichotomous Outcome") +
  scale_y_continuous(breaks = c(.25, .75), labels = c("Yes", "No"))

CodePudding user response：

Perhaps the ggmosaic package can be adapted to suit your needs? E.g.

library(tidyverse)
#install.packages("ggmosaic")
library(ggmosaic)

df <- data.frame(dichot = sample(c("Yes", "No"), 25, replace = TRUE),
                 contin = sample(1:6, 25, replace = TRUE))

ggplot(df) +
  geom_mosaic(aes(x = product(contin), fill = dichot))

Created on 2021-11-24 by the reprex package (v2.0.1)