Graph proportion by category in ggplot2

Time:08-24

I am trying to graph the proportion of people in Remission (which is binary 0/1) after treatment by year. I can find how to graph the count, but I would like the proportion as there are a different number of people each year.

My data look something like this:

Client_id Year Remission
2 2016 0
4 2017 1
7 2017 0
8 2016 1
12 2016 1

I would like to create a plot with Year on the x-axis and the proportion of those in remission on the y-axis. Ideally, I would be able to do this both using geom_bar and geom_line.

I have tried this code, but it gives a proportion of 1.00 for every year, which is not correct.

ggplot(data=df) +
  geom_bar(aes(x=Year,y=Remission),stat="identity",position="dodge")

I could calculate this manually for each year and create a table using Excel, but hoping for a way to complete it in ggplot2.

Answer:

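With stat = "identity", geom_bar plots each row's 0/1 value as-is instead of aggregating, so it never computes a proportion. Instead, let geom_bar count the rows itself: map fill = Remission in the ggplot aesthetics and use position = "fill", which rescales each year's stacked bar to a total of 1: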

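library(dplyr)
library(ggplot2)
df %>%
  # Treat Year and Remission as categories rather than numbers
  mutate(Year = as.character(Year),
         Remission = as.factor(Remission)) %>%
  ggplot(aes(x = Year, fill = Remission)) +
  # position = "fill" stacks the counts and normalizes each bar to 1
  geom_bar(position = "fill") +
  labs(y = "Proportion")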

Created on 2022-08-22 with reprex v2.0.2
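If you only want the share in remission as the bar height (rather than a stacked bar), note that the mean of a 0/1 column is the proportion of 1s. Below is a minimal sketch of that idea, assuming the five sample rows from the question entered by hand and ggplot2 3.3.0 or later (for the fun argument of stat_summary). The names prop_by_year and p_bar are only illustrative.

```r
library(ggplot2)

# Sample rows from the question, entered by hand (assumption: this is
# representative of the real data)
df <- data.frame(
  Client_id = c(2, 4, 7, 8, 12),
  Year      = c(2016, 2017, 2017, 2016, 2016),
  Remission = c(0, 1, 0, 1, 1)
)

# The mean of a 0/1 column is the proportion of 1s, here computed per year
prop_by_year <- tapply(df$Remission, df$Year, mean)

# stat_summary() computes the same per-year mean and draws it as a bar
p_bar <- ggplot(df, aes(x = factor(Year), y = Remission)) +
  stat_summary(fun = mean, geom = "bar") +
  labs(x = "Year", y = "Proportion in remission")
p_bar
```

For the sample rows this gives 2/3 for 2016 (two of three clients) and 1/2 for 2017 (one of two).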

Percentage scale

If you want a percentage scale, pass percent_format() from the scales package to the labels argument of scale_y_continuous like this:

library(dplyr)
library(ggplot2)
library(scales)
df %>%
  mutate(Year = as.character(Year),
         Remission = as.factor(Remission)) %>%
  ggplot(aes(x = Year, fill = Remission)) +
  geom_bar(position = "fill") +
  # Label the 0-1 axis as 0%-100%
  scale_y_continuous(labels = percent_format()) +
  labs(y = "Proportion")


Proportion with geom_line

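geom_line does not compute proportions for you, so calculate them first: count the rows per Year and Remission with count, then divide each count by its year total using group_by and mutate, and plot the result like this: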

library(dplyr)
library(ggplot2)
df %>%
  mutate(Year = as.numeric(Year),
         Remission = as.factor(Remission)) %>%
  # Number of clients per year and remission status
  count(Year, Remission) %>%
  group_by(Year) %>%
  # Share of each status within its year
  mutate(prop = n/sum(n)) %>%
  ungroup() %>%
  ggplot(aes(x = Year, y = prop, color = Remission)) +
  geom_line() +
  scale_x_continuous(breaks = c(2016, 2017)) +
  labs(y = "Proportion")

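If a single line for the proportion in remission is enough, the per-year mean of the 0/1 column gives it directly. This is a rough sketch rather than the only way to do it, again assuming the sample rows from the question entered by hand. The names remission_by_year and p_line are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Sample rows from the question, entered by hand (assumption: this is
# representative of the real data)
df <- data.frame(
  Client_id = c(2, 4, 7, 8, 12),
  Year      = c(2016, 2017, 2017, 2016, 2016),
  Remission = c(0, 1, 0, 1, 1)
)

# One row per year: proportion in remission and number of clients
remission_by_year <- df %>%
  group_by(Year) %>%
  summarise(prop = mean(Remission), n = n())

# One line connecting the yearly proportions, with a point per year
p_line <- ggplot(remission_by_year, aes(x = Year, y = prop)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = unique(remission_by_year$Year)) +
  labs(y = "Proportion in remission")
p_line
```

Keeping n in the summary table makes it easy to see how many clients each yearly proportion is based on.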
