Aggregate percentage of total in bar plot

Time:12-07

I have a dataset like this (sample):

data.frame(loan_status=c('Y','N','Y','N','Y','N'), property_area=c('Semiurban', 'Urban', 'Rural', 'Semiurban', 'Urban', 'Rural'))

Here, loan_status == 'Y' means the loan was accepted.

I want to plot the acceptance percentage of each property_area as a bar chart.

I tried using the ggplot2 library, without success.

What's the easiest way?

CodePudding user response:

You can start with something like this. The scales library makes it easy to format labels as percentages.

library(scales); library(ggplot2); library(dplyr)

# df is the sample data frame from the question
df %>% group_by(property_area) %>% 
  summarise(acceptance_pct = sum(loan_status == 'Y') / n()) %>% 
  ggplot(aes(x = acceptance_pct, y = property_area)) +
  geom_col(fill = "darkgreen") +
  geom_text(aes(label = percent(acceptance_pct)), hjust = -0.1) +
  scale_x_continuous(labels = percent_format(accuracy = 1))
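
To see the numbers behind the bars, the aggregation step can be run on its own. This is a minimal sketch, assuming the question's sample data is stored in a data frame named df (the name is an assumption; the question does not assign one). It uses mean() on the logical comparison, which is equivalent to sum(...) / n().

```r
library(dplyr)

# Sample data from the question, assigned to `df` (assumed name)
df <- data.frame(
  loan_status   = c('Y', 'N', 'Y', 'N', 'Y', 'N'),
  property_area = c('Semiurban', 'Urban', 'Rural', 'Semiurban', 'Urban', 'Rural')
)

# The mean of a logical vector is the proportion of TRUE values,
# so this gives the share of accepted loans per area
acceptance <- df %>%
  group_by(property_area) %>%
  summarise(acceptance_pct = mean(loan_status == 'Y'))

print(acceptance)
```

With the six-row sample, every area has one accepted and one rejected loan, so each acceptance_pct is 0.5.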


CodePudding user response:

Here's a base R solution that also varies the bar color with the percentage. I extended your data to show different percentages, not just 50%. I also changed "Y" to TRUE and "N" to FALSE.

# adjust data: recode "Y"/"N" as logical TRUE/FALSE
df1$loan_status <- df1$loan_status == "Y"

# get groups
pl <- aggregate( loan_status ~ property_area, df1, function(x) 
  sum(as.logical(x))/length(x) )

# adjust plotting area
par(mar=c(14,6,14,4))
# plot data, add percentages and grid
bp <- barplot( pl$loan_status, 
               names=pl$property_area, 
               horiz=TRUE, border=FALSE, las=1, 
               col=rgb(0.1, 1-pl$loan_status, 0.1, 1), 
               xlim=c(0,1.14) )
text(pl$loan_status, bp, labels=paste(round(pl$loan_status*100, digits=2),"%"), pos=4)
abline(v = seq(0,1,by=0.2), col = "grey", lty = "dotted")
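
The per-group proportions can also be computed without recoding the column first. This is a sketch using tapply, with this answer's extended data repeated so the block runs on its own (acceptance is a name chosen here for illustration):

```r
# Extended data from this answer, rebuilt with data.frame()
df1 <- data.frame(
  loan_status   = c("Y", "N", "Y", "N", "Y", "N", "Y", "Y", "Y", "N"),
  property_area = c("Semiurban", "Urban", "Rural", "Semiurban", "Urban",
                    "Rural", "Semiurban", "Semiurban", "Semiurban", "Rural")
)

# mean() of a logical vector is the share of TRUE values within each group
acceptance <- tapply(df1$loan_status == "Y", df1$property_area, mean)

print(round(acceptance, 2))
```

For this data the result is about 0.33 for Rural, 0.8 for Semiurban and 0.5 for Urban, which matches the values that aggregate() returns in pl.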


Data

df1 <- structure(list(loan_status = c("Y", "N", "Y", "N", "Y", "N", 
"Y", "Y", "Y", "N"), property_area = c("Semiurban", "Urban", 
"Rural", "Semiurban", "Urban", "Rural", "Semiurban", "Semiurban", 
"Semiurban", "Rural")), row.names = c("1", "2", "3", "4", "5", 
"6", "7", "8", "9", "10"), class = "data.frame")

CodePudding user response:

Could you start with this?

# mydataf and foo are placeholders for your data frame and grouping column
p = ggplot(mydataf, aes(y = foo)) +
    geom_bar(aes(x = after_stat(count / sum(count)))) +
    scale_x_continuous(labels = scales::percent)
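
Note that dividing each count by the total count gives each category's share of all rows, not an acceptance rate. One way to show the accepted share per area directly from the raw rows is a filled bar chart. This is a sketch, assuming the question's column names and a data frame called df (both are assumptions standing in for the placeholders above):

```r
library(ggplot2)

# Sample data from the question, assigned to `df` (assumed name)
df <- data.frame(
  loan_status   = c('Y', 'N', 'Y', 'N', 'Y', 'N'),
  property_area = c('Semiurban', 'Urban', 'Rural', 'Semiurban', 'Urban', 'Rural')
)

# position = "fill" rescales each bar to a total of 1, so the segment
# for loan_status == 'Y' is the acceptance share of that area
p <- ggplot(df, aes(y = property_area, fill = loan_status)) +
  geom_bar(position = "fill") +
  scale_x_continuous(labels = scales::percent)

print(p)
```

This variant shows accepted and rejected shares side by side. If only the accepted share is wanted, summarising first and plotting with geom_col gives a cleaner chart.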