I have been racking my brain over this one but have not managed to figure it out yet.
What I am trying to do is quite simple: instead of placing percentages on the chart that total 100% overall, I would like the percentages to total 100% PER GROUP. So, e.g., the bars for all_other should sum to 100%, and likewise for each of the other groups.
Is there an easy way to do this, or do you have to calculate these percentages and add them as a separate column?
The chart shown above is coded this way:
loans %>%
  count(purpose = factor(purpose), not.fully.paid = factor(not.fully.paid)) %>%
  mutate(pct = prop.table(n)) %>%
  ggplot(aes(x = purpose, y = pct, fill = not.fully.paid, label = scales::percent(pct))) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent) +
  geom_text(position = position_dodge(width = .9), vjust = -0.5, size = 3)
I also tried to build a table showing the numbers displayed in the graph above, using the code below. However, I still need to convert each count to a percentage of its specific group, and here I got stuck yet again :/ For example, the first row should become 1944/(1944 + 387), using the counts shown below. I am not sure if this substep is the right approach. Hope someone can help!
loans$Count <- 1
df <- loans %>% group_by(purpose,not.fully.paid)
df <- df %>% summarise(sum(Count))
  purpose            not.fully.paid `sum(Count)`
  <chr>              <fct>                 <dbl>
1 all_other          0                      1944
2 all_other          1                       387
3 credit_card        0                      1116
4 credit_card        1                       146
5 debt_consolidation 0                      3354
6 debt_consolidation 1                       603
CodePudding user response:
You could get your percentages using ... %>% group_by(purpose) %>% mutate(pct = n / sum(n)) after count:
library(dplyr)
library(ggplot2)
loans_count %>%
  group_by(purpose) %>%
  mutate(pct = n / sum(n)) %>%
  ggplot(aes(x = purpose, y = pct, fill = not.fully.paid, label = scales::percent(pct))) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent) +
  geom_text(position = position_dodge(width = .9), vjust = -0.5, size = 3)
DATA
loans_count <- structure(list(purpose = c("all_other", "all_other", "credit_card",
"credit_card", "debt_consolidation", "debt_consolidation"), not.fully.paid = structure(c(1L,
2L, 1L, 2L, 1L, 2L), levels = c("0", "1"), class = "factor"),
n = c(1944L, 387L, 1116L, 146L, 3354L, 603L)), row.names = c("1",
"2", "3", "4", "5", "6"), class = "data.frame")
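If you want to sanity-check the per-group percentages before plotting, something like the following might help. It is only a minimal sketch: it rebuilds the same six counts with data.frame() so it runs on its own, and the names loans_pct and group_totals are just illustrative, not anything from the original code.

```r
library(dplyr)

# Same counts as in the DATA block, rebuilt here so the snippet is self-contained
loans_count <- data.frame(
  purpose = rep(c("all_other", "credit_card", "debt_consolidation"), each = 2),
  not.fully.paid = factor(rep(c("0", "1"), times = 3)),
  n = c(1944L, 387L, 1116L, 146L, 3354L, 603L)
)

# Share of each not.fully.paid level within its own purpose group,
# e.g. the first row becomes 1944 / (1944 + 387)
loans_pct <- loans_count %>%
  group_by(purpose) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()

# Add up the shares per group; each total should be 1 (i.e. 100%)
group_totals <- loans_pct %>%
  group_by(purpose) %>%
  summarise(total = sum(pct))

print(loans_pct)
print(group_totals)
```

The key point is that group_by(purpose) makes sum(n) the group total rather than the grand total, which is what prop.table(n) was using in the question.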