Calculate proportions of categories within groups

Time:02-20

I want to calculate the percentage of each item within groups. For example, there are 2 groups, and each contains 3 kinds of fruit. Within each group, I want to know the proportion of each fruit (i.e. each group should add up to 100%). I can achieve this with the code below, but it feels too verbose. Can anyone suggest improvements, or an existing function that simplifies it?

library(tidyverse)
#some data
fruit <- rep(c("apples", "oranges", "bananas"), 
             times=c(3, 2, 5))
group <- rep(c(1, 2), times=c(35, 65))
df <- data.frame(fruit, group, stringsAsFactors=FALSE)

#get %'s for each fruit within each group
df2 <- df %>%
  #get numerator
  group_by(fruit, group) %>%
  summarise(`Total by fruit and group` = n()) %>%
  #get denominator
  left_join(df %>%
              group_by(group) %>%
              summarise(`Total by group` = n())) %>%
  #work out % as numerator/denominator *100
  mutate(`%`=`Total by fruit and group` / `Total by group`*100 ) %>%
  select(fruit, group, `%`) %>%
  arrange(group)

CodePudding user response:

You can do this in base R:

res <- with(df, table(fruit, group)) |> proportions(margin=2) |>
  as.data.frame.table() |> transform(Freq=Freq*100)
res
#     fruit group     Freq
# 1  apples     1 34.28571
# 2 bananas     1 42.85714
# 3 oranges     1 22.85714
# 4  apples     2 27.69231
# 5 bananas     2 53.84615
# 6 oranges     2 18.46154

"%" as a column name is discouraged. If possible, use only letters, numbers (not in the first position), and underscores, with no spaces.

Note: this uses the native pipe |>, which requires R >= 4.1.
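
On an R version older than 4.1, the same steps can be written with intermediate assignments instead of a pipe. This is only a minimal sketch: the names tab, pct and res_old are illustrative, the data is rebuilt so the snippet runs on its own, and prop.table() is used because proportions() is a more recent alias for it.

```r
# Rebuild the example data (100 rows; the fruit pattern repeats every 10 rows)
fruit <- rep(rep(c("apples", "oranges", "bananas"), times = c(3, 2, 5)),
             length.out = 100)
group <- rep(c(1, 2), times = c(35, 65))
df <- data.frame(fruit, group, stringsAsFactors = FALSE)

# Counts for every fruit/group combination
tab <- table(fruit = df$fruit, group = df$group)

# margin = 2 divides by the column (group) totals, so each group sums to 100
pct <- prop.table(tab, margin = 2) * 100

# Long format with a syntactic column name instead of "%"
res_old <- as.data.frame(pct, responseName = "pct")

# Sanity check: the percentages within each group add up to 100
tapply(res_old$pct, res_old$group, sum)
```

Note that group comes back as a factor here, because it passes through table().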


Data:

df <- structure(list(fruit = c("apples", "apples", "apples", "oranges", 
"oranges", "bananas", "bananas", "bananas", "bananas", "bananas", 
"apples", "apples", "apples", "oranges", "oranges", "bananas", 
"bananas", "bananas", "bananas", "bananas", "apples", "apples", 
"apples", "oranges", "oranges", "bananas", "bananas", "bananas", 
"bananas", "bananas", "apples", "apples", "apples", "oranges", 
"oranges", "bananas", "bananas", "bananas", "bananas", "bananas", 
"apples", "apples", "apples", "oranges", "oranges", "bananas", 
"bananas", "bananas", "bananas", "bananas", "apples", "apples", 
"apples", "oranges", "oranges", "bananas", "bananas", "bananas", 
"bananas", "bananas", "apples", "apples", "apples", "oranges", 
"oranges", "bananas", "bananas", "bananas", "bananas", "bananas", 
"apples", "apples", "apples", "oranges", "oranges", "bananas", 
"bananas", "bananas", "bananas", "bananas", "apples", "apples", 
"apples", "oranges", "oranges", "bananas", "bananas", "bananas", 
"bananas", "bananas", "apples", "apples", "apples", "oranges", 
"oranges", "bananas", "bananas", "bananas", "bananas", "bananas"
), group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
)), class = "data.frame", row.names = c(NA, -100L))

CodePudding user response:

Using dplyr you could do:

library(dplyr)

df %>% 
  group_by(group) %>% 
  count(fruit) %>% 
  mutate(freq = n / sum(n) * 100) %>% 
  select(-n)
#> # A tibble: 6 x 3
#> # Groups:   group [2]
#>   group fruit    freq
#>   <dbl> <chr>   <dbl>
#> 1     1 apples   34.3
#> 2     1 bananas  42.9
#> 3     1 oranges  22.9
#> 4     2 apples   27.7
#> 5     2 bananas  53.8
#> 6     2 oranges  18.5

Created on 2022-02-19 by the reprex package (v2.0.1)
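
The result above is still grouped by group (see the "# Groups:" line), which can affect later steps in a pipeline. A possible variant, sketched here assuming dplyr is installed and using the illustrative names pct_tbl and pct, counts both columns in one call and drops the grouping at the end:

```r
library(dplyr)

# Rebuild the example data (100 rows; the fruit pattern repeats every 10 rows)
fruit <- rep(rep(c("apples", "oranges", "bananas"), times = c(3, 2, 5)),
             length.out = 100)
group <- rep(c(1, 2), times = c(35, 65))
df <- data.frame(fruit, group, stringsAsFactors = FALSE)

pct_tbl <- df %>%
  count(group, fruit) %>%            # one row per group/fruit pair, count in n
  group_by(group) %>%                # sum(n) below is now the per-group total
  mutate(pct = n / sum(n) * 100) %>%
  ungroup() %>%                      # return an ungrouped tibble
  select(-n)

pct_tbl
```

The percentages are the same as in the output above; only the grouping attribute differs.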

CodePudding user response:

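Another possible solution, based on janitor::tabyl and janitor::adorn_percentages. Note that the result is in wide format and is expressed as proportions between 0 and 1 rather than percentages: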

library(magrittr)
library(janitor)

df %>% 
  tabyl(group, fruit) %>% 
  adorn_percentages("row") 

#>  group    apples   bananas   oranges
#>      1 0.3428571 0.4285714 0.2285714
#>      2 0.2769231 0.5384615 0.1846154
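
If janitor is not available, a similar wide layout (one row per group, one column per fruit) can be approximated in base R. This is a rough sketch rather than an equivalent of tabyl: the name wide is illustrative, the data is rebuilt so the snippet runs on its own, and the values are scaled to percentages and rounded to one decimal place.

```r
# Rebuild the example data (100 rows; the fruit pattern repeats every 10 rows)
fruit <- rep(rep(c("apples", "oranges", "bananas"), times = c(3, 2, 5)),
             length.out = 100)
group <- rep(c(1, 2), times = c(35, 65))
df <- data.frame(fruit, group, stringsAsFactors = FALSE)

# margin = 1 divides by the row (group) totals; multiply by 100 for percentages
wide <- round(prop.table(table(group = df$group, fruit = df$fruit),
                         margin = 1) * 100, 1)
wide
```

Because of the rounding, a row may add up to slightly more or less than 100.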