Using dplyr function to calculate percentage within groups

Time:10-27

I have the following columns:

population  type_n  user
2           small   10
5           small   11
7           medium  12
7           medium  13
9           large   14
2           large   15
4           large   16

I would like to calculate a percentage within each group defined by "type_n" (the small, medium, and large groups), as the ratio of the "user" count to the "population" sum. For example, the small group has 2 users and a population sum of 7, so the result is (2/7)*100.

I want to obtain an output like this:

type_n new_col
small 28,5
medium 14,2
large 20

Thanks in advance for any suggestion or help!

CodePudding user response:

library(dplyr)

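# df is the data frame from the question
df %>%
  # keep type_n in order of first appearance (small, medium, large);
  # without this the character groups are sorted alphabetically
  mutate(type_n = forcats::fct_inorder(type_n)) %>%
  group_by(type_n) %>%
  # n = number of rows (users) per group, total = population sum per group
  summarize(n = n(), total = sum(population)) %>%
  # format the ratio as a percentage with a decimal comma and no "%" sign
  mutate(new_col = (n / total) %>% scales::percent(accuracy = 0.1, decimal.mark = ",", suffix = ""))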

# A tibble: 3 x 4
  type_n     n total new_col
  <fct>  <int> <int> <chr>  
1 small      2     7 28,6   
2 medium     2    14 14,3   
3 large      3    15 20,0
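
Note that new_col above is a character column. If a numeric column is easier to work with downstream, a minimal sketch along the same lines could look like the following. It assumes the question's data is rebuilt as df inside the snippet, uses base factor() in place of forcats, and the rounding to one decimal is my choice, not something from the question.

```r
library(dplyr)

# Same data as in the question, rebuilt here so the snippet runs on its own
df <- data.frame(
  population = c(2L, 5L, 7L, 7L, 9L, 2L, 4L),
  type_n = c("small", "small", "medium", "medium", "large", "large", "large"),
  user = 10:16,
  stringsAsFactors = FALSE
)

res <- df %>%
  # factor levels in order of first appearance, so groups are not sorted alphabetically
  mutate(type_n = factor(type_n, levels = unique(type_n))) %>%
  group_by(type_n) %>%
  # rows per group divided by the group's population sum, as a numeric percentage
  summarize(new_col = round(100 * n() / sum(population), 1))

res
# new_col is 28.6, 14.3 and 20 for small, medium and large
```

Keeping the value numeric means it can still be sorted or plotted, and formatting with a decimal comma can be left to the final print step.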

CodePudding user response:

Using base R, divide the table of 'type_n' counts by the rowsum of 'population' grouped by 'type_n' (the groups will be in alphabetical order), then convert the named vector output to a two-column data.frame with stack:

with(df1, stack(100 * table(type_n)/rowsum(population, type_n)[,1]))[2:1]
     ind   values
1  large 20.00000
2 medium 14.28571
3  small 28.57143
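
The one-liner above can also be unpacked step by step. This is only a sketch of the same idea: the intermediate names (counts, totals, pct, out) are made up for illustration, the data is repeated so the snippet runs on its own, and the final reordering and rounding are additions to match the layout asked for in the question.

```r
# Same data as in the "data" section, repeated so the snippet is self-contained
df1 <- data.frame(
  population = c(2L, 5L, 7L, 7L, 9L, 2L, 4L),
  type_n = c("small", "small", "medium", "medium", "large", "large", "large"),
  user = 10:16,
  stringsAsFactors = FALSE
)

# number of rows (users) per group
counts <- table(df1$type_n)

# population sum per group; rowsum() returns a one-column matrix,
# and [, 1] drops it to a named vector
totals <- rowsum(df1$population, df1$type_n)[, 1]

# index totals by group name so the division lines up regardless of ordering
pct <- 100 * as.vector(counts) / totals[names(counts)]

# build the two-column result, then reorder rows to small, medium, large
out <- data.frame(type_n = names(pct), new_col = round(pct, 1),
                  row.names = NULL, stringsAsFactors = FALSE)
out <- out[match(c("small", "medium", "large"), out$type_n), ]
out
```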

data

df1 <- structure(list(
  population = c(2L, 5L, 7L, 7L, 9L, 2L, 4L),
  type_n = c("small", "small", "medium", "medium", "large", "large", "large"),
  user = 10:16),
  class = "data.frame", row.names = c(NA, -7L))