I have the following columns:
population | type_n | user |
---|---|---|
2 | small | 10 |
5 | small | 11 |
7 | medium | 12 |
7 | medium | 13 |
9 | large | 14 |
2 | large | 15 |
4 | large | 16 |
I would like to calculate a percentage within each group defined by "type_n" (that is, the small, medium and large groups), as the ratio between the "user" count and the "population" sum. For example, the small group has 2 users and a population sum of 7, so its value is (2/7)*100.
I want to obtain an output like this:
type_n | new_col |
---|---|
small | 28,5 |
medium | 14,2 |
large | 20 |
Thanks in advance for any suggestion or help!
CodePudding user response:
library(dplyr)
df %>%
# line below to freeze order of type_n if not ordered factor already
mutate(type_n = forcats::fct_inorder(type_n)) %>%
group_by(type_n) %>%
summarize(n = n(), total = sum(population)) %>%
mutate(new_col = (n / total) %>% scales::percent(decimal.mark = ",", suffix = ""))
# A tibble: 3 x 4
  type_n     n total new_col
  <fct>  <int> <int> <chr>
1 small      2     7 28,6
2 medium     2    14 14,3
3 large      3    15 20,0
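If the percentage is needed as a number for further calculation rather than as a comma-formatted string, a variant along the following lines might be enough. This is only a sketch: it rebuilds the sample data from the question under the name `df` (an assumption, chosen to match the pipeline above), uses `factor(..., levels = unique(...))` in place of `forcats::fct_inorder()`, and rounds to one decimal.

```r
library(dplyr)

# Sample data from the question, rebuilt here so the snippet runs on its own
df <- data.frame(
  population = c(2L, 5L, 7L, 7L, 9L, 2L, 4L),
  type_n = c("small", "small", "medium", "medium", "large", "large", "large"),
  user = 10:16
)

result <- df %>%
  # keep groups in order of first appearance: small, medium, large
  mutate(type_n = factor(type_n, levels = unique(type_n))) %>%
  group_by(type_n) %>%
  # rows per group divided by the group's population sum, as a percentage
  summarize(new_col = round(100 * n() / sum(population), 1))

result
```

Here `new_col` stays a numeric column, so the decimal comma would have to be applied only when printing or exporting, for example with `format(result$new_col, decimal.mark = ",")`.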
CodePudding user response:
Using base R, divide the table of 'type_n' by the rowsum of 'population' grouped by 'type_n' (the groups will be sorted alphabetically), and convert the named vector output to a two-column data.frame with stack:
with(df1, stack(100 * table(type_n)/rowsum(population, type_n)[,1]))[2:1]
     ind   values
1  large 20.00000
2 medium 14.28571
3  small 28.57143
data
df1 <- structure(list(
  population = c(2L, 5L, 7L, 7L, 9L, 2L, 4L),
  type_n = c("small", "small", "medium", "medium", "large", "large", "large"),
  user = 10:16),
  class = "data.frame", row.names = c(NA, -7L))
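The one-liner above can also be unpacked into separate steps, which may be easier to follow and lets the groups keep their original order instead of the alphabetical one. The following is a sketch only: it swaps `table()` and `rowsum()` for `tapply()`, repeats the sample data so that it runs on its own, and the helper names `grp`, `n_users`, `pop_total` and `out` are made up for illustration.

```r
# Sample data, repeated so the snippet is self-contained
df1 <- data.frame(
  population = c(2L, 5L, 7L, 7L, 9L, 2L, 4L),
  type_n = c("small", "small", "medium", "medium", "large", "large", "large"),
  user = 10:16
)

# Fix the group order to first appearance rather than alphabetical
grp <- factor(df1$type_n, levels = unique(df1$type_n))

n_users   <- tapply(df1$user, grp, length)     # number of users (rows) per group
pop_total <- tapply(df1$population, grp, sum)  # population sum per group

# Combine into the requested two-column layout, rounded to one decimal
out <- data.frame(
  type_n  = names(n_users),
  new_col = round(100 * as.vector(n_users) / as.vector(pop_total), 1)
)
out
```

Splitting the count and the sum into named intermediates makes it straightforward to check each group by hand, for example 2 users over a population of 7 for the small group.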