I have data as follows:
library(data.table)
dat <- fread("total women young
1 0 0
1 1 1
1 0 1
2 1 1
2 2 1
2 2 1
3 1 2
3 2 3
3 2 3
4 4 2
4 4 3
4 3 3
5 5 2
5 2 3
5 5 3
10 4 2
10 4 3
20 5 3
100 10 20")
I would like to create six categories for the variable tot_num
:
1,2,3,4,5 and over 5.
I would like to count the observations per category total
in count
.
sum_tot
would simply be these multiplied. And women
and young
are the average amount of women and young people in that group.
Desired output
total count sum_tot_count women young
1 3 3 0.33 0.66
2 3 6 5/6 0.5
3 3 9 5/9 8/9
4 3 12 11/12 10/12
5 3 15 12/15 8/15
over 5 4 140 23/140 28/140
I am having some trouble figuring out where to start.
Could someone lead me on the right track?
CodePudding user response:
Does this work:
library(dplyr)
dat %>% mutate(tot = if_else(total > 5, 'over 5', as.character(total))) %>%
group_by(tot) %>% summarise(count = n(), sum_tot_count = sum(total), women = sum(women)/sum(total), young = sum(young)/sum(total))
# A tibble: 6 × 5
tot count sum_tot_count women young
<chr> <int> <int> <dbl> <dbl>
1 1 3 3 0.333 0.667
2 2 3 6 0.833 0.5
3 3 3 9 0.556 0.889
4 4 3 12 0.917 0.667
5 5 3 15 0.8 0.533
6 over 5 4 140 0.164 0.2
CodePudding user response:
With cut
:
dat %>%
group_by(cutGroup = cut(total, breaks = c(1:6, Inf), labels = c(1:5, "over 5"), include.lowest = TRUE, right = FALSE)) %>%
summarise(count = n(),
sum_tot_count = sum(total),
women = sum(women) / sum(total),
young = sum(young) / sum(total))
CodePudding user response:
A data.table
solution. The key is using cut()
, as in other answers; after that, basic data.table syntax as in Use data.table to count and aggregate / summarize a column will get you the rest of the way:
dat[, cat := cut(total, breaks = 0.5 c(0:5,Inf), labels = c(1:5, "over 5"))]
.(count = n())]
dat[,.(count=.N,
total = sum(total),
women = sum(women)/sum(total),
young = sum(young)/sum(total)),
by = cat]