I have a dataset with a group column and another column, cyl, which has only two values. I would like to calculate the proportion of one of the values of cyl for each of the groups in group.
I can do this in multiple steps that involve creating new datasets and using full_join. However, I am wondering if there is a more efficient way to do this, as the datasets I work with are large.
library(dplyr)
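dat <- mtcars %>% filter(cyl >= 6)
# Assign groups 1, 2, 3 cyclically across the rows
dat$group <- rep(1:3, length.out = nrow(dat))
# Count the 6-cylinder rows per group
cyl_6 <- dat %>%
  filter(cyl == 6) %>%
  group_by(group) %>%
  summarise(count_6 = n())
# Count the 8-cylinder rows per group
cyl_8 <- dat %>%
  filter(cyl == 8) %>%
  group_by(group) %>%
  summarise(count_8 = n())
cyl_data <- cyl_6 %>%
  full_join(cyl_8, by = "group") %>%
  mutate(six_cyl_prop = count_6 / (count_6 + count_8))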
cyl_data
group six_cyl_prop
1 0.286
2 0.571
3 0.143
Answer:
The fundamental way to count the number of rows meeting a condition is to sum() the condition: TRUE counts as 1, and FALSE counts as 0. Similarly, to get the proportion of rows that meet a condition, you can take the mean() of that condition:
dat %>%
  group_by(group) %>%
  summarize(six_cyl_prop = mean(cyl == 6))
# # A tibble: 3 × 2
# group six_cyl_prop
# <int> <dbl>
# 1 1 0.286
# 2 2 0.571
# 3 3 0.143
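If the per-group counts are also wanted, the sum() and mean() idioms can sit side by side in one summarise() call. The following is a minimal sketch, assuming the same mtcars-based setup as in the question; the names count_6, n_rows and cyl_summary are illustrative choices, not part of the original answer:

```r
library(dplyr)

# Same example data as in the question: cars with 6 or 8 cylinders,
# with groups 1, 2, 3 assigned cyclically across the rows
dat <- mtcars %>% filter(cyl >= 6)
dat$group <- rep(1:3, length.out = nrow(dat))

cyl_summary <- dat %>%
  group_by(group) %>%
  summarise(
    count_6 = sum(cyl == 6),       # each TRUE adds 1, each FALSE adds 0
    n_rows = n(),                  # total rows in the group
    six_cyl_prop = mean(cyl == 6)  # equals count_6 / n_rows
  )

cyl_summary
```

This should avoid the intermediate data frames and the join entirely, since each group is summarised in a single pass. Note that a group with no 6-cylinder rows gets a proportion of 0 here, whereas the filter-and-join approach would leave an NA for that group.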