Is there a way to group_by variables in one column and get the ratio of another variable in a second column?

I have a dataset with a group column and another column cyl which only has two values. I would like to calculate the proportion of one of the values from cyl for each of the groups in group.

I can do this in multiple steps by creating new datasets and using full_join. However, I am wondering if there is a more efficient way to do this, as the datasets I work with are large.

library(dplyr)

dat <- mtcars %>% filter(cyl >= 6)
# assign groups 1, 2, 3 repeatedly down the rows
dat$group <- rep(1:3, length.out = nrow(dat))

cyl_6 <- dat %>% filter(cyl == 6) %>% group_by(group) %>% 
  summarise(count_6 = n())
cyl_8 <- dat %>% filter(cyl == 8) %>% group_by(group) %>% 
  summarise(count_8 =n())

cyl_data <- cyl_6 %>% full_join(cyl_8, by = "group") %>% 
                  mutate(six_cyl_prop = count_6 / (count_6 + count_8))

cyl_data

group six_cyl_prop
    1        0.286
    2        0.571
    3        0.143

CodePudding user response:

The fundamental way to count the number of rows meeting a condition is to sum() the condition: TRUE counts as 1, and FALSE counts as 0. Similarly, to get the proportion of rows that meet a condition, you can take the mean() of that condition:

dat %>%
  group_by(group) %>%
  summarize(six_cyl_prop = mean(cyl == 6))
# # A tibble: 3 × 2
# group six_cyl_prop
# <int>        <dbl>
# 1     1        0.286
# 2     2        0.571
# 3     3        0.143
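
If you also want the raw counts alongside the proportion (as in the multi-step version from the question), the same idea extends to a single summarise() call. This is only a sketch built on the question's example data; the column names count_6 and n_rows and the object name res are just illustrative choices, not anything required by dplyr:

```r
library(dplyr)

# Same setup as the question: 6- and 8-cylinder cars, groups 1..3 recycled down the rows
dat <- mtcars %>% filter(cyl >= 6)
dat$group <- rep(1:3, length.out = nrow(dat))

res <- dat %>%
  group_by(group) %>%
  summarise(
    count_6      = sum(cyl == 6),   # TRUE counts as 1, so this is the number of 6-cylinder rows
    n_rows       = n(),             # total number of rows in the group
    six_cyl_prop = mean(cyl == 6)   # equivalent to count_6 / n_rows
  )

res
```

The six_cyl_prop column should match the output above. One thing to watch for: if cyl could contain missing values, mean(cyl == 6) returns NA for any group that has one. Using mean(cyl == 6, na.rm = TRUE) drops those rows from both the numerator and the denominator, which may or may not be what you want.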