Creating counts of subsets with dplyr

I'm trying to summarize a data set with not only total counts per group, but also counts of subsets. So starting with something like this:

library(dplyr)

df <- data.frame(
  Group = c('A', 'A', 'B', 'B', 'B'),
  Size = c('Large', 'Large', 'Large', 'Small', 'Small')
)

# One row per Group, with the number of observations in each
df_summary <- df %>%
  group_by(Group) %>%
  summarize(group_n = n())

I can get a summary of the number of observations for each group:

> df_summary
# A tibble: 2 x 2
  Group group_n
  <chr>   <int>
1 A           2
2 B           3

Is there any way I can add some sort of subsetting information to n() to get, say, a count of how many observations per group were Large in this example? In other words, I'd like to end up with something like:

  Group group_n Large_n
1     A       2       2
2     B       3       1

Thank you!

CodePudding user response：

We could use count(): count(xyz) is shorthand for group_by(xyz) %>% summarise(n = n()). Counting by both Group and Size gives the number of rows for each combination:

library(dplyr)

df %>% 
  count(Group, Size)

  Group  Size n
1     A Large 2
2     B Large 1
3     B Small 2

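To get one column per Size, reshape the counts to wide format with pivot_wider() from tidyr. Combinations that never occur, such as Small in group A, come out as NA:

library(dplyr)
library(tidyr)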

df %>% 
  count(Group, Size) %>% 
  pivot_wider(names_from = Size, values_from = n)

  Group Large Small
  <chr> <int> <int>
1 A         2    NA
2 B         1     2
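
If the per-group total is also needed, one possible extension of the wide result is sketched below. It assumes the only sizes are Large and Small (the column names in the mutate() call are hard-coded for this example data), and it uses the values_fill argument of pivot_wider() to turn missing combinations into 0 instead of NA. The name df_wide is just an illustrative choice.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Group = c('A', 'A', 'B', 'B', 'B'),
  Size = c('Large', 'Large', 'Large', 'Small', 'Small')
)

df_wide <- df %>%
  count(Group, Size) %>%
  # values_fill = 0L replaces the NA for combinations with no rows
  pivot_wider(names_from = Size, values_from = n, values_fill = 0L) %>%
  # Assumption: Large and Small are the only sizes present
  mutate(group_n = Large + Small)

df_wide
```

With more size levels, the total would have to be computed across all of the count columns rather than by naming two of them.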

CodePudding user response：

I approach this problem using an ifelse and a sum:

df_summary <- df %>%
  group_by(Group) %>%
  summarize(group_n=n(),
            Large_n = sum(ifelse(Size == "Large", 1, 0)))

The last line turns Size into a binary indicator taking the value 1 if Size == "Large" and 0 otherwise. Summing this indicator is equivalent to counting the number of rows with "Large".
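
As a possible simplification, the ifelse() can be dropped: in R, sum() applied to a logical vector counts the TRUE values, so summing the comparison directly should give the same result. A minimal sketch on the example data from the question:

```r
library(dplyr)

df <- data.frame(
  Group = c('A', 'A', 'B', 'B', 'B'),
  Size = c('Large', 'Large', 'Large', 'Small', 'Small')
)

df_summary <- df %>%
  group_by(Group) %>%
  summarize(group_n = n(),
            # TRUE counts as 1 and FALSE as 0 when summed
            Large_n = sum(Size == "Large"))

df_summary
```

If Size can contain NA, the comparison yields NA for those rows and the sum becomes NA as well; sum(Size == "Large", na.rm = TRUE) ignores them.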

CodePudding user response：

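Another option is to attach both counts to every row with mutate() and then keep one row per group:

df_summary <- df %>%
  group_by(Group) %>%
  mutate(group_n = n()) %>%   # rows per Group
  ungroup() %>%
  group_by(Group, Size) %>%
  mutate(Large_n = n()) %>%   # rows per Group/Size combination
  ungroup() %>%
  # Keeps the first row of each Group. Large_n is the Large count only
  # because a Large row comes first in every group of this data.
  distinct(Group, .keep_all = TRUE)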

# A tibble: 2 x 4
  Group Size  group_n Large_n
  <chr> <chr>   <int>   <int>
1 A     Large       2       2
2 B     Large       3       1