How to summarize by group and count a subgroup in dplyr

Time:04-28

Using the built-in Titanic dataset, I currently have a count of the number of observations for each level of the variable Class. How can I add new columns with the counts of Survived = 'Yes' and Survived = 'No'?

as.data.frame(Titanic) %>% 
  mutate_if(is.character, as.factor) %>% 
  group_by(Class) %>%
  summarise("Number of Observations" = n() )

# A tibble: 4 × 2
  Class `Number of Observations`
  <fct>                    <int>
1 1st                          8
2 2nd                          8
3 3rd                          8
4 Crew                         8

I am hoping to get something like this

# A tibble: 4 × 4
  Class `Number of Observations` Survived.Yes Survived.No
  <fct>                    <int>        <int>       <int>
1 1st                          8            4           4
2 2nd                          8            4           4
3 3rd                          8            4           4
4 Crew                         8            4           4

I have tried adding Survived to group_by(), but that puts each Survived value in a separate row rather than a separate column.

as.data.frame(Titanic) %>% 
  mutate_if(is.character, as.factor) %>% 
  group_by(Class, Survived) %>%
  summarise("Number of Observations" = n() )

# A tibble: 8 × 3
# Groups:   Class [4]
  Class Survived `Number of Observations`
  <fct> <fct>                       <int>
1 1st   No                              4
2 1st   Yes                             4
3 2nd   No                              4
4 2nd   Yes                             4
5 3rd   No                              4
6 3rd   Yes                             4
7 Crew  No                              4
8 Crew  Yes                             4

Any advice is appreciated. Thank you

CodePudding user response:

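You can use sum(Survived == "Yes") to count the "Yes" rows in each group. Wrapping it in across() applies the same idea to both values and names the new columns automatically.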

as.data.frame(Titanic) %>% 
  group_by(Class) %>%
  summarise(
    "Number of Observations" = n(),
    across(Survived, list(Yes = ~ sum(. == "Yes"),
                          No  = ~ sum(. == "No"))))

# # A tibble: 4 x 4
#   Class `Number of Observations` Survived_Yes Survived_No
#   <fct>                    <int>        <int>       <int>
# 1 1st                          8            4           4
# 2 2nd                          8            4           4
# 3 3rd                          8            4           4
# 4 Crew                         8            4           4
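
If across() feels like more machinery than needed, the same counts can be written out directly. This is only a minimal sketch of the sum(Survived == "Yes") idea mentioned above; the object names titanic_df and result are placeholders I picked for illustration.

```r
library(dplyr)

# Titanic ships with base R; as.data.frame() gives one row per
# Class/Sex/Age/Survived combination (plus a Freq column).
titanic_df <- as.data.frame(Titanic)

result <- titanic_df %>%
  group_by(Class) %>%
  summarise(
    `Number of Observations` = n(),            # rows per Class
    Survived_Yes = sum(Survived == "Yes"),     # TRUE counts as 1
    Survived_No  = sum(Survived == "No")
  )

result
```

The comparison Survived == "Yes" yields a logical vector, and sum() counts the TRUE values within each group.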

You can also use pivot_wider() from tidyr:

library(tidyr)

as.data.frame(Titanic) %>%
  add_count(Class, name = "Number of Observations") %>%
  pivot_wider(id_cols = c(Class, last_col()),
              names_from = Survived, names_prefix = "Survived_",
              values_from = Survived, values_fn = length)

# # A tibble: 4 x 4
#   Class `Number of Observations` Survived_No Survived_Yes
#   <fct>                    <int>       <int>        <int>
# 1 1st                          8           4            4
# 2 2nd                          8           4            4
# 3 3rd                          8           4            4
# 4 Crew                         8           4            4
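
A variation that stays closer to the attempt in the question (one row per Class/Survived pair) is to count first and then spread those rows into columns. This is a sketch of that approach, not the only way to do it; the name wide is a placeholder, and the total is rebuilt by adding the two counts.

```r
library(dplyr)
library(tidyr)

wide <- as.data.frame(Titanic) %>%
  count(Class, Survived) %>%                  # one row per Class/Survived pair
  pivot_wider(names_from = Survived,          # "No"/"Yes" become column names
              values_from = n,
              names_prefix = "Survived_") %>%
  mutate(`Number of Observations` = Survived_No + Survived_Yes)

wide
```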

You don't even need to attach other packages. Base R can do this with xtabs() and addmargins():

addmargins(xtabs(~ Class + Survived, Titanic), 2)

#       Survived
# Class  No Yes Sum
#   1st   4   4   8
#   2nd   4   4   8
#   3rd   4   4   8
#   Crew  4   4   8