How do I group these variables to make a grouped summary using dplyr?

Time:09-27

This is the dput() output of my data:

structure(list(Students = c(300L, 1600L, 100L, 90L, 2000L, 200L, 
300L, 340L, 1500L, 500L, 360L, 820L, 150L, 1380L, NA, 360L, 400L, 
1000L, 1600L, 142L, 250L, 2000L), Students_Primary = c(150L, 
NA, 100L, 90L, 800L, NA, NA, 150L, NA, 250L, 220L, 400L, NA, 
750L, NA, NA, NA, 600L, NA, 142L, NA, 500L), Chinese_Spoken = c("Mandarin", 
"Mandarin", "Mandarin", "Mandarin", "Mandarin", "Mandarin", "Mandarin", 
"Mandarin", "Mandarin", "Mandarin", "Cantonese", "Mandarin", 
"Mandarin", "Mandarin", "Mandarin", "Mandarin", "Mandarin", "Mandarin", 
"Mandarin", "Both", "Mandarin", "Both"), Chinese_Written = c("Simplified", 
"Traditional", "Simplified", "Traditional", "Both", "Traditional", 
"Traditional", "Simplified", "Simplified", NA, "Traditional", 
"Both", NA, "Both", "Both", "Simplified", "Both", "Traditional", 
"Traditional", "Traditional", "Simplified", "Both")), class = "data.frame", row.names = c(NA, 
-22L))

I'm trying to get a summary of how many students use each form of written Chinese, so I tried this code:

school %>% 
  select(Chinese_Written, Students) %>%
  group_by(Chinese_Written) %>% 
  arrange(Chinese_Written) %>% 
  na.omit()

It spits out this:

   Chinese_Written Students
   <chr>              <int>
 1 Both                2000
 2 Both                 820
 3 Both                1380
 4 Both                 400
 5 Both                2000
 6 Simplified           300
 7 Simplified           100
 8 Simplified           340
 9 Simplified          1500
10 Simplified           360
11 Simplified           250
12 Traditional         1600
13 Traditional           90
14 Traditional          200
15 Traditional          300
16 Traditional          360
17 Traditional         1000
18 Traditional         1600
19 Traditional          142

Is there some reason they're not being grouped together? I want all of the "Both", "Simplified", and "Traditional" to be summed in one group each.

CodePudding user response:

group_by on its own does not change the rows. It only marks the data as grouped, so that the verbs that follow it operate per group. To get one total per group, call summarise after it to sum the variable Students within each value of Chinese_Written:

library(dplyr)

school %>% 
  group_by(Chinese_Written) %>% 
  summarise(Students = sum(Students, na.rm = TRUE))

# A tibble: 4 x 2
  Chinese_Written Students
  <chr>              <int>
1 Both                6600
2 Simplified          2850
3 Traditional         5292
4 NA                   650
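
The question's code used na.omit(), so the NA row in the result above may be unwanted. As a minimal sketch, assuming rows with no written form recorded should be left out, the same pipeline can filter them first. The data frame is rebuilt here from the two relevant columns of the dput so the snippet runs on its own. The names grouped and totals are only illustrative.

```r
library(dplyr)

# The two relevant columns from the dput() in the question
school <- data.frame(
  Students = c(300L, 1600L, 100L, 90L, 2000L, 200L, 300L, 340L, 1500L, 500L, 360L,
               820L, 150L, 1380L, NA, 360L, 400L, 1000L, 1600L, 142L, 250L, 2000L),
  Chinese_Written = c("Simplified", "Traditional", "Simplified", "Traditional", "Both",
                      "Traditional", "Traditional", "Simplified", "Simplified", NA,
                      "Traditional", "Both", NA, "Both", "Both", "Simplified", "Both",
                      "Traditional", "Traditional", "Traditional", "Simplified", "Both")
)

# group_by() only records the grouping; the row count is unchanged
grouped <- school %>% group_by(Chinese_Written)
nrow(grouped)        # still one row per school
group_vars(grouped)  # the column the data is grouped by

# Drop rows with a missing Chinese_Written, then collapse each group to one total
totals <- school %>%
  filter(!is.na(Chinese_Written)) %>%
  group_by(Chinese_Written) %>%
  summarise(Students = sum(Students, na.rm = TRUE))  # NA student counts are skipped

totals
```

This should give the same three totals as above (Both, Simplified, Traditional) without the NA row. Note that na.rm = TRUE is still needed, because one of the "Both" rows has a missing Students value.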