Home > OS >  dplyr summarise on group not flattening data
dplyr summarise on group not flattening data

Time:05-17

I have a dataset:

df <- structure(list(ID = c(101188, 101192, 101193, 101196, 101198, 
101202, 101203, 101206, 101211, 101212, 101216, 101219, 101220, 
101222, 101223, 101224, 101226, 101227, 101228, 101229), LA = c("Barking and Dagenham", 
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham", 
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham", 
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham", 
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham", 
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham", 
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham", 
"Barking and Dagenham"), EstablishmentGroup = c("Local authority maintained schools", 
"Local authority maintained schools", "Local authority maintained schools", 
"Local authority maintained schools", "Local authority maintained schools", 
"Local authority maintained schools", "Local authority maintained schools", 
"Local authority maintained schools", "Local authority maintained schools", 
"Local authority maintained schools", "Local authority maintained schools", 
"Local authority maintained schools", "Local authority maintained schools", 
"Local authority maintained schools", "Local authority maintained schools", 
"Local authority maintained schools", "Local authority maintained schools", 
"Local authority maintained schools", "Local authority maintained schools", 
"Local authority maintained schools")), row.names = c(NA, -20L
), class = c("tbl_df", "tbl", "data.frame"))

If I run the following code I expect the final summarise to flatten the data and tell me

df %>%
  group_by(LA) %>%
  mutate(All_schools = n()) %>%
  ungroup() %>%
  group_by(LA, EstablishmentGroup, All_schools) %>%
  summarise(total = n(),
            per = total/All_schools)

Barking and Dagenham Local authority maintained schools 20 20 1

But it gives me 20 rows instead. I could use a distinct, but not sure what I've done wrong.

CodePudding user response:

You can summarise the count first, then mutate to calculate the percentage.

df %>%
  group_by(LA) %>%
  mutate(All_schools = n()) %>%
  ungroup() %>% 
  group_by(LA, EstablishmentGroup, All_schools) %>% 
  summarise(total = n()) %>% 
  mutate(per = total/All_schools)

Output:

# A tibble: 1 x 5
# Groups:   LA, EstablishmentGroup [1]
  LA                   EstablishmentGroup                 All_schools total   per
  <chr>                <chr>                                    <int> <int> <dbl>
1 Barking and Dagenham Local authority maintained schools          20    20     1
  • Related