I have a dataset:
df <- structure(list(ID = c(101188, 101192, 101193, 101196, 101198,
101202, 101203, 101206, 101211, 101212, 101216, 101219, 101220,
101222, 101223, 101224, 101226, 101227, 101228, 101229), LA = c("Barking and Dagenham",
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham",
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham",
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham",
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham",
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham",
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham",
"Barking and Dagenham"), EstablishmentGroup = c("Local authority maintained schools",
"Local authority maintained schools", "Local authority maintained schools",
"Local authority maintained schools", "Local authority maintained schools",
"Local authority maintained schools", "Local authority maintained schools",
"Local authority maintained schools", "Local authority maintained schools",
"Local authority maintained schools", "Local authority maintained schools",
"Local authority maintained schools", "Local authority maintained schools",
"Local authority maintained schools", "Local authority maintained schools",
"Local authority maintained schools", "Local authority maintained schools",
"Local authority maintained schools", "Local authority maintained schools",
"Local authority maintained schools")), row.names = c(NA, -20L
), class = c("tbl_df", "tbl", "data.frame"))
If I run the following code I expect the final summarise to flatten the data and tell me
df %>%
group_by(LA) %>%
mutate(All_schools = n()) %>%
ungroup() %>%
group_by(LA, EstablishmentGroup, All_schools) %>%
summarise(total = n(),
per = total/All_schools)
Barking and Dagenham Local authority maintained schools 20 20 1
But it gives me 20 rows instead. I could use a distinct, but not sure what I've done wrong.
CodePudding user response:
You can summarise the count first, then mutate to calculate the percentage.
df %>%
group_by(LA) %>%
mutate(All_schools = n()) %>%
ungroup() %>%
group_by(LA, EstablishmentGroup, All_schools) %>%
summarise(total = n()) %>%
mutate(per = total/All_schools)
Output:
# A tibble: 1 x 5
# Groups: LA, EstablishmentGroup [1]
LA EstablishmentGroup All_schools total per
<chr> <chr> <int> <int> <dbl>
1 Barking and Dagenham Local authority maintained schools 20 20 1