I'm trying to calculate the ntile within each group of a data frame. If I do:
library(dplyr)
mtcars %>%
  group_by(gear, carb) %>%
  summarise(total = sum(wt), .groups = "keep") %>%
  mutate(rank = ntile(total, 5))
All entries in the rank column are equal to 1. What am I doing wrong here?
Answer:
This happens because group_by(gear, carb) treats each unique combination of those two variables as a group. Since you used summarise(..., .groups = "keep"), all grouping variables from the input are preserved in the result. summarise() returns one row per group, so every row of the summary is a unique gear/carb combination and sits in its own group (note # Groups: gear, carb [11] in the tibble output). You are therefore calculating ntile() over a single element in every group, and a one-element vector can only fall into bucket 1.
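A quick way to confirm this, assuming dplyr 1.0 or later: group_size() reports how many rows each group holds after the summarise, and calling ntile() directly on a single value shows it can only land in bucket 1. The value 9.14 is just an arbitrary single number used for illustration.

```r
library(dplyr)

# With .groups = "keep", the summary is still grouped by BOTH gear and carb
kept <- mtcars %>%
  group_by(gear, carb) %>%
  summarise(total = sum(wt), .groups = "keep")

# One entry per gear/carb group (11 groups), and each group holds a single row
group_size(kept)

# ntile() on a length-1 vector always returns bucket 1, whatever n is
ntile(9.14, 5)
```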
If you don't include the .groups = "keep" argument, the last grouping variable (carb) is dropped, so the result is grouped by gear only and you get a rank within each gear (note # Groups: gear [3]).
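If you would rather not rely on the default, a sketch that states the intent explicitly passes .groups = "drop_last" (one of the documented values of .groups), which leaves only gear as the grouping variable before ntile() runs. group_vars() is used here only to show which grouping remains.

```r
library(dplyr)

# Drop the last grouping variable (carb) so ntile() ranks totals within each gear
ranked <- mtcars %>%
  group_by(gear, carb) %>%
  summarise(total = sum(wt), .groups = "drop_last") %>%
  mutate(rank = ntile(total, 5))

# Only gear is left as a grouping variable
group_vars(ranked)
ranked
```

Setting .groups explicitly also suppresses the message summarise() prints about the grouping it chose when the argument is omitted.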
A little more information from the documentation (?dplyr::summarise):
When .groups is not specified, it is chosen based on the number of rows of the results:
- If all the results have 1 row, you get "drop_last".
- If the number of rows varies, you get "keep".
library(dplyr)
mtcars %>%
  group_by(gear, carb) %>%
  summarise(total = sum(wt)) %>%
  mutate(rank = ntile(total, 5))
# A tibble: 11 × 4
# Groups:   gear [3]
    gear  carb total  rank
   <dbl> <dbl> <dbl> <int>
 1     3     1  9.14     1
 2     3     2 14.2      3
 3     3     3 11.6      2
 4     3     4 23.4      4
 5     4     1  8.29     1
 6     4     2 10.7      2
 7     4     4 12.4      3
 8     5     2  3.65     4
 9     5     4  3.17     2
10     5     6  2.77     1
11     5     8  3.57     3