using ntile within a group

Time:05-02

I'm trying to calculate the ntile within each group of a data frame. If I do:

mtcars %>%
  group_by(gear,carb) %>%
  summarise(total = sum(wt), .groups = "keep") %>%
  mutate(rank = ntile(total,5))

All entries within the rank column are equal to 1. What am I doing wrong here?

CodePudding user response:

When you group_by(gear, carb), each unique combination of these two variables is treated as a group. Since you used summarise(..., .groups = "keep"), all grouping variables in the input are preserved in the output. After summarising, each combination of gear and carb appears in exactly one row, so every row is in its own group (note # Groups: gear, carb [11] in the tibble output). You are therefore calculating ntile over a single element in every group, and the result is always 1.
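To check this yourself, here is a minimal sketch. The variable name kept is only illustrative and not part of the question; group_size() is a dplyr helper that returns the number of rows in each group.

```r
library(dplyr)

# Same pipeline as in the question, stopping before the mutate()
kept <- mtcars %>%
  group_by(gear, carb) %>%
  summarise(total = sum(wt), .groups = "keep")

# Number of rows in each (gear, carb) group: one per group
group_size(kept)

# ntile() over a single value can only ever assign bucket 1
ntile(42, 5)
```

Since every group contains one row, mutate() calls ntile() on a length-one vector each time, which is why the whole rank column is 1.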

If you don't include the .groups = "keep" argument, the last grouping variable (carb) is dropped, and the rank is computed within each gear (note # Groups: gear [3] in the output).

The documentation (?dplyr::summarise) gives a little more detail:

When .groups is not specified, it is chosen based on the number of rows of the results:

  • If all the results have 1 row, you get "drop_last".
  • If the number of rows varies, you get "keep".

library(dplyr)

mtcars %>%
  group_by(gear,carb) %>%
  summarise(total = sum(wt)) %>%
  mutate(rank = ntile(total, 5))

# A tibble: 11 × 4
# Groups:   gear [3]
    gear  carb total  rank
   <dbl> <dbl> <dbl> <int>
 1     3     1  9.14     1
 2     3     2 14.2      3
 3     3     3 11.6      2
 4     3     4 23.4      4
 5     4     1  8.29     1
 6     4     2 10.7      2
 7     4     4 12.4      3
 8     5     2  3.65     4
 9     5     4  3.17     2
10     5     6  2.77     1
11     5     8  3.57     3
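
If you prefer not to rely on the default, a possible variant is to state the grouping behaviour explicitly. This is only a sketch: the names totals, by_gear and overall are illustrative, and the second ranking (across all rows, ignoring gear) is an assumption about what you might want, not something asked in the question.

```r
library(dplyr)

# Spell out "drop_last" so the result is still grouped by gear only
totals <- mtcars %>%
  group_by(gear, carb) %>%
  summarise(total = sum(wt), .groups = "drop_last")

# Rank within each gear (same result as the default behaviour above)
by_gear <- totals %>%
  mutate(rank = ntile(total, 5))

# Alternative: remove all grouping and rank across every row at once
overall <- totals %>%
  ungroup() %>%
  mutate(rank = ntile(total, 5))
```

Passing .groups explicitly also suppresses the message summarise() prints about how it regrouped the output.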