R: What is the difference between dplyr::group_keys() and summarise()?

Time:06-01

Suppose I group a data frame using dplyr::group_by(). Is there any scenario where passing the result to group_keys() or to summarise() would produce different results? I was surprised to see that a separate group_keys() function exists.

library(dplyr)
df <- data.frame(x = rep(1:2, 10), y = rep(1:10, 2))
df_grouped <- df %>% group_by(x, y)

# group_keys
df_grouped %>% group_keys()

# summarise
df_grouped %>% summarise()

CodePudding user response:

Called without arguments, summarise() drops only the last level of grouping by default, so with more than one grouping column it returns a data frame that is still grouped:

library(dplyr)

mtcars %>% 
  group_by(am, vs) %>% 
  summarise()
#> `summarise()` has grouped output by 'am'. You can override using the `.groups`
#> argument.
#> # A tibble: 4 x 2
#> # Groups:   am [2]
#>      am    vs
#>   <dbl> <dbl>
#> 1     0     0
#> 2     0     1
#> 3     1     0
#> 4     1     1

group_keys() returns an ungrouped tibble, and is the more idiomatic choice for listing the groups:

mtcars %>% 
  group_by(am, vs) %>% 
  group_keys()
#> # A tibble: 4 x 2
#>      am    vs
#>   <dbl> <dbl>
#> 1     0     0
#> 2     0     1
#> 3     1     0
#> 4     1     1
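
To check the difference programmatically rather than by reading the printed header, something like the following sketch may help. It assumes dplyr 1.0.0 or later (for the `.groups` argument); the variable names `grouped`, `keys`, `summ`, and `summ_flat` are only illustrative.

```r
library(dplyr)

grouped <- mtcars %>% group_by(am, vs)

# One row per group, returned as a plain (ungrouped) tibble
keys <- grouped %>% group_keys()

# Same rows, but the default behaviour keeps the grouping by `am`;
# spelling out .groups = "drop_last" just silences the message
summ <- grouped %>% summarise(.groups = "drop_last")

# Dropping all grouping makes the result ungrouped, like group_keys()
summ_flat <- grouped %>% summarise(.groups = "drop")

is_grouped_df(keys)       # FALSE
is_grouped_df(summ)       # TRUE
group_vars(summ)          # "am"
is_grouped_df(summ_flat)  # FALSE
```

So the rows are the same in both cases; what differs is whether grouping is still attached to the result, which matters if further verbs such as mutate() or another summarise() are chained afterwards.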