How do you efficiently group by multiple columns in dplyr

Time:11-17

With dplyr you can group by columns like this:

library(dplyr)

df <- data.frame(a=c(1,2,1,3,1,4,1,5), b=c(2,3,4,1,2,3,4,5))
df %>%
  group_by(a) %>%
  summarise(count = n())

If I want to group by two columns, all the guides say:

df %>%
  group_by(a,b) %>%
  summarise(count = n())

But is there a more efficient way to pass the columns to group_by(), rather than typing each one out explicitly? For example, something like:

cols = colnames(df)
df %>%
  group_by(cols) %>%
  summarise(count = n())

I have cases where I want to group by 10 columns, and it is pretty horrible to write them all out when I could just pass their names.

CodePudding user response:

across() and curly-curly ({{ }}) are the answer (even though it rarely makes sense to group by all of your columns):

cols = colnames(df)
df %>%
  group_by(across({{cols}})) %>%
  summarise(count = n())
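
Curly-curly is mainly useful when the grouping columns arrive as a function argument. Here is a minimal sketch of that pattern, assuming a hypothetical helper named count_by() (not part of dplyr) and the df from the question:

```r
library(dplyr)

df <- data.frame(a = c(1, 2, 1, 3, 1, 4, 1, 5), b = c(2, 3, 4, 1, 2, 3, 4, 5))

# count_by() is a hypothetical helper for illustration, not a dplyr function.
# {{ cols }} forwards whatever column selection the caller wrote into across().
count_by <- function(data, cols) {
  data %>%
    group_by(across({{ cols }})) %>%
    summarise(count = n(), .groups = "drop")
}

res <- count_by(df, c(a, b))  # group by both columns
res_one <- count_by(df, a)    # group by a single column
```

The caller writes bare column names, and the helper never needs to know how many there are.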

CodePudding user response:

You can use across() with any of the tidyselect helpers. For example, if you want all columns:

df %>%
  group_by(across(everything())) %>%
  summarise(count = n())

Or, if you want to group by a character vector of column names:

cols <- c("a","b")
df %>%
  group_by(across(all_of(cols))) %>%
  summarise(count = n())

See help("language", package="tidyselect") for all the selection options.
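
For the many-columns case from the question, the vector of names can be built rather than typed. The following is a rough sketch, assuming a made-up data frame called sales whose grouping columns share a key_ prefix; the data and column names are illustrative only:

```r
library(dplyr)

# Hypothetical data: two key columns and one measurement column.
sales <- data.frame(
  key_region  = c("N", "N", "S", "S", "N"),
  key_product = c("x", "x", "x", "y", "y"),
  amount      = c(10, 20, 30, 40, 50)
)

# Option 1: derive the grouping vector from the column names,
# dropping the column that should not be grouped on.
group_cols <- setdiff(colnames(sales), "amount")
by_names <- sales %>%
  group_by(across(all_of(group_cols))) %>%
  summarise(count = n(), .groups = "drop")

# Option 2: let a tidyselect helper pick the columns by prefix.
by_prefix <- sales %>%
  group_by(across(starts_with("key_"))) %>%
  summarise(count = n(), .groups = "drop")
```

Both options select the same two columns here, so they should produce the same grouping. all_of() also raises an error if a name in the vector is missing from the data, which helps catch typos early.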
