I want to define a custom function that groups and summarises some data using dplyr and, depending on a Boolean flag, can group by an additional variable. I can achieve this using a full if ... else control block, as in this trivial example:
library(tidyverse)
data(Titanic)
Titanic <- as_tibble(Titanic)
foo <- function(by_age = FALSE) {
  if (by_age) {
    bar <- Titanic %>%
      group_by(Survived, Age)
  } else {
    bar <- Titanic %>%
      group_by(Survived)
  }
  bar %>%
    summarise(n = sum(n))
}
foo()
foo(by_age = TRUE)
But this seems very clumsy. Is there a way to achieve this with a single block of dplyr code, conditionally adding Age as a second grouping variable? I've tried ifelse(by_age, Age, NA) in my group_by statement, and some of the techniques listed in this SO post, but to no avail.
CodePudding user response:
Edit
Sorry, I didn't read your linked SO post; if you want to avoid the ... approach for some reason, this is one potential solution:
library(tidyverse)
data(Titanic)
Titanic <- as_tibble(Titanic)
foo <- function(by_age = FALSE) {
  Titanic %>%
    group_by(Survived, if (by_age) Age) %>%
    summarise(n = sum(n))
}
foo()
#> # A tibble: 2 × 2
#> Survived n
#> <chr> <dbl>
#> 1 No 1490
#> 2 Yes 711
foo(by_age = TRUE)
#> `summarise()` has grouped output by 'Survived'. You can override using the
#> `.groups` argument.
#> # A tibble: 4 × 3
#> # Groups: Survived [2]
#> Survived `if (by_age) Age` n
#> <chr> <chr> <dbl>
#> 1 No Adult 1438
#> 2 No Child 52
#> 3 Yes Adult 654
#> 4 Yes Child 57
Created on 2022-07-07 by the reprex package (v2.0.1)
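This works because an `if` with a false condition and no `else` branch evaluates to NULL in R and, as the output above shows, group_by() ignores a grouping expression that evaluates to NULL. A minimal sketch of that behaviour follows; the names titanic_tbl and count_survivors are hypothetical and used only for illustration.

```r
library(dplyr)

# Hypothetical name: a tibble copy of the built-in Titanic table
titanic_tbl <- as_tibble(datasets::Titanic)

# An `if` without `else` returns NULL when the condition is FALSE
is.null(if (FALSE) "Age")

# Hypothetical helper mirroring foo() above
count_survivors <- function(by_age = FALSE) {
  titanic_tbl %>%
    group_by(Survived, if (by_age) Age) %>%
    summarise(n = sum(n), .groups = "drop")
}

# With the flag off, the NULL expression adds no grouping column
names(count_survivors())
```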
To avoid the "Age" column being called "if (by_age) Age" you can use:
library(tidyverse)
data(Titanic)
Titanic <- as_tibble(Titanic)
foo <- function(by_age = FALSE) {
  Titanic %>%
    group_by(Survived, !!sym(ifelse(by_age, "Age", ""))) %>%
    summarise(n = sum(n))
}
foo()
#> # A tibble: 2 × 2
#> Survived n
#> <chr> <dbl>
#> 1 No 1490
#> 2 Yes 711
foo(by_age = TRUE)
#> `summarise()` has grouped output by 'Survived'. You can override using the
#> `.groups` argument.
#> # A tibble: 4 × 3
#> # Groups: Survived [2]
#> Survived Age n
#> <chr> <chr> <dbl>
#> 1 No Adult 1438
#> 2 No Child 52
#> 3 Yes Adult 654
#> 4 Yes Child 57
Created on 2022-07-07 by the reprex package (v2.0.1)
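Another way to keep the column name clean, not taken from the answer above, is to build the grouping columns as a character vector and select them with across(all_of(...)). This is only a sketch, assuming dplyr 1.0.0 or later; titanic_tbl, count_by_names and grp_cols are hypothetical names.

```r
library(dplyr)

# Hypothetical name: a tibble copy of the built-in Titanic table
titanic_tbl <- as_tibble(datasets::Titanic)

# Hypothetical helper: group by column names held in a character vector
count_by_names <- function(by_age = FALSE) {
  # c() drops the NULL produced by `if (by_age) "Age"` when by_age is FALSE
  grp_cols <- c("Survived", if (by_age) "Age")
  titanic_tbl %>%
    group_by(across(all_of(grp_cols))) %>%
    summarise(n = sum(n), .groups = "drop")
}

count_by_names()
count_by_names(by_age = TRUE)
```

Because the grouping columns are plain strings here, the result keeps the original column name Age without any unquoting.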
Original answer
One solution is to use ... (dot-dot-dot) to pass in the argument if/when you want, e.g.
library(tidyverse)
data(Titanic)
Titanic <- as_tibble(Titanic)
foo <- function(...) {
  Titanic %>%
    group_by(Survived, ...) %>%
    summarise(n = sum(n))
}
foo()
#> # A tibble: 2 × 2
#> Survived n
#> <chr> <dbl>
#> 1 No 1490
#> 2 Yes 711
foo(Age)
#> `summarise()` has grouped output by 'Survived'. You can override using the
#> `.groups` argument.
#> # A tibble: 4 × 3
#> # Groups: Survived [2]
#> Survived Age n
#> <chr> <chr> <dbl>
#> 1 No Adult 1438
#> 2 No Child 52
#> 3 Yes Adult 654
#> 4 Yes Child 57
# You can also pass in multiple 'extra' arguments
foo(Age, Sex)
#> `summarise()` has grouped output by 'Survived', 'Age'. You can override using
#> the `.groups` argument.
#> # A tibble: 8 × 4
#> # Groups: Survived, Age [4]
#> Survived Age Sex n
#> <chr> <chr> <chr> <dbl>
#> 1 No Adult Female 109
#> 2 No Adult Male 1329
#> 3 No Child Female 17
#> 4 No Child Male 35
#> 5 Yes Adult Female 316
#> 6 Yes Adult Male 338
#> 7 Yes Child Female 28
#> 8 Yes Child Male 29
Created on 2022-07-07 by the reprex package (v2.0.1)
NB: Using ... comes with two downsides (from Advanced R; https://adv-r.hadley.nz/functions.html?q=...#fun-dot-dot-dot):
- When you use it to pass arguments to another function, you have to carefully explain to the user where those arguments go. This makes it hard to understand what you can do with functions like lapply() and plot().
- A misspelled argument will not raise an error. This makes it easy for typos to go unnoticed.
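To make the second downside concrete for this case, here is a small sketch (titanic_tbl and count_dots are hypothetical names). A caller who assumes the flag-style interface from the question and writes by_age = TRUE gets no error, because group_by() treats a named argument as a new column to compute and group by.

```r
library(dplyr)

# Hypothetical name: a tibble copy of the built-in Titanic table
titanic_tbl <- as_tibble(datasets::Titanic)

# Hypothetical helper mirroring the ... version of foo() above
count_dots <- function(...) {
  titanic_tbl %>%
    group_by(Survived, ...) %>%
    summarise(n = sum(n), .groups = "drop")
}

# No error is raised: `by_age = TRUE` becomes a constant grouping column,
# and the result is not grouped by Age at all
res <- count_dots(by_age = TRUE)
names(res)
```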
CodePudding user response:
You can do this with the curly-curly operator ({{ }}) from the rlang package, passing the additional grouping variable as an argument that defaults to NULL:
library(dplyr)
library(rlang)
data(Titanic)
Titanic <- as_tibble(Titanic)
foo <- function(grp = NULL) {
  Titanic %>%
    group_by(Survived, {{ grp }}) %>%
    summarise(n = sum(n))
}
foo()
#> # A tibble: 2 × 2
#> Survived n
#> <chr> <dbl>
#> 1 No 1490
#> 2 Yes 711
foo(Age)
#> `summarise()` has grouped output by 'Survived'. You can override using the
#> `.groups` argument.
#> # A tibble: 4 × 3
#> # Groups: Survived [2]
#> Survived Age n
#> <chr> <chr> <dbl>
#> 1 No Adult 1438
#> 2 No Child 52
#> 3 Yes Adult 654
#> 4 Yes Child 57
Created on 2022-07-07 by the reprex package (v2.0.1)
CodePudding user response:
One approach is to split the group_by call into two group_by calls, the second of which is applied only when by_age is TRUE:
foo <- function(by_age = FALSE) {
  Titanic %>%
    group_by(Survived) %>%
    { if (by_age) group_by(., Age, .add = TRUE) else . } %>%
    summarise(n = sum(n), .groups = "drop")
}
giving:
foo()
## # A tibble: 2 x 2
## Survived n
## <chr> <dbl>
## 1 No 1490
## 2 Yes 711
foo(TRUE)
## # A tibble: 4 x 3
## Survived Age n
## <chr> <chr> <dbl>
## 1 No Adult 1438
## 2 No Child 52
## 3 Yes Adult 654
## 4 Yes Child 57
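The braces in the pipe step simply return either the regrouped data or the input unchanged. If that reads as too terse, the same two-step idea can be sketched with an intermediate variable instead; this assumes dplyr 1.0.0 or later for the .add argument, and titanic_tbl, count_two_step and grouped are hypothetical names.

```r
library(dplyr)

# Hypothetical name: a tibble copy of the built-in Titanic table
titanic_tbl <- as_tibble(datasets::Titanic)

# Hypothetical helper: same logic as foo() above, without the brace step
count_two_step <- function(by_age = FALSE) {
  grouped <- group_by(titanic_tbl, Survived)
  if (by_age) {
    # .add = TRUE keeps Survived and adds Age as a second grouping level
    grouped <- group_by(grouped, Age, .add = TRUE)
  }
  summarise(grouped, n = sum(n), .groups = "drop")
}

count_two_step()
count_two_step(TRUE)
```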