I am trying to group by a column that is dynamically generated inside previous steps in my function. The column will have a pattern (suffix). I could use dplyr::contains()
, but it is not the most accurate way, since I only have one column and could potentially create nasty bugs if other columns with suffix appear in the future.
fun <- function(df, target){
# previous steps generate a dynamic columns
# dynamic columns have suffixes
# let's assume I only care about Length
df %>%
group_by("{target}.Length")
# do more stuff
}
I am using dplyr == 1.0.8
and thought that I this type of thing was supported. I am probably missing something really simple, and "Petal"
does not get pasted properly and interpreted as a proper column name. I tried several combinations of {}
and {{}}
and it is not working as I would expect. For example:
> fun(iris, "Petal") %>% head()
# A tibble: 6 × 6
# Groups: "{target}.Length" [1]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species `"{target}.Length"`
<dbl> <dbl> <dbl> <dbl> <fct> <chr>
1 5.1 3.5 1.4 0.2 setosa {target}.Length
2 4.9 3 1.4 0.2 setosa {target}.Length
3 4.7 3.2 1.3 0.2 setosa {target}.Length
4 4.6 3.1 1.5 0.2 setosa {target}.Length
5 5 3.6 1.4 0.2 setosa {target}.Length
6 5.4 3.9 1.7 0.4 setosa {target}.Length
I am using the same type of notation for mutate()
successfully, but it doesn't for group_by()
.
CodePudding user response:
While I use this kind of glue
pattern quite often for assignments inside mutate
and summarise
I haven't seen any examples of using it in other dplyr
verbs. However, one option would be use the .data
pronoun and paste0
which is not that elegant but works:
library(dplyr, warn=FALSE)
fun <- function(df, target){
df %>%
group_by(.data[[paste0(target, ".Length")]])
}
fun(iris, "Petal") %>% head()
#> # A tibble: 6 × 5
#> # Groups: Petal.Length [4]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
CodePudding user response:
We can use !! sym(...)
on a string to evaluate it early as object (name).
We can also use the .data
pronoun, but when using it inside a package you should import it to avoid R CMD check
notes (see here).
library(dplyr)
fun <- function(df, target){
df %>%
group_by(!! sym(paste0(target,".Length")))
}
fun(iris, "Petal") %>% head()
#> # A tibble: 6 x 5
#> # Groups: Petal.Length [4]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
Why does this work?
Lets have a look at it step by step when building our function above:
fun1 <- function(df){
df %>%
group_by("Sepal.Length")
}
⬇️ groups by new variable which is named after its value / call
iris %>%
fun1()
#> # A tibble: 150 x 6
#> # Groups: "Sepal.Length" [1]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species `"Sepal.Length"`
#> <dbl> <dbl> <dbl> <dbl> <fct> <chr>
#> 1 5.1 3.5 1.4 0.2 setosa Sepal.Length
#> 2 4.9 3 1.4 0.2 setosa Sepal.Length
#> 3 4.7 3.2 1.3 0.2 setosa Sepal.Length
#> 4 4.6 3.1 1.5 0.2 setosa Sepal.Length
#> 5 5 3.6 1.4 0.2 setosa Sepal.Length
#> 6 5.4 3.9 1.7 0.4 setosa Sepal.Length
#> 7 4.6 3.4 1.4 0.3 setosa Sepal.Length
#> 8 5 3.4 1.5 0.2 setosa Sepal.Length
#> 9 4.4 2.9 1.4 0.2 setosa Sepal.Length
#> 10 4.9 3.1 1.5 0.1 setosa Sepal.Length
#> # ... with 140 more rows
fun2 <- function(df){
df %>%
group_by(as.name("Sepal.Length"))
}
⬇️ error: group_by needs a vetor as input, but has received a name
/ symbol
iris %>%
fun2()
#> Error in `add_computed_columns()`:
#> ! Problem adding computed columns in `group_by()`.
#> x Problem with `mutate()` input `..1`.
#> i `..1 = as.name("Sepal.Length")`.
#> x `..1` must be a vector, not a symbol.
#> Caused by error in `mutate_cols()`:
#> ! Problem with `mutate()` input `..1`.
#> i `..1 = as.name("Sepal.Length")`.
#> x `..1` must be a vector, not a symbol.
fun3 <- function(df){
df %>%
group_by(eval(as.name("Sepal.Length")))
}
⬇️ works, but creates new column which is named after call to eval
:
iris %>%
fun3()
#> # A tibble: 150 x 6
#> # Groups: eval(as.name("Sepal.Length")) [35]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species `eval(as.name("Sep~
#> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1 5.1 3.5 1.4 0.2 setosa 5.1
#> 2 4.9 3 1.4 0.2 setosa 4.9
#> 3 4.7 3.2 1.3 0.2 setosa 4.7
#> 4 4.6 3.1 1.5 0.2 setosa 4.6
#> 5 5 3.6 1.4 0.2 setosa 5
#> 6 5.4 3.9 1.7 0.4 setosa 5.4
#> 7 4.6 3.4 1.4 0.3 setosa 4.6
#> 8 5 3.4 1.5 0.2 setosa 5
#> 9 4.4 2.9 1.4 0.2 setosa 4.4
#> 10 4.9 3.1 1.5 0.1 setosa 4.9
#> # ... with 140 more rows
fun4 <- function(df){
df %>%
group_by(!! as.name("Sepal.Length"))
}
⬇️ works: !!
does the same as eval
, but works only inside rlang
powered functions and it is evaluated faster than the normal eval
. For this reason group_by
correctly gets the name of the column to evaluate.
iris %>%
fun4()
#> # A tibble: 150 x 5
#> # Groups: Sepal.Length [35]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ... with 140 more rows
fun5 <- function(df){
df %>%
group_by(!! sym("Sepal.Length"))
}
works: sym
is almost simmilar to base R’s as.name
, but it's shorter.
iris %>%
fun5()
#> # A tibble: 150 x 5
#> # Groups: Sepal.Length [35]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ... with 140 more rows
Created on 2022-10-06 by the reprex package (v2.0.1)