Using a function in dplyr filter

Time:10-13

I'd like to define a helper function to help me compose some boolean filters more clearly.

This is a working example of the result, using the iris dataset:

library(tidyverse)


sepal_config = function(length, width, species, .data) {
  .data$Sepal.Length > length & .data$Sepal.Width < width & .data$Species == species
}

iris %>% 
  filter(
      sepal_config(length = 4, width = 3, species = "versicolor", .data = .data) |  # 34 rows
      sepal_config(length = 3, width = 3, species = "virginica",  .data = .data)    # 21 rows
    )                                                                               # 55 rows

I want to do this without having to pass in .data, and ideally with the column names evaluated in the data frame's scope (i.e., avoiding this error):

sepal_config = function(length, width, species) {
  Sepal.Length > length & Sepal.Width < width & Species == species
}

iris %>% 
  filter(
      sepal_config(length = 4, width = 3, species = "versicolor") |
      sepal_config(length = 3, width = 3, species = "virginica")
    )                                                               
Error: Problem with `filter()` input `..1`.
ℹ Input `..1` is `|...`.
x object 'Sepal.Length' not found

Unfortunately I don't understand NSE well enough to know if this is an option. I have tried various techniques from the programming with dplyr how-to guide, but the footnote makes me think I am looking in the wrong place.

dplyr’s filter() is inspired by base R’s subset(). subset() provides data masking, but not with tidy evaluation, so the techniques described in this chapter don’t apply to it.

Thanks, Akhil

CodePudding user response:

You can wrap the expression in your function with quo() to defuse it, and then use the !! operator to inject it into the filter() call.

library(dplyr)

sepal_config = function(length, width, species) {
  quo(Sepal.Length > length & Sepal.Width < width & Species == species)
}

iris %>% 
  filter(!!sepal_config(length = 4, width = 3, species = "versicolor") |
         !!sepal_config(length = 3, width = 3, species = "virginica"))


   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1           5.5         2.3          4.0         1.3 versicolor
2           6.5         2.8          4.6         1.5 versicolor
3           5.7         2.8          4.5         1.3 versicolor
4           4.9         2.4          3.3         1.0 versicolor
5           6.6         2.9          4.6         1.3 versicolor
6           5.2         2.7          3.9         1.4 versicolor
7           5.0         2.0          3.5         1.0 versicolor
8           6.0         2.2          4.0         1.0 versicolor
9           6.1         2.9          4.7         1.4 versicolor
10          5.6         2.9          3.6         1.3 versicolor
...
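
One caveat with the quosure above: length, width and species are looked up in the data first, so a data frame that happened to have a column called length, width or species would silently shadow the function arguments. A minimal sketch of a more defensive variant, which inlines the argument values with !! when the quosure is built (the name sepal_config_inlined is made up for this illustration):

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical variant of sepal_config(): the argument values are injected
# into the quosure with !!, so they cannot be shadowed by columns of the
# data frame that share their names.
sepal_config_inlined <- function(length, width, species) {
  quo(Sepal.Length > !!length & Sepal.Width < !!width & Species == !!species)
}

result <- iris %>%
  filter(
    !!sepal_config_inlined(length = 4, width = 3, species = "versicolor") |
      !!sepal_config_inlined(length = 3, width = 3, species = "virginica")
  )

nrow(result)
```

This should select the same 55 rows as the version in the question that passes .data explicitly.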

CodePudding user response:

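dplyr provides a function, cur_data(), for this sort of thing. It returns the current data (without any grouping columns), so it can serve as the default value of .data: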

library(dplyr, warn.conflicts = FALSE)

sepal_config <- function(length, width, species, .data = cur_data()) {
  .data$Sepal.Length > length & .data$Sepal.Width < width & .data$Species == species
}

iris %>% 
  as_tibble() %>% 
  filter(
    sepal_config(length = 4, width = 3, species = "versicolor") |  # 34 rows
      sepal_config(length = 3, width = 3, species = "virginica")    # 21 rows
  )     
#> # A tibble: 55 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species   
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>     
#>  1          5.5         2.3          4           1.3 versicolor
#>  2          6.5         2.8          4.6         1.5 versicolor
#>  3          5.7         2.8          4.5         1.3 versicolor
#>  4          4.9         2.4          3.3         1   versicolor
#>  5          6.6         2.9          4.6         1.3 versicolor
#>  6          5.2         2.7          3.9         1.4 versicolor
#>  7          5           2            3.5         1   versicolor
#>  8          6           2.2          4           1   versicolor
#>  9          6.1         2.9          4.7         1.4 versicolor
#> 10          5.6         2.9          3.6         1.3 versicolor
#> # ... with 45 more rows

Created on 2021-10-12 by the reprex package (v2.0.0)
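
Note for newer installations: as of dplyr 1.1.0, cur_data() is deprecated in favor of pick(). A rough sketch of the same idea with pick(everything()) as the default, assuming dplyr 1.1.0 or later (the name sepal_config_pick is made up for this illustration, and I have not checked how it behaves with grouped data):

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical variant: pick(everything()) replaces the deprecated cur_data().
# The default is evaluated lazily, i.e. only once filter() is running and a
# data mask is active, so the helper only works inside a dplyr verb.
sepal_config_pick <- function(length, width, species, .data = pick(everything())) {
  .data$Sepal.Length > length & .data$Sepal.Width < width & .data$Species == species
}

result_pick <- iris %>%
  as_tibble() %>%
  filter(
    sepal_config_pick(length = 4, width = 3, species = "versicolor") |
      sepal_config_pick(length = 3, width = 3, species = "virginica")
  )

nrow(result_pick)
```

As with cur_data(), calling the helper outside of a dplyr verb should fail, because there is no current data to pick from.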
