How do you filter by passing a string as a column name in a user-defined function?

Time:04-15

I'm writing a function where the user specifies the column they want to filter and what cutoff value they want. In this example, I want to filter out any pretest scores of 2 or lower. Here's a sample dataset:

library(dplyr)

test <- tibble(name = c("Corey", "Justin", "Sibley", "Kate"),
               pretest_score = 1:4,
               posttest_score = 5:8,
               final_score = 9:12)


filter_function <- function(data, test_type = c(pretest, posttest, final), value) {
  
  test_character <- deparse(substitute(test_type))
  test_score <- paste0(test_character, "_score")
  
  data %>%
    filter({{test_score}} > value)
  
}

filter_function(test, test_type = pretest, value = 2)

I've also tried !!test_score, test_score (with nothing around it), and ensym(test_score) from rlang, all to no avail.

Note: I know that in this example, I could just specify pretest_score, posttest_score, etc. as the test type. In my real dataset, however, each test has many dimensions that users can set cutoffs for (pretest_score, pretest_date, pretest_location, etc.), so it's important that I combine the column prefix with the suffix (here, _score) inside the function itself.

Thank you for any help!

Answer:

Convert the character string to a symbol with rlang::sym() and inject it into filter() with !!:

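filter_function <- function(data, test_type = c(pretest, posttest, final), value) {

  # Capture the unevaluated argument (e.g. pretest) as the string "pretest"
  test_character <- deparse(substitute(test_type))
  test_score <- paste0(test_character, "_score")

  # rlang::sym() turns "pretest_score" into a symbol and !! injects it,
  # so filter() evaluates pretest_score > value against the column
  data %>%
    filter(!!rlang::sym(test_score) > value)

}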

Testing:

> filter_function(test, test_type = pretest, value = 2)
# A tibble: 2 × 4
  name   pretest_score posttest_score final_score
  <chr>          <int>          <int>       <int>
1 Sibley             3              7          11
2 Kate               4              8          12
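To see why the conversion to a symbol matters (and why the question's !!test_score attempt did not filter anything), compare injecting the string itself with injecting a symbol. This is a small sketch that rebuilds the sample data from the question; kept_all and kept_some are illustrative names only.

```r
library(dplyr)

test <- tibble(name = c("Corey", "Justin", "Sibley", "Kate"),
               pretest_score = 1:4,
               posttest_score = 5:8,
               final_score = 9:12)

test_score <- "pretest_score"

# Injecting the string: filter() evaluates the constant comparison
# "pretest_score" > 2 (2 is coerced to "2"), which is a single TRUE,
# so it is recycled and every row is kept
kept_all <- test %>% filter(!!test_score > 2)

# Injecting a symbol: filter() evaluates pretest_score > 2 against the column
kept_some <- test %>% filter(!!rlang::sym(test_score) > 2)

nrow(kept_all)   # 4
nrow(kept_some)  # 2
```

In other words, !! only pastes a value into the expression; a string stays a string, and only a symbol is looked up as a column.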

Answer:

A few points:

  • Normally, when an argument takes one of a fixed set of options, R code uses a character vector of the choices rather than an unevaluated expression, and R provides match.arg for exactly this purpose. match.arg also makes the first option the default, so with match.arg the call could omit test_type = "pretest", since that is the default.

  • .[[test_score]] can be used to refer to the column whose name is stored in test_score; here . is the data frame piped in by %>%.

Thus we have

filter_function <- function(data, test_type = c("pretest", "posttest", "final"), 
  value) {

  test_type <- match.arg(test_type)
  test_score <- paste0(test_type, "_score")
  
  data %>%
    filter(.[[test_score]] > value)
  
}

filter_function(test, test_type = "pretest", value = 2)
# A tibble: 2 x 4
  name   pretest_score posttest_score final_score
  <chr>          <int>          <int>       <int>
1 Sibley             3              7          11
2 Kate               4              8          12

# pretest is the default
filter_function(test, value = 2)
# A tibble: 2 x 4
  name   pretest_score posttest_score final_score
  <chr>          <int>          <int>       <int>
1 Sibley             3              7          11
2 Kate               4              8          12
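A closely related variant uses the .data pronoun from dplyr's data mask instead of the magrittr placeholder `.`. As far as I know, `.` refers to the whole piped-in tibble, so on grouped data `.[[test_score]]` returns a full-length column and filter() would typically fail with a size error, and `.` does not exist at all with the native |> pipe. .data[[test_score]] refers to the current group's data instead. The sketch below also uses .env$value so that a column named value could not shadow the argument; filter_function_data is a hypothetical name.

```r
library(dplyr)

test <- tibble(name = c("Corey", "Justin", "Sibley", "Kate"),
               pretest_score = 1:4,
               posttest_score = 5:8,
               final_score = 9:12)

filter_function_data <- function(data, test_type = c("pretest", "posttest", "final"),
                                 value) {
  test_type <- match.arg(test_type)
  test_score <- paste0(test_type, "_score")

  data %>%
    # .data[[...]] looks the column up by name in the current data mask;
    # .env$value reads value from the function environment, not from a column
    filter(.data[[test_score]] > .env$value)
}

filter_function_data(test, "pretest", 2)

# Because .data refers to the current group, this also works on grouped data
test %>% group_by(name) %>% filter_function_data("posttest", 6)
```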

Alternatively, the choices could be the full column names, as in the function below. The user can still pass "pretest" because match.arg accepts any unambiguous leading substring of a choice; in fact, they could even pass "pre" or "post".

filter_function2 <- function(data, 
    test_type = c("pretest_score", "posttest_score", "final_score"), 
    value) {

  test_type <- match.arg(test_type)
  
  data %>%
    filter(.[[test_type]] > value)
  
}

filter_function2(test, test_type = "pretest", value = 2)
# returns the same two rows (Sibley and Kate) as above
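The default and prefix-matching behaviour of match.arg can be checked in isolation. This is a minimal sketch with a hypothetical helper, pick_test, that does nothing except resolve the argument; note that a prefix shared by two choices, such as "p", is rejected as ambiguous.

```r
# Hypothetical helper: only resolves test_type, to show match.arg() on its own
pick_test <- function(test_type = c("pretest_score", "posttest_score", "final_score")) {
  match.arg(test_type)
}

pick_test()            # "pretest_score": with no argument, the first choice is used
pick_test("pre")       # "pretest_score": a unique prefix is enough
pick_test("post")      # "posttest_score"
try(pick_test("p"))    # error: "p" is a prefix of two choices
try(pick_test("mid"))  # error: not a prefix of any choice
```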

Base R

This could also be done without any packages like this:

filter_function3 <- function(data, test_type = c("pretest", "posttest", "final"), 
  value) {

  test_type <- match.arg(test_type)
  test_score <- paste0(test_type, "_score")
  
  # [[ extracts the column by name; the logical vector then selects the rows
  data[data[[test_score]] > value, ]

}

filter_function3(test, test_type = "pretest", value = 2)
# returns the same two rows (Sibley and Kate) as above
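One behavioural difference from filter() is worth knowing: if the score column contains NA, logical indexing with [ returns an all-NA row for it, whereas filter() drops such rows. Wrapping the condition in which() keeps only the TRUE positions. A sketch, assuming a plain data frame with one missing pretest score; scores and filter_function4 are hypothetical names.

```r
filter_function4 <- function(data, test_type = c("pretest", "posttest", "final"),
                             value) {
  test_type <- match.arg(test_type)
  test_score <- paste0(test_type, "_score")

  # which() returns only the indices where the comparison is TRUE,
  # so rows where it is NA are dropped, as dplyr::filter() would do
  data[which(data[[test_score]] > value), , drop = FALSE]
}

scores <- data.frame(name = c("Corey", "Justin", "Sibley", "Kate"),
                     pretest_score = c(1L, NA, 3L, 4L),
                     posttest_score = 5:8,
                     final_score = 9:12)

nrow(scores[scores$pretest_score > 2, ])  # 3: includes an all-NA row
nrow(filter_function4(scores, "pretest", 2))  # 2: Sibley and Kate
```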