Home > Blockchain >  How to pass an arbitrary size list of columns to dplyr::filter?
How to pass an arbitrary size list of columns to dplyr::filter?

Time:12-13

Assume we have this dataframe:

> data <- data.frame('a' = c('a', 'b', 'c'), 
                   'b' = c('d', 'e', 'f'), 
                   'c' = c('g', 'h', 'i'),
                   'd' = c('j', 'k', 'l'))

If I want to filter based on column a to have some value, I can do:

> library(dplyr)
> data %>% filter(a == "b")

  a b c d
1 b e h k

If I want to filter based on columns a and b to have some value, I can do:

> library(dplyr)
> data %>% filter(a == "c" & b == "f")

  a b c d
1 c f i l

What if I have a list of columns with arbitrary length? Is there a way to do something like this?

> data %>% filter(c(a,b) == c("c","f"))

So I can pass the arbitrary list of column names, alongside with the list of the wanted values, to the filter function.

CodePudding user response:

You could use a data.frame to keep track of the conditions and use the join syntax instead of filter():

filter_df <- data.frame(a = "c", b = "f")

data |>  
  inner_join(filter_df)

#   a b c d
# 1 c f i l

CodePudding user response:

Maybe this is an overkill for your case, but I took the freedom to allow multiple values to filter. Here is updated data -

library(dplyr)
library(purrr)

data <- data.frame('a' = c('a', 'b', 'c', 'd'), 
                   'b' = c('d', 'e', 'f', 'k'), 
                   'c' = c('g', 'h', 'i', 'e'),
                   'd' = c('j', 'k', 'l', 'g'))
data

#  a b c d
#1 a d g j
#2 b e h k
#3 c f i l
#4 d k e g

Now the filter condition is the following when done manually.

data %>% filter(a %in% c('c', 'd'), b == 'f')

#  a b c d
#1 c f i l

Answer -

To do this in an automated fashion save the condition in a list, use a custom function filter_data and use it with imap.

filter_args <- list(a = c('c', 'd'), b = c('f'))

filter_data <- function(dat, val, var) {
  dat %>% filter(.data[[var]] %in% val)
}

imap(filter_args, ~filter_data(data, .x, .y)) %>%
  reduce(inner_join)

#  a b c d
#1 c f i l

CodePudding user response:

I like the inner_join solution, and this is what I would usually use where possible.

However, there are cases with more complex comparisons in which this doesn’t work. In those cases, you can pass a list of filter expressions to filter using expression splicing.

For this, you first have to create your list of expressions by interpolating the column names and values into a comparison expression:

cols = c('a', 'b')
values = c('c', 'f')
filters = mapply(
  \(c, v) bquote(.(as.name(c)) == .(v)),
  cols, values,
  USE.NAMES = FALSE
)

And then you can use that list of expressions inside filter:

data |> filter(!!! filters)

CodePudding user response:

A base R answer:

cond <- Map(`==`, data[c("a", "b")], c("c", "f"))
data[Reduce(`&`, cond), ]

#  a b c d
#3 c f i l

That would also work with filter:

library(dplyr)
data %>% 
  filter(Reduce(`&`, Map(`==`, .[c("a", "b")], c("c", "f"))))

And translated into tidyverse functions:

library(purrr)
library(dplyr)
data %>% 
  filter(map2(.[c("a", "b")], c("c", "f"), `==`) %>% 
           reduce(`&`))
  • Related