R: how to automate a filter for unknown variables

Time:06-05

I have a data frame and a list. The list contains the values I need to filter the data frame on. How can I automate the filtering when I don't know in advance which variables the list contains?

Some sample data:

df <- data.frame(V1 = sample(1:2, 10, replace = TRUE),
                 V2 = sample(c("A", "B", "C"), 10, replace = TRUE),
                 V3 = sample(100:104, 10, replace = TRUE))

The list, f_list, is created in another part of the application and eventually passed to the function that does the filtering. For example, sometimes the list contains V1 and V3:

f_list <- list()
f_list$V1 <- c("2")
f_list$V3 <- c("101","103","104")

Other times it contains V1 and V2:

f_list <- list()
f_list$V1 <- c("1")
f_list$V2 <- c("A","B")

And so on; the real data has hundreds of variables. When the variables are known in advance, the filter looks something like this:

library(dplyr)

df %>% 
  filter(V1 %in% f_list$V1,
         V3 %in% f_list$V3)

How do I construct the loop?
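The loop itself can be sketched in base R by building a logical mask one variable at a time and keeping only the rows where every condition holds. This is a minimal sketch, assuming every name in f_list is a column of df; the helper name `filter_by_list` and the small fixed data frame are made up for illustration:

```r
# Hypothetical helper: keep rows of `data` whose value in each column named
# in `filters` is one of the allowed values. Assumes every name in `filters`
# is a column of `data`; an empty list keeps every row.
filter_by_list <- function(data, filters) {
  keep <- rep(TRUE, nrow(data))              # start by keeping every row
  for (nm in names(filters)) {
    # %in% coerces both sides to a common type, so numeric columns are
    # compared with character filter values as character strings
    keep <- keep & data[[nm]] %in% filters[[nm]]
  }
  data[keep, , drop = FALSE]
}

# Illustrative data (not the question's random sample)
df <- data.frame(V1 = c(1, 2, 2, 1),
                 V2 = c("A", "B", "C", "A"),
                 V3 = c(100, 101, 103, 104),
                 stringsAsFactors = FALSE)
f_list <- list(V1 = "2", V3 = c("101", "103", "104"))

res <- filter_by_list(df, f_list)
res
```

Because `%in%` coerces to a common type, the character values in f_list still match the numeric columns here; a join, by contrast, needs the key types to be compatible.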

EDIT: I renamed the object from ls to f_list, per @I_0's reminder that objects should not be named after existing functions. Thanks for the help, everyone.

Answer:

You could use if_all() together with cur_column():

library(dplyr)

df %>% 
  filter(
    (if_all(
      .cols = names(f_list),
      .fns  = ~ .x %in% f_list[[cur_column()]])
    )
  )

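# Example output with the second f_list (V1 = "1", V2 = c("A","B"));
# the exact rows depend on the random sample
#  V1 V2  V3
#1  1  B 100
#2  1  A 101
#3  1  A 102
#4  1  A 103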

Note the extra () around if_all(): for the time being, cur_column() requires the call to be wrapped in parentheses to work inside if_any() and if_all().
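Another option that avoids the cur_column() workaround is to build one `column %in% values` call per list element and splice the whole list into filter() with `!!!`; filter() then combines the conditions with AND. A sketch, assuming dplyr is installed; the name `conditions` and the small data frame are illustrative:

```r
library(dplyr)

# Illustrative data (not the question's random sample)
df <- data.frame(V1 = c(1, 2, 2, 1),
                 V2 = c("A", "B", "C", "A"),
                 V3 = c(100, 101, 103, 104),
                 stringsAsFactors = FALSE)
f_list <- list(V1 = "1", V2 = c("A", "B"))

# One unevaluated call per filter variable, e.g. V1 %in% "1"
conditions <- lapply(names(f_list), function(nm) {
  call("%in%", as.name(nm), f_list[[nm]])
})

# !!! splices the list of calls into filter() as separate conditions,
# which are evaluated against the columns of df
res <- df %>% filter(!!!conditions)
res
```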

Answer:

One approach:

## example data (the question's first f_list, converted to numeric):
filter_list <- list()
filter_list$V1 <- as.numeric(c("2"))
filter_list$V3 <- as.numeric(c("101", "103", "104"))

Notes: (1) avoid naming an object (your list ls) after an existing function (ls() in base R); (2) make sure the type (numeric, character, etc.) of the filter criteria matches the type of the column being filtered.
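To apply the second note automatically, the filter values can be coerced to the type of the matching column before joining. A minimal sketch; `match_filter_types` is a hypothetical helper that only distinguishes numeric from non-numeric columns:

```r
# Hypothetical helper: coerce each filter vector to numeric or character,
# following the type of the column with the same name in `data`.
match_filter_types <- function(filters, data) {
  for (nm in names(filters)) {
    col <- data[[nm]]
    filters[[nm]] <- if (is.numeric(col)) {
      as.numeric(filters[[nm]])    # integer or double column -> double
    } else {
      as.character(filters[[nm]])  # character (or factor) column -> character
    }
  }
  filters
}

# Illustrative data: character filter values, integer and character columns
df <- data.frame(V1 = c(1L, 2L), V2 = c("A", "B"), stringsAsFactors = FALSE)
f_list <- list(V1 = "2", V2 = "B")

typed <- match_filter_types(f_list, df)
str(typed)
```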

Code:

library(dplyr)

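# expand.grid() builds every combination of the filter values, so list
# elements may differ in length; semi_join() keeps the matching rows of df
criteria <- expand.grid(filter_list, stringsAsFactors = FALSE)
df %>% semi_join(criteria, by = names(filter_list))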