I have a dataframe and a list. The list contains the data I need to filter in the dataframe. How can I automate the filtering process when I don't know the variables in the list?
Some sample data:
df <- data.frame(V1 = sample(1:2, 10, replace = TRUE),
                 V2 = sample(c("A", "B", "C"), 10, replace = TRUE),
                 V3 = sample(100:104, 10, replace = TRUE))
The list, f_list, is created in another part of the application and eventually passed on to the function that needs to do the filtering. For example, sometimes the list contains V1 and V3:
f_list <- list()
f_list$V1 <- c("2")
f_list$V3 <- c("101","103","104")
Other times it contains V1 and V2
f_list <- list()
f_list$V1 <- c("1")
f_list$V2 <- c("A","B")
And so on. The real data has hundreds of variables. How can I automate the filtering, which would look something like this when the variables are known?
df %>%
filter(V1 %in% f_list$V1,
V3 %in% f_list$V3)
How do I construct the loop?
EDITED
I edited the name of the object, from ls to f_list, per @I_0's reminder that objects should not have the names of existing functions. Thanks for the help, everyone.
CodePudding user response:
You could use if_all and cur_column:
library(dplyr)
df %>%
filter(
(if_all(
.cols = names(f_list),
.fns = ~ .x %in% f_list[[cur_column()]])
)
)
# V1 V2 V3
#1 1 B 100
#2 1 A 101
#3 1 A 102
#4 1 A 103
Note the extra () around if_all: for the time being, cur_column() requires extra parentheses to work inside if_any() and if_all().
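If the parentheses quirk is a concern, the same filter can probably be written as an explicit loop over the list in base R. This is only a sketch: the small deterministic df stands in for the randomly sampled data in the question, and masks, keep, and result are illustrative names, not part of the original code.

```r
# Deterministic stand-in for the question's sample data (the original uses sample()).
df <- data.frame(V1 = c(1, 2, 2, 1, 2),
                 V2 = c("A", "B", "C", "A", "B"),
                 V3 = c(100, 101, 102, 103, 104),
                 stringsAsFactors = FALSE)

f_list <- list(V1 = c("2"), V3 = c("101", "103", "104"))

# One logical vector per listed column: TRUE where the value is allowed.
# %in% compares after coercing to a common type, so 2 matches "2" here.
masks <- Map(function(col, vals) df[[col]] %in% vals, names(f_list), f_list)

# Combine with AND so a row must pass every filter.
# The all-TRUE starting value means an empty f_list keeps every row.
keep <- Reduce(`&`, masks, rep(TRUE, nrow(df)))

result <- df[keep, ]
```

Because the loop runs over names(f_list), it does not matter which variables the list contains, as long as each name is a column of df.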
CodePudding user response:
One approach:
## example data (values converted to numeric to match the column types in df):
filter_list <- list()
filter_list$V1 <- as.numeric(c("2"))
filter_list$V3 <- as.numeric(c("101","103","104"))
Notes: 1. Avoid naming an object (your list ls) after an existing function (ls() in base R). 2. Remember to match the type (numeric, character, etc.) of the filter criteria to the type of the columns being filtered.
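When the list arrives with every value stored as a string, one possible way to handle the type matching in a single step is utils::type.convert, which guesses a type from the strings themselves. This is a sketch under that assumption; typed is an illustrative name, and the guess is made without looking at df, so it is worth checking against the actual column types.

```r
# Filter values as they might arrive from elsewhere: all character.
f_list <- list(V1 = c("2"),
               V3 = c("101", "103", "104"),
               V2 = c("A", "B"))

# Convert each element to the simplest type that fits its strings.
# as.is = TRUE keeps non-numeric strings as character instead of factor.
typed <- lapply(f_list, utils::type.convert, as.is = TRUE)

str(typed)
```

Digit-only strings become integers and the letters stay character, which matters more for joins than for %in%, since %in% coerces its arguments before comparing.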
code:
library(dplyr)
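filter_list %>%
  as.data.frame %>%
  ## inner_join keeps only the rows of df that match the filter values;
  ## left_join would also return unmatched filter values as rows with NA
  inner_join(df, by = names(filter_list))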
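The join above turns the list into a data frame with as.data.frame, which recycles a length-one element but does not build every combination when several variables each have more than one allowed value. A possible variant, sketched here with a deterministic stand-in for df and the illustrative names combos and result, is to expand the list into all combinations first and then keep the matching rows with semi_join.

```r
library(dplyr)

# Deterministic stand-in for the question's sample data.
df <- data.frame(V1 = c(1, 2, 2, 1, 2),
                 V2 = c("A", "B", "C", "A", "B"),
                 V3 = c(100, 101, 102, 103, 104),
                 stringsAsFactors = FALSE)

# Two variables, each with more than one allowed value.
filter_list <- list(V1 = c(1, 2), V3 = c(101, 103, 104))

# Every combination of allowed values, one row per combination.
combos <- expand.grid(filter_list, stringsAsFactors = FALSE)

# semi_join returns the rows of df that have a match in combos,
# without adding columns or duplicating rows.
result <- semi_join(df, combos, by = names(filter_list))
```

The grid grows multiplicatively with the number of variables and values, so with hundreds of variables in the list a %in%-based filter is likely the more practical choice.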