Home > Mobile >  dplyr filter multiple variables (columns) with multiple conditions
dplyr filter multiple variables (columns) with multiple conditions

Time:04-10

I plan to filter data for multiple columns with multiple columns in one line to reduce the time used for running the code. This is sample data that I used to test my code. Basically, I want to remove any rows containing 0, 1, 2, and NA.

test <- data.frame(A = c(1,0,2,3,4,0,5,6,0,7,0,8,0,9,NA),
                 B = c(0,1,0,2,3,4,0,5,0,7,8,0,NA,9,0),
                 C = c(1,2,3,0,0,4,5,6,0,7,0,8,NA,0,9))

I used the following code to clean my data. Although it does the job, the code is very tedious and takes me quite a while when I run it with a large database.

test %>%  filter(!is.na(A)) %>%
filter(!is.na(B)) %>%
filter(!is.na(C)) %>% 
filter(A != 0) %>% 
filter(A != 1) %>%
filter(A != 2) %>% 
filter(B != 0) %>% 
filter(B != 1) %>%
filter(B != 2) %>%
filter(C != 0) %>% 
filter(C != 1) %>%
filter(C != 2)

  A B C
1 6 5 6
2 7 7 7

I tried to shorten the code using filter, filter_at, and any_vars, but it did not work. Below are my attempts to deal with this problem (all of these codes did not work because they could not delete the row containing 0 (or 1,2, and NA).

df_total <- test %>%
filter_at(vars(A, B, C), any_vars(!is.na(.))) %>%
filter_at(vars(A, B, C), any_vars(. != 2)) %>%
filter_at(vars(A, B, C), any_vars(. != 1)) %>%
filter_at(vars(A, B, C), any_vars(. != 0))

df_total <- test %>%
filter_at(vars(A, B, C), any_vars(!is.na(.) | . != 2 | . != 1 | . != 0))

df_total <- test %>%
filter(!is.na(A) | A!= 2 | A!= 1 | A!= 0) %>%
filter(!is.na(B) | B!= 2 | B!= 1 | B!= 0) %>%
filter(!is.na(C) | C!= 2 | C!= 1 | C!= 0) %>%

I cannot figure out what I did incorrectly here. I went back and forth between the documentation and R to solve this problem, but my efforts were useless. Could you please suggest to me what I did wrong in my code? How can I write a code for multiple columns with multiple conditions in just one line? The point for one line is to speed up the running time for R. Any advice/ suggestions/ resources to find the answer would be appreciated! Thank you.

CodePudding user response:

test %>% 
    filter(across(c(A, B, C), function(x) !is.na(x) & !x %in% c(0, 1, 2)))

# A B C
# 6 5 6
# 7 7 7

CodePudding user response:

Another possible solution:

library(dplyr)

test %>% 
  filter(complete.cases(.) & if_all(everything(), ~ !(.x %in% 0:2)))

#>   A B C
#> 1 6 5 6
#> 2 7 7 7
  • Related