I plan to filter data for multiple columns with multiple columns in one line to reduce the time used for running the code. This is sample data that I used to test my code. Basically, I want to remove any rows containing 0, 1, 2, and NA.
test <- data.frame(A = c(1,0,2,3,4,0,5,6,0,7,0,8,0,9,NA),
B = c(0,1,0,2,3,4,0,5,0,7,8,0,NA,9,0),
C = c(1,2,3,0,0,4,5,6,0,7,0,8,NA,0,9))
I used the following code to clean my data. Although it does the job, the code is very tedious and takes me quite a while when I run it with a large database.
test %>% filter(!is.na(A)) %>%
filter(!is.na(B)) %>%
filter(!is.na(C)) %>%
filter(A != 0) %>%
filter(A != 1) %>%
filter(A != 2) %>%
filter(B != 0) %>%
filter(B != 1) %>%
filter(B != 2) %>%
filter(C != 0) %>%
filter(C != 1) %>%
filter(C != 2)
A B C
1 6 5 6
2 7 7 7
I tried to shorten the code using filter
, filter_at
, and any_vars
, but it did not work. Below are my attempts to deal with this problem (all of these codes did not work because they could not delete the row containing 0 (or 1,2, and NA).
df_total <- test %>%
filter_at(vars(A, B, C), any_vars(!is.na(.))) %>%
filter_at(vars(A, B, C), any_vars(. != 2)) %>%
filter_at(vars(A, B, C), any_vars(. != 1)) %>%
filter_at(vars(A, B, C), any_vars(. != 0))
df_total <- test %>%
filter_at(vars(A, B, C), any_vars(!is.na(.) | . != 2 | . != 1 | . != 0))
df_total <- test %>%
filter(!is.na(A) | A!= 2 | A!= 1 | A!= 0) %>%
filter(!is.na(B) | B!= 2 | B!= 1 | B!= 0) %>%
filter(!is.na(C) | C!= 2 | C!= 1 | C!= 0) %>%
I cannot figure out what I did incorrectly here. I went back and forth between the documentation and R to solve this problem, but my efforts were useless. Could you please suggest to me what I did wrong in my code? How can I write a code for multiple columns with multiple conditions in just one line? The point for one line is to speed up the running time for R. Any advice/ suggestions/ resources to find the answer would be appreciated! Thank you.
CodePudding user response:
test %>%
filter(across(c(A, B, C), function(x) !is.na(x) & !x %in% c(0, 1, 2)))
# A B C
# 6 5 6
# 7 7 7
CodePudding user response:
Another possible solution:
library(dplyr)
test %>%
filter(complete.cases(.) & if_all(everything(), ~ !(.x %in% 0:2)))
#> A B C
#> 1 6 5 6
#> 2 7 7 7