I have a large data frame with 10000000 rows and 150 columns. in the dataset, there are specific rows that contain all zeros or all NAs or a combination of zeros and NAs. the sample dataframe is shown below
df <- data.frame(x = c('q', 'w', 'e', 'r','t', 'y'), a = c('a','b','c','d','e','f'), b =
c(0,1,2,3,0,5), c= c(0,3,2,4,0,'NA'), d=c(0,2,5,7,'NA',5), e = c(0,5,'NA',3,0,'NA'), f =
c(0,7,4,3,'NA',7))
the desired output is as follows
df1 <- data.frame(x = c('w', 'e', 'r','y'), a = c('b','c','d','f'), b = c(1,2,3,5), c=
c(3,2,4,'NA'), d=c(2,5,7,5), e = c(5,'NA',3,'NA'), f = c(7,4,3,7))
i.e.
df <-
w b 1 3 2 5 7
e c 2 2 5 NA 4
r d 3 4 7 3 3
y f 5 NA 5 NA 7
I tried multiple possible solutions in the stackover flow such as
df %>%
filter(if_all(everything(), ~ !is.na(.x)))
or
df %>%
filter_if(is.numeric, ~ !is.na(.))
but could not solve the problem
CodePudding user response:
You can use apply()
rowwise, combining all()
and na.omit()
df[apply(df[,-c(1,2)],1,\(r) all(na.omit(r)!=0)),]
Output:
x a b c d e f
2 w b 1 3 2 5 7
3 e c 2 2 5 NA 4
4 r d 3 4 7 3 3
6 y f 5 NA 5 NA 7
CodePudding user response:
We may use vectorized operations as it is a big dataset
library(dplyr)
df %>%
filter(!if_all(where(is.numeric), ~ is.na(.x)|.x %in% 0))
-output
x a b c d e f
1 w b 1 3 2 5 7
2 e c 2 2 5 NA 4
3 r d 3 4 7 3 3
4 y f 5 NA 5 NA 7
Or with data.table
library(data.table)
setDT(df)[df[, !Reduce(`&`, lapply(.SD, \(x) is.na(x)| x %in% 0)),
.SDcols = is.numeric]]
-output
x a b c d e f
<char> <char> <num> <char> <char> <char> <char>
1: w b 1 3 2 5 7
2: e c 2 2 5 NA 4
3: r d 3 4 7 3 3
4: y f 5 NA 5 NA 7