Remove a row from all dataframes in a list if NA value in one of the rows


I have a list of data.frames of equal size. Each data.frame has missing data in different rows and columns. I would like to remove a row from every data.frame if that row contains a NaN in any one of them. My current lapply and na.omit code only removes rows from the data.frame in which the NaN occurs, which makes sense because it processes each data.frame in the list independently. Instead, if a NaN exists in a given row of one data.frame, I want that row removed from all the other data.frames as well.

Some example code:

#Make list
ls <- list(x1=data.frame(a=c(1,2,3,4),b=c(2,3,4,5),c=c(3,4,NaN,6)),
#Desired output
lscalc <- list(x1=data.frame(a=c(1,4),b=c(2,5),c=c(3,6)),

CodePudding user response:

Assuming all the datasets have the same number of rows, first collect the indices of the rows with missing values across all the datasets, then loop over the list and remove those rows from each one:

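# Row numbers that contain an NA/NaN in at least one data.frame
un1 <- unique(unlist(lapply(ls, function(x) which(is.na(x), arr.ind = TRUE)[,1])))
# Keep only the rows whose number is not in un1
lapply(ls, function(x) x[!seq_len(nrow(x)) %in% un1, ])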
  a b c
1 1 2 3
4 4 5 6

  a b c
1 1 2 3
4 4 5 6
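
For a self-contained run, here is a minimal sketch of the same approach. The second data frame `x2` is a hypothetical stand-in (the example list above is cut off), with a `NaN` placed in row 2 so that rows 2 and 3 end up dropped from both data frames. The helper name `drop_shared_na_rows` is also made up for illustration.

```r
# Hypothetical example data: x2 is an assumption, not taken from the question
dfs <- list(
  x1 = data.frame(a = c(1, 2, 3, 4), b = c(2, 3, 4, 5), c = c(3, 4, NaN, 6)),
  x2 = data.frame(a = c(1, NaN, 3, 4), b = c(2, 3, 4, 5), c = c(3, 4, 5, 6))
)

# Drop every row that has an NA/NaN in at least one data frame of the list
drop_shared_na_rows <- function(dfs) {
  # Row numbers with a missing value, collected across all data frames
  bad <- unique(unlist(lapply(dfs, function(x) which(is.na(x), arr.ind = TRUE)[, 1])))
  # Logical indexing keeps all rows when `bad` is empty
  lapply(dfs, function(x) x[!seq_len(nrow(x)) %in% bad, , drop = FALSE])
}

res <- drop_shared_na_rows(dfs)
```

Because the rows are selected with a logical vector rather than a negative index, a list without any missing values should come back unchanged.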

CodePudding user response:

Here's one using complete.cases(), though otherwise along the same lines as @akrun's.

#Make list
l <- list(x1=data.frame(a=c(1,2,3,4),b=c(2,3,4,5),c=c(3,4,NaN,6)),
#Desired output
lcalc <- list(x1=data.frame(a=c(1,4),b=c(2,5),c=c(3,6)),

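# Rows with at least one NA/NaN, per data.frame
inds <- lapply(l, \(x) which(!complete.cases(x)))
# Union of those row numbers across the whole list
inds <- unique(do.call(c, inds))
# setdiff() avoids x[-integer(0), ], which would drop every row when nothing is missing
lcalc2 <- lapply(l, \(x) x[setdiff(seq_len(nrow(x)), inds), ])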
#> $x1
#>   a b c
#> 1 1 2 3
#> 4 4 5 6
#> $x2
#>   a b c
#> 1 1 2 3
#> 4 4 5 6

Created on 2022-05-24 by the reprex package (v2.0.1)
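
A variation on the `complete.cases()` idea, sketched here as one possible alternative rather than as part of either answer: build one logical vector per data frame and combine them with `Reduce()`, so a row is kept only if it is complete everywhere. The data frame `x2` and the names `dfs2`, `keep`, and `cleaned` are assumptions made for this example.

```r
# Hypothetical example data: x2 is an assumption, not taken from the question
dfs2 <- list(
  x1 = data.frame(a = c(1, 2, 3, 4), b = c(2, 3, 4, 5), c = c(3, 4, NaN, 6)),
  x2 = data.frame(a = c(1, NaN, 3, 4), b = c(2, 3, 4, 5), c = c(3, 4, 5, 6))
)

# One logical vector per data frame: TRUE where the row has no NA/NaN
ok_each <- lapply(dfs2, complete.cases)

# A row is kept only if it is complete in every data frame
keep <- Reduce(`&`, ok_each)

# Apply the same mask to each data frame
cleaned <- lapply(dfs2, function(x) x[keep, , drop = FALSE])
```

A logical mask also behaves well when no rows contain missing values, whereas negative indexing with an empty index vector (`x[-integer(0), ]`) returns zero rows.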
