R: empty lists inside list of lists after running a for loop inside a function

I have a list of dataframes, each with two variable columns: var1 and var2. Several of the values in var1 and var2 are NA. I want to create a new list in which, for each dataframe, I get one copy containing only the rows with no NA in var1 and one copy containing only the rows with no NA in var2.

This is the structure of my dataset:

df1 = data.frame(ID = c(1, 2, 3, 4, 5),
                 var1 = c(0.1, 0.24, 0.11, 0.8, NA),
                 var2 = c(1, NA, 0, NA, 1))
df2 = data.frame(ID = c(1, 2, 3, 4, 5),
                 var1 = c(NA, NA, 0.11, 0.8, 0.1),
                 var2 = c(100, 19, NA, 9, NA))
df3 = data.frame(ID = c(1, 2, 3, 4, 5),
                 var1 = c(0.12, 0.3, 0.5, NA, 0.84),
                 var2 = c(100, 19, 2, 9, 10))
df_list = list(df1, df2, df3)

This is what I wrote to accomplish my task (a for loop inside a function):

out = lapply(df_list, function(x) {
  dfList = list()
  for (i in c(2:3)) {
    df = x[complete.cases(x[ , i]),]
    dfList[[i]] = list(df)
  }
  return(dfList)
})

This works, but the result is not what I want: the output is a list of lists in which the first element of every inner list is NULL:

> out

[[1]]
[[1]][[1]]
NULL

[[1]][[2]]
[[1]][[2]][[1]]
  ID var1 var2
1  1 0.10    1
2  2 0.24   NA
3  3 0.11    0
4  4 0.80   NA


[[1]][[3]]
[[1]][[3]][[1]]
  ID var1 var2
1  1 0.10    1
3  3 0.11    0
5  5   NA    1



[[2]]
[[2]][[1]]
NULL

[[2]][[2]]
[[2]][[2]][[1]]
  ID var1 var2
3  3 0.11   NA
4  4 0.80    9
5  5 0.10   NA


[[2]][[3]]
[[2]][[3]][[1]]
  ID var1 var2
1  1   NA  100
2  2   NA   19
4  4  0.8    9



[[3]]
[[3]][[1]]
NULL

[[3]][[2]]
[[3]][[2]][[1]]
  ID var1 var2
1  1 0.12  100
2  2 0.30   19
3  3 0.50    2
5  5 0.84   10


[[3]][[3]]
[[3]][[3]][[1]]
  ID var1 var2
1  1 0.12  100
2  2 0.30   19
3  3 0.50    2
4  4   NA    9
5  5 0.84   10

I want to avoid those empty lists inside my output but did not manage to figure out the issue with my code. Any ideas?

Answer:

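The loop index starts at 2 and each result is assigned to dfList[[i]], so position 1 is never assigned and R leaves it as NULL. Looping with lapply over 2:3 instead returns one element per iteration, with no gap (the \(x) shorthand needs R 4.1 or later):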

lapply(df_list, \(x) lapply(2:3, \(i)  list(x[complete.cases(x[[i]]),])))

Output:

[[1]]
[[1]][[1]]
[[1]][[1]][[1]]
  ID var1 var2
1  1 0.10    1
2  2 0.24   NA
3  3 0.11    0
4  4 0.80   NA


[[1]][[2]]
[[1]][[2]][[1]]
  ID var1 var2
1  1 0.10    1
3  3 0.11    0
5  5   NA    1



[[2]]
[[2]][[1]]
[[2]][[1]][[1]]
  ID var1 var2
3  3 0.11   NA
4  4 0.80    9
5  5 0.10   NA


[[2]][[2]]
[[2]][[2]][[1]]
  ID var1 var2
1  1   NA  100
2  2   NA   19
4  4  0.8    9



[[3]]
[[3]][[1]]
[[3]][[1]][[1]]
  ID var1 var2
1  1 0.12  100
2  2 0.30   19
3  3 0.50    2
5  5 0.84   10


[[3]][[2]]
[[3]][[2]][[1]]
  ID var1 var2
1  1 0.12  100
2  2 0.30   19
3  3 0.50    2
4  4   NA    9
5  5 0.84   10
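
The NULL entries in the question's output come from how R grows a list when a position is skipped. A minimal sketch that reproduces this in isolation (the name `demo` is illustrative and not part of the question's code):

```r
# Illustrative only: `demo` is a made-up name, not from the question.
demo <- list()
demo[[2]] <- "second"   # position 1 is never assigned
demo[[3]] <- "third"

length(demo)            # 3
is.null(demo[[1]])      # TRUE: R padded the skipped position with NULL
```

The same thing happens with `dfList[[i]]` when `i` runs over `2:3`.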

NOTE: Alternatively, keep the OP's code and filter out the NULL elements before returning:

...
 return(Filter(Negate(is.null), dfList))
...
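
Another option is to index the inner list by column name rather than by column position, so that no slot is skipped. This is only a sketch, not the answerer's code. It assumes the columns are named `var1` and `var2` as in the question, and, unlike the code above, it stores each filtered dataframe directly instead of wrapping it in an extra `list()`. The helper name `cols` is hypothetical.

```r
# Sample data copied from the question.
df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  var1 = c(0.1, 0.24, 0.11, 0.8, NA),
                  var2 = c(1, NA, 0, NA, 1))
df2 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  var1 = c(NA, NA, 0.11, 0.8, 0.1),
                  var2 = c(100, 19, NA, 9, NA))
df3 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  var1 = c(0.12, 0.3, 0.5, NA, 0.84),
                  var2 = c(100, 19, 2, 9, 10))
df_list <- list(df1, df2, df3)

# Assumed column names (hypothetical helper variable).
cols <- c("var1", "var2")

out <- lapply(df_list, function(x) {
  # One filtered copy of x per column, keeping rows where that column is not NA.
  res <- lapply(cols, function(col) x[complete.cases(x[[col]]), ])
  # Name the elements after the column they were filtered on.
  setNames(res, cols)
})

names(out[[1]])       # "var1" "var2"
nrow(out[[1]]$var1)   # 4
nrow(out[[1]]$var2)   # 3
```

With named elements, `out[[1]]$var1` is the first dataframe filtered on var1, which can be easier to read than positional access such as `out[[1]][[1]]`.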