R: empty lists inside list of lists after running function for loop

Time:05-21

I have a list of dataframes, each with an ID column and two variable columns, var1 and var2. Several values in var1 and var2 are NA. I want to create a new list in which, for each dataframe, there is one copy containing only the rows where var1 is not NA and another containing only the rows where var2 is not NA.

This is the structure of my dataset:

df1 = data.frame(ID = c(1, 2, 3, 4, 5),
                 var1 = c(0.1, 0.24, 0.11, 0.8, NA),
                 var2 = c(1, NA, 0, NA, 1))
df2 = data.frame(ID = c(1, 2, 3, 4, 5),
                 var1 = c(NA, NA, 0.11, 0.8, 0.1),
                 var2 = c(100, 19, NA, 9, NA))
df3 = data.frame(ID = c(1, 2, 3, 4, 5),
                 var1 = c(0.12, 0.3, 0.5, NA, 0.84),
                 var2 = c(100, 19, 2, 9, 10))
df_list = list(df1, df2, df3)

This is what I wrote to accomplish my task (a for loop inside a function):

out = lapply(df_list, function(x) {
  dfList = list()
  for (i in c(2:3)) {
    df = x[complete.cases(x[ , i]),]
    dfList[[i]] = list(df)
  }
  return(dfList)
}) 

This works, but it is not optimal: the output is a list of lists in which the first element of each inner list is empty (NULL):

> out

[[1]]
[[1]][[1]]
NULL

[[1]][[2]]
[[1]][[2]][[1]]
  ID var1 var2
1  1 0.10    1
2  2 0.24   NA
3  3 0.11    0
4  4 0.80   NA


[[1]][[3]]
[[1]][[3]][[1]]
  ID var1 var2
1  1 0.10    1
3  3 0.11    0
5  5   NA    1



[[2]]
[[2]][[1]]
NULL

[[2]][[2]]
[[2]][[2]][[1]]
  ID var1 var2
3  3 0.11   NA
4  4 0.80    9
5  5 0.10   NA


[[2]][[3]]
[[2]][[3]][[1]]
  ID var1 var2
1  1   NA  100
2  2   NA   19
4  4  0.8    9



[[3]]
[[3]][[1]]
NULL

[[3]][[2]]
[[3]][[2]][[1]]
  ID var1 var2
1  1 0.12  100
2  2 0.30   19
3  3 0.50    2
5  5 0.84   10


[[3]][[3]]
[[3]][[3]][[1]]
  ID var1 var2
1  1 0.12  100
2  2 0.30   19
3  3 0.50    2
4  4   NA    9
5  5 0.84   10

I want to avoid those empty (NULL) elements in my output, but I could not figure out what is wrong with my code. Any ideas?

Answer:

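The loop starts at 2, so the first assignment is to dfList[[2]]. R extends the empty list to length 2 and fills the never-assigned position 1 with NULL. Looping over 2:3 with lapply instead returns exactly one element per column: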

lapply(df_list, \(x) lapply(2:3, \(i)  list(x[complete.cases(x[[i]]),])))

Output:

[[1]]
[[1]][[1]]
[[1]][[1]][[1]]
  ID var1 var2
1  1 0.10    1
2  2 0.24   NA
3  3 0.11    0
4  4 0.80   NA


[[1]][[2]]
[[1]][[2]][[1]]
  ID var1 var2
1  1 0.10    1
3  3 0.11    0
5  5   NA    1



[[2]]
[[2]][[1]]
[[2]][[1]][[1]]
  ID var1 var2
3  3 0.11   NA
4  4 0.80    9
5  5 0.10   NA


[[2]][[2]]
[[2]][[2]][[1]]
  ID var1 var2
1  1   NA  100
2  2   NA   19
4  4  0.8    9



[[3]]
[[3]][[1]]
[[3]][[1]][[1]]
  ID var1 var2
1  1 0.12  100
2  2 0.30   19
3  3 0.50    2
5  5 0.84   10


[[3]][[2]]
[[3]][[2]][[1]]
  ID var1 var2
1  1 0.12  100
2  2 0.30   19
3  3 0.50    2
4  4   NA    9
5  5 0.84   10
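The padding behaviour explained at the start of this answer can be checked directly, and a plain for loop can avoid it by indexing the result by loop position rather than by column number. The sketch below is one possible variant, not code from the original answer; the names `padded`, `cols` and `j` are illustrative, and it uses only df1 from the question to stay short.

```r
# Assigning past the end of an empty list pads the skipped slot with NULL
padded <- list()
padded[[2]] <- "second"
stopifnot(length(padded) == 2, is.null(padded[[1]]))

# A for-loop variant without the gap (illustrative sketch):
# store results by loop position j, not by column number cols[j]
df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  var1 = c(0.1, 0.24, 0.11, 0.8, NA),
                  var2 = c(1, NA, 0, NA, 1))
cols <- c(2, 3)
dfList <- vector("list", length(cols))  # one preallocated slot per column
for (j in seq_along(cols)) {
  # keep only the rows where column cols[j] is not NA
  dfList[[j]] <- list(df1[complete.cases(df1[, cols[j]]), ])
}
names(dfList) <- names(df1)[cols]      # label results "var1" and "var2"
```

Naming the elements is optional, but it lets you write `dfList$var1` instead of remembering which position corresponds to which column.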

NOTE: Alternatively, keep the OP's code and filter out the NULL elements when returning:

...
 return(Filter(Negate(is.null), dfList))
...
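For completeness, here is a sketch of the OP's function with that `Filter()` call in place, together with the question's data so it runs on its own. The loop body is unchanged from the question; only the return line differs. `Negate(is.null)` builds a function that is TRUE for non-NULL elements, so `Filter()` keeps only those.

```r
df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  var1 = c(0.1, 0.24, 0.11, 0.8, NA),
                  var2 = c(1, NA, 0, NA, 1))
df2 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  var1 = c(NA, NA, 0.11, 0.8, 0.1),
                  var2 = c(100, 19, NA, 9, NA))
df3 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  var1 = c(0.12, 0.3, 0.5, NA, 0.84),
                  var2 = c(100, 19, 2, 9, 10))
df_list <- list(df1, df2, df3)

out <- lapply(df_list, function(x) {
  dfList <- list()
  for (i in 2:3) {
    df <- x[complete.cases(x[, i]), ]  # rows where column i is not NA
    dfList[[i]] <- list(df)            # leaves dfList[[1]] as NULL
  }
  # drop the NULL placeholder before returning
  return(Filter(Negate(is.null), dfList))
})
```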