I have a list of data frames, each with two variable columns: var1 and var2. Several of the values in var1 and var2 are NA. I want to create a new list in which, for each data frame, I get one subset containing only the rows with no NA in var1 and another containing only the rows with no NA in var2.
This is the structure of my dataset:
df1 = data.frame(ID = c(1, 2, 3, 4, 5),
                 var1 = c(0.1, 0.24, 0.11, 0.8, NA),
                 var2 = c(1, NA, 0, NA, 1))
df2 = data.frame(ID = c(1, 2, 3, 4, 5),
                 var1 = c(NA, NA, 0.11, 0.8, 0.1),
                 var2 = c(100, 19, NA, 9, NA))
df3 = data.frame(ID = c(1, 2, 3, 4, 5),
                 var1 = c(0.12, 0.3, 0.5, NA, 0.84),
                 var2 = c(100, 19, 2, 9, 10))
df_list = list(df1, df2, df3)
This is what I wrote to accomplish my task (a for loop inside a function):
out = lapply(df_list, function(x) {
  dfList = list()
  for (i in c(2:3)) {
    df = x[complete.cases(x[ , i]), ]
    dfList[[i]] = list(df)
  }
  return(dfList)
})
This works, but it is not optimal: the output is a list of lists in which the first element of each inner list is NULL:
> out
[[1]]
[[1]][[1]]
NULL
[[1]][[2]]
[[1]][[2]][[1]]
ID var1 var2
1 1 0.10 1
2 2 0.24 NA
3 3 0.11 0
4 4 0.80 NA
[[1]][[3]]
[[1]][[3]][[1]]
ID var1 var2
1 1 0.10 1
3 3 0.11 0
5 5 NA 1
[[2]]
[[2]][[1]]
NULL
[[2]][[2]]
[[2]][[2]][[1]]
ID var1 var2
3 3 0.11 NA
4 4 0.80 9
5 5 0.10 NA
[[2]][[3]]
[[2]][[3]][[1]]
ID var1 var2
1 1 NA 100
2 2 NA 19
4 4 0.8 9
[[3]]
[[3]][[1]]
NULL
[[3]][[2]]
[[3]][[2]][[1]]
ID var1 var2
1 1 0.12 100
2 2 0.30 19
3 3 0.50 2
5 5 0.84 10
[[3]][[3]]
[[3]][[3]][[1]]
ID var1 var2
1 1 0.12 100
2 2 0.30 19
3 3 0.50 2
4 4 NA 9
5 5 0.84 10
I want to avoid those NULL elements in my output, but I could not figure out what in my code causes them. Any ideas?
Answer:
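The loop index starts at 2, so the results are assigned to dfList[[2]] and dfList[[3]], while element 1 is never assigned and is left as NULL. Looping with lapply over 2:3 avoids the gap: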
lapply(df_list, \(x) lapply(2:3, \(i) list(x[complete.cases(x[[i]]),])))
-output
[[1]]
[[1]][[1]]
[[1]][[1]][[1]]
ID var1 var2
1 1 0.10 1
2 2 0.24 NA
3 3 0.11 0
4 4 0.80 NA
[[1]][[2]]
[[1]][[2]][[1]]
ID var1 var2
1 1 0.10 1
3 3 0.11 0
5 5 NA 1
[[2]]
[[2]][[1]]
[[2]][[1]][[1]]
ID var1 var2
3 3 0.11 NA
4 4 0.80 9
5 5 0.10 NA
[[2]][[2]]
[[2]][[2]][[1]]
ID var1 var2
1 1 NA 100
2 2 NA 19
4 4 0.8 9
[[3]]
[[3]][[1]]
[[3]][[1]][[1]]
ID var1 var2
1 1 0.12 100
2 2 0.30 19
3 3 0.50 2
5 5 0.84 10
[[3]][[2]]
[[3]][[2]][[1]]
ID var1 var2
1 1 0.12 100
2 2 0.30 19
3 3 0.50 2
4 4 NA 9
5 5 0.84 10
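To see where the NULL came from, here is a minimal sketch of the padding behaviour, independent of the data frames. The names cols and fixed are made up for illustration; they are not in the OP's code.

```r
# Assigning to position 2 of an empty list pads position 1 with NULL.
dfList <- list()
dfList[[2]] <- "a"
dfList[[3]] <- "b"
length(dfList)        # 3
is.null(dfList[[1]])  # TRUE

# One way around it in a for loop: index the result by position in
# `cols` (1, 2, ...) instead of by the column number itself.
cols <- 2:3
fixed <- list()
for (k in seq_along(cols)) {
  fixed[[k]] <- paste("column", cols[k])
}
length(fixed)         # 2
```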
NOTE: In the OP's code, return dfList after Filter-ing out the NULL elements:
...
return(Filter(Negate(is.null), dfList))
...
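As a possible variation (a sketch, not something the OP asked for), the extra list() wrapper can be dropped and the inner results named after the columns, which makes them easier to pick out. The vector vars and the object both are assumptions introduced here; the last step assumes that rows complete in both columns might also be of interest.

```r
df1 <- data.frame(ID = 1:5,
                  var1 = c(0.1, 0.24, 0.11, 0.8, NA),
                  var2 = c(1, NA, 0, NA, 1))
df2 <- data.frame(ID = 1:5,
                  var1 = c(NA, NA, 0.11, 0.8, 0.1),
                  var2 = c(100, 19, NA, 9, NA))
df_list <- list(df1, df2)

vars <- c("var1", "var2")  # columns to check, by name

# For each data frame, one subset per column, named after that column.
out <- lapply(df_list, function(x) {
  res <- lapply(vars, function(v) x[!is.na(x[[v]]), ])
  setNames(res, vars)
})

out[[1]]$var1  # rows of df1 where var1 is not NA

# If only rows with no NA in either column are wanted instead:
both <- lapply(df_list, function(x) x[complete.cases(x[vars]), ])
```

With names on the inner list, out[[1]]$var2 reads more clearly than out[[1]][[2]][[1]].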