I have a list (ldf) that contains 12 data frames, and I'm dropping the rows with null values from each one. That part works fine, but my output ends up as a single data frame. I want 12 cleaned data frames, not one giant one. I'm not sure how to do this; I've researched but come up dry. Here is my code:
for (i in ldf) {
ldf_no_null <- i[complete.cases(i), ]
}
CodePudding user response:
An lapply one-liner will do it.
ldf_no_null <- lapply(ldf, \(x) x[complete.cases(x), ])
str(ldf_no_null)
#> List of 3
#> $ :'data.frame': 80 obs. of 5 variables:
#> ..$ Sepal.Length: num [1:80] 5.1 4.9 4.7 5.4 4.6 5 4.9 5.4 4.8 4.3 ...
#> ..$ Sepal.Width : num [1:80] 3.5 3 3.2 3.9 3.4 3.4 3.1 3.7 3 3 ...
#> ..$ Petal.Length: num [1:80] 1.4 1.4 1.3 1.7 1.4 1.5 1.5 1.5 1.4 1.1 ...
#> ..$ Petal.Width : num [1:80] 0.2 0.2 0.2 0.4 0.3 0.2 0.1 0.2 0.1 0.1 ...
#> ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#> $ :'data.frame': 80 obs. of 5 variables:
#> ..$ Sepal.Length: num [1:80] 5.1 4.9 4.7 5.4 4.6 5 4.9 5.4 4.8 4.3 ...
#> ..$ Sepal.Width : num [1:80] 3.5 3 3.2 3.9 3.4 3.4 3.1 3.7 3 3 ...
#> ..$ Petal.Length: num [1:80] 1.4 1.4 1.3 1.7 1.4 1.5 1.5 1.5 1.4 1.1 ...
#> ..$ Petal.Width : num [1:80] 0.2 0.2 0.2 0.4 0.3 0.2 0.1 0.2 0.1 0.1 ...
#> ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#> $ :'data.frame': 80 obs. of 5 variables:
#> ..$ Sepal.Length: num [1:80] 5.1 4.9 4.7 5.4 4.6 5 4.9 5.4 4.8 4.3 ...
#> ..$ Sepal.Width : num [1:80] 3.5 3 3.2 3.9 3.4 3.4 3.1 3.7 3 3 ...
#> ..$ Petal.Length: num [1:80] 1.4 1.4 1.3 1.7 1.4 1.5 1.5 1.5 1.4 1.1 ...
#> ..$ Petal.Width : num [1:80] 0.2 0.2 0.2 0.4 0.3 0.2 0.1 0.2 0.1 0.1 ...
#> ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
Created on 2023-02-03 with reprex v2.0.2
Test data
set.seed(2023)
df1 <- iris
df1[] <- lapply(df1, \(x) {is.na(x) <- sample(length(x), 20); x})
ldf <- list(df1, df1, df1)
str(ldf)
#> List of 3
#> $ :'data.frame': 150 obs. of 5 variables:
#> ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 NA 5 5.4 4.6 5 4.4 4.9 ...
#> ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#> ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#> ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 NA 0.1 ...
#> ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 NA 1 1 1 1 1 ...
#> $ :'data.frame': 150 obs. of 5 variables:
#> ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 NA 5 5.4 4.6 5 4.4 4.9 ...
#> ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#> ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#> ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 NA 0.1 ...
#> ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 NA 1 1 1 1 1 ...
#> $ :'data.frame': 150 obs. of 5 variables:
#> ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 NA 5 5.4 4.6 5 4.4 4.9 ...
#> ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#> ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#> ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 NA 0.1 ...
#> ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 NA 1 1 1 1 1 ...
Created on 2023-02-03 with reprex v2.0.2
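As a quick sanity check on the one-liner, here is a minimal sketch using a small, made-up named list (ldf_named and its columns are hypothetical, not the asker's data). It illustrates that lapply returns one cleaned data frame per input element and carries the element names over.

```r
# Hypothetical named list of two small data frames, each with some NAs.
ldf_named <- list(
  a = data.frame(x = c(1, NA, 3), y = c("u", "v", NA)),
  b = data.frame(x = c(NA, 2, 3), y = c("u", "v", "w"))
)

# Same one-liner as above: one cleaned data frame per list element.
cleaned <- lapply(ldf_named, \(d) d[complete.cases(d), ])

names(cleaned)          # element names are carried over by lapply
sapply(cleaned, nrow)   # number of rows left in each data frame
sapply(cleaned, anyNA)  # checks whether any NA remains in each element
```

Note that the \(d) lambda shorthand needs R 4.1 or later; on older versions function(d) does the same thing.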
CodePudding user response:
The problem is that you overwrite ldf_no_null in each iteration, so it ends up containing only the result of the last iteration. If you want a for loop, you need to initialize an empty list with three elements first (in R terms, a vector of mode "list") and then fill it. It is also better to loop over the indices of the list rather than over the list elements directly, which we can do with seq_along.
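# Pre-allocate a list with one slot per data frame
ldf_no_null <- vector(mode='list', length=length(ldf))
for (i in seq_along(ldf)) {
  # Keep only the rows of the i-th data frame that contain no NA
  ldf_no_null[[i]] <- ldf[[i]][complete.cases(ldf[[i]]), ]
}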
ldf_no_null
# [[1]]
# X1 X2 X3 X4 X5
# 1 1 2 3 2 3
# 3 1 1 1 3 1
# 5 2 3 2 1 1
#
# [[2]]
# X1 X2 X3 X4 X5
# 2 2 2 2 1 2
# 6 2 3 2 2 1
#
# [[3]]
# X1 X2 X3 X4 X5
# 5 2 2 1 2 3
# 6 2 3 1 1 3
Data:
set.seed(42)
ldf <- replicate(3, data.frame(matrix(sample(c(1:3, NA), 6*5, replace=TRUE), 6, 5)), simplify=FALSE)
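To make the overwriting behaviour concrete, here is a small sketch on made-up data (toy, overwritten and filled are hypothetical names used only for illustration). It runs the original loop next to the index-based loop and compares the results. One assumption worth flagging: a list created with vector() has no names, so if the input list is named, the names have to be copied over by hand.

```r
# Hypothetical named list of three tiny data frames with some NAs.
toy <- list(
  a = data.frame(x = c(1, NA, 3), y = c(4, 5, 6)),
  b = data.frame(x = c(1, 2, 3), y = c(NA, 5, 6)),
  c = data.frame(x = c(1, 2, NA), y = c(4, NA, 6))
)

# Original pattern: the same variable is reassigned on every pass,
# so only the cleaned version of the last data frame survives.
for (i in toy) {
  overwritten <- i[complete.cases(i), ]
}
is.data.frame(overwritten)  # a single data frame, not a list

# Index-based pattern: pre-allocate, then fill one slot per input.
filled <- vector(mode = "list", length = length(toy))
for (i in seq_along(toy)) {
  filled[[i]] <- toy[[i]][complete.cases(toy[[i]]), ]
}
names(filled) <- names(toy)  # vector() creates an unnamed list

length(filled) == length(toy)
# The overwritten result is just the last element of the filled list.
identical(overwritten, filled[[length(toy)]])
# The filled list should match what lapply produces.
identical(filled, lapply(toy, \(d) d[complete.cases(d), ]))
```

If the individual data frames are needed as separate objects afterwards, they can be pulled out by position or name, for example filled[[1]] or filled$a, which is usually easier to manage than 12 separate variables.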