for loop iterating through list of df's is merging data

Time:02-04

I have a list (ldf) that contains 12 data frames, and I'm dropping rows with null values. That part works fine, but my output saves all the data as one data frame. I want 12 cleaned data frames, not one giant one. I'm not sure how to do this; I've researched it but come up dry. Here is my code:

for (i in ldf) {
  ldf_no_null  <- i[complete.cases(i), ]
}

CodePudding user response:

A lapply one-liner will do it.

ldf_no_null <- lapply(ldf, \(x) x[complete.cases(x), ])
str(ldf_no_null)
#> List of 3
#>  $ :'data.frame':    80 obs. of  5 variables:
#>   ..$ Sepal.Length: num [1:80] 5.1 4.9 4.7 5.4 4.6 5 4.9 5.4 4.8 4.3 ...
#>   ..$ Sepal.Width : num [1:80] 3.5 3 3.2 3.9 3.4 3.4 3.1 3.7 3 3 ...
#>   ..$ Petal.Length: num [1:80] 1.4 1.4 1.3 1.7 1.4 1.5 1.5 1.5 1.4 1.1 ...
#>   ..$ Petal.Width : num [1:80] 0.2 0.2 0.2 0.4 0.3 0.2 0.1 0.2 0.1 0.1 ...
#>   ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#>  $ :'data.frame':    80 obs. of  5 variables:
#>   ..$ Sepal.Length: num [1:80] 5.1 4.9 4.7 5.4 4.6 5 4.9 5.4 4.8 4.3 ...
#>   ..$ Sepal.Width : num [1:80] 3.5 3 3.2 3.9 3.4 3.4 3.1 3.7 3 3 ...
#>   ..$ Petal.Length: num [1:80] 1.4 1.4 1.3 1.7 1.4 1.5 1.5 1.5 1.4 1.1 ...
#>   ..$ Petal.Width : num [1:80] 0.2 0.2 0.2 0.4 0.3 0.2 0.1 0.2 0.1 0.1 ...
#>   ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#>  $ :'data.frame':    80 obs. of  5 variables:
#>   ..$ Sepal.Length: num [1:80] 5.1 4.9 4.7 5.4 4.6 5 4.9 5.4 4.8 4.3 ...
#>   ..$ Sepal.Width : num [1:80] 3.5 3 3.2 3.9 3.4 3.4 3.1 3.7 3 3 ...
#>   ..$ Petal.Length: num [1:80] 1.4 1.4 1.3 1.7 1.4 1.5 1.5 1.5 1.4 1.1 ...
#>   ..$ Petal.Width : num [1:80] 0.2 0.2 0.2 0.4 0.3 0.2 0.1 0.2 0.1 0.1 ...
#>   ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

Created on 2023-02-03 with reprex v2.0.2


Test data

set.seed(2023)
df1 <- iris
df1[] <- lapply(df1, \(x) {is.na(x) <- sample(length(x), 20); x})
ldf <- list(df1, df1, df1)
str(ldf)
#> List of 3
#>  $ :'data.frame':    150 obs. of  5 variables:
#>   ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 NA 5 5.4 4.6 5 4.4 4.9 ...
#>   ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#>   ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#>   ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 NA 0.1 ...
#>   ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 NA 1 1 1 1 1 ...
#>  $ :'data.frame':    150 obs. of  5 variables:
#>   ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 NA 5 5.4 4.6 5 4.4 4.9 ...
#>   ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#>   ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#>   ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 NA 0.1 ...
#>   ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 NA 1 1 1 1 1 ...
#>  $ :'data.frame':    150 obs. of  5 variables:
#>   ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 NA 5 5.4 4.6 5 4.4 4.9 ...
#>   ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#>   ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#>   ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 NA 0.1 ...
#>   ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 NA 1 1 1 1 1 ...

Created on 2023-02-03 with reprex v2.0.2
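
As a small addition, here is a sketch of the same lapply pattern on a named list, to show that the element names carry over to the result. The data and the names ldf_demo and clean_demo are made up for illustration. It uses function(d) instead of the \(x) shorthand, which needs R 4.1 or later.

```r
# Hypothetical toy data: two small data frames with some missing values
ldf_demo <- list(
  a = data.frame(x = c(1, NA, 3), y = c("u", "v", NA)),
  b = data.frame(x = c(NA, 5, 6), y = c("p", "q", "r"))
)

# Same pattern as above: keep only the complete rows of each data frame
clean_demo <- lapply(ldf_demo, function(d) d[complete.cases(d), , drop = FALSE])

# Element names are taken over from the input list
names(clean_demo)

# Rows remaining per data frame
sapply(clean_demo, nrow)

# Number of incomplete rows left per data frame (should all be zero)
sapply(clean_demo, function(d) sum(!complete.cases(d)))
```

Because the result is still a list, a single cleaned data frame can be pulled out with clean_demo[["a"]] or clean_demo[[1]].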

CodePudding user response:

The problem is that you overwrite ldf_no_null in each iteration, so it ends up containing only the result of the last iteration. If you want a for loop, you first need to initialize an empty list with one element per data frame (in R terms, a vector of mode "list") and then fill it. It is also better to loop over the indices of the list rather than over the list elements directly, so that each result can be assigned to its own slot; seq_along gives us those indices.

ldf_no_null <- vector(mode='list', length=length(ldf))

for (i in seq_along(ldf)) {
  ldf_no_null[[i]] <- ldf[[i]][complete.cases(ldf[[i]]), ]
}

ldf_no_null
# [[1]]
# X1 X2 X3 X4 X5
# 1  1  2  3  2  3
# 3  1  1  1  3  1
# 5  2  3  2  1  1
# 
# [[2]]
# X1 X2 X3 X4 X5
# 2  2  2  2  1  2
# 6  2  3  2  2  1
# 
# [[3]]
# X1 X2 X3 X4 X5
# 5  2  2  1  2  3
# 6  2  3  1  1  3
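
To make the overwriting visible, here is a minimal sketch that runs both loop styles side by side. The toy data and the names ldf_demo, last_only and kept are assumptions for illustration only, not the asker's data.

```r
# Hypothetical toy list of two data frames containing NAs
ldf_demo <- list(
  data.frame(x = c(1, NA), y = c(2, 3)),
  data.frame(x = c(4, 5, 6), y = c(NA, 8, 9))
)

# Original pattern: the same name is reassigned on every pass,
# so only the result of the final pass survives
for (d in ldf_demo) {
  last_only <- d[complete.cases(d), ]
}
class(last_only)  # a single data frame, not a list
nrow(last_only)   # rows of the cleaned last element only

# Indexed pattern: each result is stored in its own slot
kept <- vector(mode = "list", length = length(ldf_demo))
for (i in seq_along(ldf_demo)) {
  kept[[i]] <- ldf_demo[[i]][complete.cases(ldf_demo[[i]]), ]
}
length(kept)      # one cleaned data frame per input data frame
```

One more reason to prefer seq_along(ldf) over 1:length(ldf): for an empty list, seq_along returns a zero-length vector and the loop body is skipped, whereas 1:0 would iterate over 1 and 0.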

Data:

set.seed(42)
ldf <- replicate(3, data.frame(matrix(sample(c(1:3, NA), 6*5, replace=TRUE), 6, 5)), simplify=FALSE)