In R, nested loops don't return the expected result while reading through multiple files

Time:02-14

Objective: calculate the correlation coefficient between two variables for each file whose number of complete cases exceeds a custom threshold (the threshold argument).

Data: 332 .csv files in a specified directory.

Desired output: a vector of correlation coefficients, one for each file that meets the threshold.

Code:

 correl <- function(directory = "~/specdata/specdatacsv", threshold = 0) {
  filelist <- list.files(path = directory, pattern = ".csv", full.names = TRUE)
  nobs <- numeric()
  corrvector <- numeric()
  
  for(i in length(filelist)) {
    data <- read.csv(filelist[i])
    nobs <- sum(complete.cases(data))
    if (nobs <= threshold) { next
    } else {
      nitrate <- as.vector(data$nitrate)
      sulfate <- as.vector(data$sulfate)
      goodSulfate <- complete.cases(sulfate)
      goodNitrate <- complete.cases(nitrate)
      icorr <- cor(goodNitrate, goodSulfate)
      corrvector <- c(corrvector, icorr)
    }
  }
  corrvector
}

For a threshold of 150, the function should return:

[1] -0.01895754 -0.14051254 -0.04389737 -0.06815956 -0.12350667 -0.07588814

Instead, an empty corrvector is returned. Please help me find my mistake.

Answer:

As per the comment by @stefan, the problem is here:

for(i in length(filelist)) 

length(filelist) returns a single number, so if filelist has 332 elements, this is the same as:

for(i in 332) 

which runs the loop body only once, for the last file, whereas you actually want:

for(i in 1:332) 

This can easily be achieved with either:

for (i in 1:length(filelist)) {
  print(i)
}

or

for (i in seq_along(filelist)) {
  print(i)
}
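
To make the difference concrete, here is a minimal sketch that records which indices each loop form visits, followed by one possible corrected version of the function. The name correl_fixed and the sample vector files are hypothetical and not from the original post. The sketch also assumes two further changes beyond the loop index: the question's code appears to pass the logical vectors returned by complete.cases() to cor(), so the version below correlates the nitrate and sulfate values themselves on the complete rows, and it uses the pattern "\\.csv$" because list.files() treats pattern as a regular expression.

```r
# Which indices does each loop form visit? (files is a made-up example vector)
files <- c("a.csv", "b.csv", "c.csv")

visited_wrong <- integer()
for (i in length(files)) visited_wrong <- c(visited_wrong, i)
# visited_wrong is 3: the body ran once, for the last element only

visited_right <- integer()
for (i in seq_along(files)) visited_right <- c(visited_right, i)
# visited_right is 1 2 3

# Edge case: with an empty vector, 1:length(x) counts down from 1 to 0,
# while seq_along(x) is empty and the loop body is skipped.
empty <- character()
idx_colon <- 1:length(empty)   # c(1, 0)
idx_along <- seq_along(empty)  # integer(0)

# Hypothetical corrected version of correl() (a sketch, not the asker's final code)
correl_fixed <- function(directory, threshold = 0) {
  # "\\.csv$" matches only file names that end in ".csv"
  filelist <- list.files(path = directory, pattern = "\\.csv$", full.names = TRUE)
  corrvector <- numeric()

  for (i in seq_along(filelist)) {
    data <- read.csv(filelist[i])
    good <- complete.cases(data)  # TRUE for rows with no NA in any column
    if (sum(good) > threshold) {
      # Assumption: the goal is the correlation of the measured values,
      # restricted to the complete rows
      icorr <- cor(data$nitrate[good], data$sulfate[good])
      corrvector <- c(corrvector, icorr)
    }
  }
  corrvector
}
```

With this version, a call such as correl_fixed("~/specdata/specdatacsv", 150) should loop over every file in the directory. Whether it reproduces the expected values above depends on the data, which is not shown here.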