Objective: calculate the correlation coefficient between two variables (nitrate and sulfate) for each file whose number of complete cases exceeds a custom threshold (the threshold argument).
Data: 332 .csv files in a specified directory.
Desired output: a vector of correlation coefficients, one per qualifying file.
Code:
correl <- function(directory = "~/specdata/specdatacsv", threshold = 0) {
  filelist <- list.files(path = directory, pattern = ".csv", full.names = TRUE)
  nobs <- numeric()
  corrvector <- numeric()
  for(i in length(filelist)) {
    data <- read.csv(filelist[i])
    nobs <- sum(complete.cases(data))
    if (nobs <= threshold) {
      next
    } else {
      nitrate <- as.vector(data$nitrate)
      sulfate <- as.vector(data$sulfate)
      goodSulfate <- complete.cases(sulfate)
      goodNitrate <- complete.cases(nitrate)
      icorr <- cor(goodNitrate, goodSulfate)
      corrvector <- c(corrvector, icorr)
    }
  }
  corrvector
}
For a threshold of 150, the output should be:
[1] -0.01895754 -0.14051254 -0.04389737 -0.06815956 -0.12350667 -0.07588814
Instead, an empty corrvector is returned. Please help me find my mistake.
CodePudding user response:
As per the comment by @stefan, the problem is here:
for(i in length(filelist))
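So, if the length of filelist is 332, this is the same as:
for(i in 332)
The loop body then runs exactly once, with i = 332, so only the last file is read. If that file has no more than threshold complete cases, nothing is appended and the empty corrvector is returned. What you actually want is:
for(i in 1:332)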
This can easily be achieved with either:
for (i in 1:length(filelist)) {
print(i)
}
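or, more robustly (seq_along() also copes with an empty filelist, where 1:length(filelist) would count down from 1 to 0):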
for (i in seq_along(filelist)) {
print(i)
}
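
Going a little beyond the loop fix: once every file is visited, the function in the question will probably still not give the expected numbers. complete.cases() returns a logical vector, so cor(goodNitrate, goodSulfate) correlates the missingness flags rather than the measurements. Below is a minimal sketch of how the whole function might look. It assumes the correlation is meant to be taken over the rows that complete.cases(data) marks as complete; the toy files at the bottom are hypothetical and only there to make the example runnable.

```r
# Sketch of a corrected correl(); the default directory is the one from the question.
correl <- function(directory = "~/specdata/specdatacsv", threshold = 0) {
  # pattern is a regular expression: "\\.csv$" matches names ending in ".csv"
  filelist <- list.files(path = directory, pattern = "\\.csv$", full.names = TRUE)
  corrvector <- numeric()
  for (i in seq_along(filelist)) {      # visits every file, none if filelist is empty
    data <- read.csv(filelist[i])
    good <- complete.cases(data)        # TRUE for rows without any NA
    if (sum(good) > threshold) {
      # Assumption: correlate the measurements in the complete rows only
      icorr <- cor(data$nitrate[good], data$sulfate[good])
      corrvector <- c(corrvector, icorr)
    }
  }
  corrvector
}

# Hypothetical toy data: two small files in a fresh temporary directory
tmp <- tempfile("correl_demo")
dir.create(tmp)
write.csv(data.frame(sulfate = c(1, 2, 3, NA), nitrate = c(2, 4, 6, 8)),
          file.path(tmp, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(1, NA, NA), nitrate = c(3, 2, NA)),
          file.path(tmp, "002.csv"), row.names = FALSE)

# 001.csv has 3 complete rows (kept), 002.csv has 1 (skipped)
res <- correl(tmp, threshold = 2)
print(res)
```

With your real data you would call correl(threshold = 150) and compare the first few values against the expected output from the question.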