I have a file with around 350 columns: year, daily temperature (one column per day), and yield for different sites. I need to split the data by year and then calculate the correlation between yield and each temperature column, one by one. I wrote the script below, but it produces results for only one year; it does not go through each year. Any suggestions as to where the problem is?
for (Y in unique(data_final$YEAR)) {
  # cat("\n\n YEAR =", Y, "\n =========") # Write year number
  subData <- data_final[data_final$YEAR == Y, ] # Subset the data
  Tmax <- subData[, grepl("TMAX", colnames(subData))]
  Yield <- subData$YIELD # get YIELD column
  cortest <- list()
  for (i in 1:length(Tmax)) {
    cortest[[i]] <- cor(Tmax[[i]], Yield, use = "pairwise.complete.obs", method = "pearson")
  }
  return(do.call("rbind", cortest))
}
CodePudding user response:
Sounds like a split, apply, combine task to me. So maybe:
sp <- split(data_final, data_final$YEAR)
one_year <- function(dset) {
message("=== year: ", dset[1,"YEAR"], "===")
# your code
}
res_list <- lapply(sp, one_year)
res <- do.call(rbind, res_list)
can do the trick.
The problem with your code seems to be that you use return in the outer for loop. You would want to collect cortest somehow and then enter the next iteration of the loop.
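As a minimal sketch of that idea, the loop below stores each year's correlations in a list instead of calling return, and binds the list together only after the loop has finished. The toy data frame is hypothetical and only stands in for the real data_final; the names years, results and res_loop are introduced here for illustration.

```r
# Hypothetical toy data standing in for data_final (values are made up)
data_final <- data.frame(
  YEAR  = rep(c(2001, 2002), each = 5),
  TMAX1 = c(20, 22, 24, 26, 28, 15, 18, 21, 24, 27),
  TMAX2 = c(30, 29, 27, 26, 24, 31, 28, 30, 27, 29),
  YIELD = c(1.0, 1.3, 1.2, 1.6, 1.9, 2.0, 2.2, 2.1, 2.6, 2.8)
)

years   <- unique(data_final$YEAR)
results <- vector("list", length(years))  # one slot per year
names(results) <- years

for (k in seq_along(years)) {
  subData <- data_final[data_final$YEAR == years[k], ]
  # drop = FALSE keeps a data frame even if only one TMAX column matches
  Tmax  <- subData[, grepl("TMAX", colnames(subData)), drop = FALSE]
  Yield <- subData$YIELD
  # one correlation per TMAX column, as a named numeric vector
  results[[k]] <- sapply(Tmax, function(x) {
    cor(x, Yield, use = "pairwise.complete.obs", method = "pearson")
  })
}

# Combine after the loop: rows are years, columns are TMAX columns
res_loop <- do.call(rbind, results)
print(res_loop)
```

Because results is filled by position and nothing interrupts the loop, every year gets processed before the final rbind.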
CodePudding user response:
Thank you @Karsten W. Yes, I want to collect cortest for each YEAR (that's why I used the split function). In addition, I have no clue how to refer to the TMAX columns of each YEAR. The code below does not produce anything.
sp <- split(data_final, data_final$YEAR)
one_year <- function(dset) {
  message("=== year: ", dset[1, "YEAR"], "===")
  Tmax <- one_year[, grepl("TMAX", colnames(sp))]
  Yield <- one_year$YIELD # get YIELD column
  for (i in 1:length(Tmax)) {
    cortest[[i]] <- cor(Tmax[[i]], Yield, use = "pairwise.complete.obs", method = "pearson")
  }
}
res_list <- lapply(sp, one_year)
res <- do.call(rbind, res_list)
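A few things in the function above look likely to be the cause: it indexes one_year (the function itself) rather than its argument dset, it takes column names from sp (a list, so colnames returns NULL), cortest is never created, and the function ends in a for loop, so it returns nothing useful. A possible corrected sketch follows; the toy data frame is hypothetical and only stands in for the real data_final.

```r
# Hypothetical toy data standing in for data_final (values are made up)
data_final <- data.frame(
  YEAR  = rep(c(2001, 2002), each = 5),
  TMAX1 = c(20, 22, 24, 26, 28, 15, 18, 21, 24, 27),
  TMAX2 = c(30, NA, 27, 26, 24, 31, 28, 30, 27, 29),  # NA to exercise 'use'
  YIELD = c(1.0, 1.3, 1.2, 1.6, 1.9, 2.0, 2.2, 2.1, 2.6, 2.8)
)

one_year <- function(dset) {
  message("=== year: ", dset[1, "YEAR"], " ===")
  # Work on the argument dset, i.e. the rows for a single year
  Tmax  <- dset[, grepl("TMAX", colnames(dset)), drop = FALSE]
  Yield <- dset$YIELD
  # One correlation per TMAX column; vapply guarantees a numeric vector
  cortest <- vapply(Tmax, function(x) {
    cor(x, Yield, use = "pairwise.complete.obs", method = "pearson")
  }, numeric(1))
  cortest  # last expression is the function's return value
}

sp       <- split(data_final, data_final$YEAR)
res_list <- lapply(sp, one_year)
res      <- do.call(rbind, res_list)  # rows = years, columns = TMAX columns
print(res)
```

This computes correlation coefficients only. If a p-value is needed as well, cor.test() returns one in its p.value element, and the anonymous function could return that instead.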