Loop and apply-family functions for climate and yield data to calculate correlation

Time:08-25

I have a file with around 350 columns: year, temperature for each day, and yield for different sites. I need to group or split the data by year, then calculate the correlation between yield and each temperature column, one by one. I wrote the script below, but it produces results for only one year. Is there any suggestion as to where the problem is? It does not go through each year.

for (Y in unique(data_final$YEAR)) {
  # cat ("\n\n YEAR =", Y, "\n =========") # Write year Number
  subData <- data_final [data_final$YEAR == Y,] # Subset the data
  Tmax <- subData[, grepl ("TMAX", colnames (subData))]
  Yield <- subData$YIELD # get YIELD column
  cortest <- list ()
  
  for (i in 1:length (Tmax)) {
  cortest[[i]] <- cor(Tmax[[i]], Yield, use="pairwise.complete.obs", method = "pearson")
  
  }
  return(do.call ("rbind", cortest))
 }

CodePudding user response:

Sounds like a split, apply, combine task to me. So maybe:

sp <- split(data_final, data_final$YEAR)
one_year <- function(dset) {
    message("=== year: ", dset[1,"YEAR"], "===")
    # your code
}
res_list <- lapply(sp, one_year)
res <- do.call(rbind, res_list)

can do the trick.

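The problem with your code seems to be that you call return() inside the outer for loop. return() exits the enclosing function (or raises an error when there is no function to return from), so the loop never reaches the second year. You would want to collect cortest for each year, for example in a list, and then enter the next iteration of the loop.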
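
As a rough sketch of that idea, the loop can store each year's correlations in a list and bind them only after the loop has finished. The small data_final built here is made-up toy data with two hypothetical TMAX columns, standing in for the real file; the results list and the index k are illustrative names, not part of the original script.

```r
# Toy stand-in for data_final (values are made up for illustration).
set.seed(1)
data_final <- data.frame(
  YEAR  = rep(c(2001, 2002), each = 5),
  TMAX1 = rnorm(10),
  TMAX2 = rnorm(10),
  YIELD = rnorm(10)
)

years <- unique(data_final$YEAR)
results <- vector("list", length(years))  # one slot per year
names(results) <- years

for (k in seq_along(years)) {
  subData <- data_final[data_final$YEAR == years[k], ]
  # drop = FALSE keeps a data frame even if only one TMAX column matches
  Tmax <- subData[, grepl("TMAX", colnames(subData)), drop = FALSE]
  # One Pearson correlation per TMAX column, stored for this year
  results[[k]] <- sapply(Tmax, function(x) {
    cor(x, subData$YIELD, use = "pairwise.complete.obs", method = "pearson")
  })
}

# Combine after the loop: one row per year, one column per TMAX variable
res <- do.call(rbind, results)
print(res)
```

The key change is that nothing is returned inside the loop; the combine step happens once, after every year has been processed.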

CodePudding user response:

Thank you @Karsten W. Yes, I want to collect cortest for each YEAR (that is why I used the split function). However, I have no clue how to refer to the TMAX columns for each YEAR. The code below does not produce anything.

sp <- split(data_final, data_final$YEAR)
one_year <- function (dset){
  message ("=== year: ", dset [1, "YEAR"], "===")
  Tmax <- one_year[, grepl ("TMAX", colnames (sp))]
  Yield <- one_year$YIELD # get YIELD column
  
  for (i in 1:length (Tmax)) {
  cortest[[i]] <- cor(Tmax[[i]], Yield, use="pairwise.complete.obs", method = "pearson")
  }
}
res_list <- lapply(sp, one_year)
res <- do.call(rbind, res_list)
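
A possible reason the attempt above returns nothing: inside the function, the data for one year is the argument dset, but the code indexes one_year (the function itself) and takes colnames(sp) (the whole list); cortest is never created, and a function whose last expression is a for loop returns NULL. The following is a minimal sketch of one way to address this, assuming toy data with hypothetical column names TMAX_001 and TMAX_002 in place of the real file.

```r
# Toy stand-in for data_final (values are made up for illustration).
set.seed(42)
data_final <- data.frame(
  YEAR     = rep(2001:2003, each = 6),
  TMAX_001 = rnorm(18),
  TMAX_002 = rnorm(18),
  YIELD    = rnorm(18)
)

sp <- split(data_final, data_final$YEAR)

one_year <- function(dset) {
  message("=== year: ", dset[1, "YEAR"], " ===")
  # Select the TMAX columns from the per-year data frame passed in as dset
  Tmax <- dset[, grepl("TMAX", colnames(dset)), drop = FALSE]
  Yield <- dset$YIELD
  # The vapply result is the last expression, so it is the function's return value
  vapply(Tmax, function(x) {
    cor(x, Yield, use = "pairwise.complete.obs", method = "pearson")
  }, numeric(1))
}

res_list <- lapply(sp, one_year)
res <- do.call(rbind, res_list)  # rows are years, columns are TMAX variables
print(res)
```

If a p-value is wanted rather than only the coefficient, cor.test(x, Yield)$p.value could be used in place of cor() inside the anonymous function, provided each year has enough complete observations.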