How can I effectivize my script for correcting a logger's seasonal drift in R?-CodePudding

I have installed a bunch of CO2 loggers in water that log CO2 every hour for the open water season. I have characterized the loggers at 3 different concentrations of CO2 before and after installing them.

I assume that the seasonal drift in error will be linear
I assume that the error between my characterization points will be linear

My script is based on a for loop that goes through each timestamp and corrects the value, this works but is unfortuneately not fast enough. I know that this can be done within a second but I am not sure how. I seek some advice and I would be grateful if someone could show me how.

Reproduceable example based on basic R:

start <- as.POSIXct("2022-08-01 00:00:00")#time when logger is installed
stop <- as.POSIXct("2022-09-01 00:00:00")#time when retrieved
dt <- seq.POSIXt(start,stop,by=3600)#generate datetime column, measured hourly
#generate a bunch of values within my measured range
co2 <- round(rnorm(length(dt),mean=600,sd=100))
#generate dummy dataframe
dummy <- data.frame(dt,co2)

#actual values used in characterization
actual <- c(0,400,1000)

#measured in the container by the instruments being characterized
measured.pre <- c(105,520,1150)
measured.post <- c(115,585,1250)

diff.pre <- measured.pre-actual#diff at precharacterization
diff.post <- measured.post-actual#diff at post

#linear interpolation of how deviance from actual values change throughout the season
#I assume that the temporal drift is linear 
diff.0 <- seq(diff.pre[1],diff.post[1],length.out=length(dummy$dt))
diff.400 <- seq(diff.pre[2],diff.post[2],length.out = length(dummy$dt))
diff.1000 <-  seq(diff.pre[3],diff.post[3],length.out = length(dummy$dt))

#creates a data frame with the assumed drift at each increment throughout the season
dummy <- data.frame(dummy,diff.0,diff.400,diff.1000)

#this loop makes a 3-point calibration at each day in the dummy data set
co2.corrected <- vector()
for(i in 1:nrow(dummy)){
  print(paste0("row: ",i))#to show the progress of the loop
  diff.0 <- dummy$diff.0[i]#get the differences at characterization increments
  diff.400 <- dummy$diff.400[i]
  diff.1000 <- dummy$diff.1000[i]
  #values below are only used for encompassing the range of measured values in the characterization
  #this is based on the interpolated difference at the given time point and the known concentrations used 
  measured.0 <- diff.0 0
  measured.400 <- diff.400 400
  measured.1000 <- diff.1000 1000
  
  #linear difference between calibration at 0 and 400
  seg1 <- seq(diff.0,diff.400,length.out=measured.400-measured.0)
  #linear difference between calibration at 400 and 1000
  seg2 <- seq(diff.400,diff.1000,length.out=measured.1000-measured.400)
  #bind them together to get one vector
  correction.ppm <- c(seg1,seg2)
  
  
  #the complete range of measured co2 in the characterization.
  #in reality it can not be below 0 and thus it can not be below the minimum measured in the range
  measured.co2.range <- round(seq(measured.0,measured.1000,length.out=length(correction.ppm)))
  #generate a table from which we can characterize the measured values from
  correction.table <- data.frame(measured.co2.range,correction.ppm)
  
  co2 <- dummy$co2[i] #measured co2 at the current row
  #find the measured value in the table and extract the difference
  diff <- correction.table$correction.ppm[match(co2,correction.table$measured.co2.range)]
  #correct the value and save it to vector
  co2.corrected[i] <- co2-diff
  
}
#generate column with calibrated values
dummy$co2.corrected <- co2.corrected

CodePudding user response：

I do not know what you are calculating (I feel that this could be done differently), but you can increase speed by:

remove print, that takes a lot of time inside loop
remove data.frame creation in each iteration, that is slow and not needed here

This loop should be faster:

for(i in 1:nrow(dummy)){
  diff.0 <- dummy$diff.0[i]
  diff.400 <- dummy$diff.400[i]
  diff.1000 <- dummy$diff.1000[i]
  
  measured.0 <- diff.0 0
  measured.400 <- diff.400 400
  measured.1000 <- diff.1000 1000
  
  seg1 <- seq(diff.0,diff.400,length.out=measured.400-measured.0)
  seg2 <- seq(diff.400,diff.1000,length.out=measured.1000-measured.400)
  correction.ppm <- c(seg1,seg2)
  
  s <- seq(measured.0,measured.1000,length.out=length(correction.ppm))
  measured.co2.range <- round(s)
  
  co2 <- dummy$co2[i]
  diff <- correction.ppm[match(co2, measured.co2.range)]
  co2.corrected[i] <- co2-diff
}

p.s. now the slowest part from my testing is round(s). Maybe that can be removed or rewritten...

CodePudding user response：

This is what I understand after reviewing the code. You have a series of CO2 concentration readings, but they need to be corrected based on characterization measurements taken at the beginning of the timeseries and at the end of the timeseries. Both sets of characterization measurements were made using three known concentrations: 0, 400, and 1000.

Your code appears to be attempting to apply bilinear interpolation (over time and concentration) to apply the needed correction. This is easy to vectorize:

set.seed(1)
start <- as.POSIXct("2022-08-01 00:00:00")#time when logger is installed
stop <- as.POSIXct("2022-09-01 00:00:00")#time when retrieved
dt <- seq.POSIXt(start,stop,by=3600)#generate datetime column, measured hourly
#generate a bunch of values within my measured range
co2 <- round(rnorm(length(dt),mean=600,sd=100))

#actual values used in characterization
actual <- c(0,400,1000)

#measured in the container by the instruments being characterized
measured.pre <- c(105,520,1150)
measured.post <- c(115,585,1250)
# interpolate the reference concentrations over time
cref <- mapply(seq, measured.pre, measured.post, length.out = length(dt))
#generate dummy dataframe with corrected values
dummy <- data.frame(
  dt,
  co2,
  co2.corrected = ifelse(
    co2 < cref[,2],
    actual[1]   (co2 - cref[,1])*(actual[2] - actual[1])/(cref[,2] - cref[,1]),
    actual[2]   (co2 - cref[,2])*(actual[3] - actual[2])/(cref[,3] - cref[,2])
  )
)
head(dummy)
#>                    dt co2 co2.corrected
#> 1 2022-08-01 00:00:00 537      416.1905
#> 2 2022-08-01 01:00:00 618      493.2432
#> 3 2022-08-01 02:00:00 516      395.9776
#> 4 2022-08-01 03:00:00 760      628.2707
#> 5 2022-08-01 04:00:00 633      507.2542
#> 6 2022-08-01 05:00:00 518      397.6533