Calculating values at specific times

I have a dataset made up of date and time, manual water levels, and transducer water levels, as described below. Date and time share one column, formatted as MM/DD/YYYY HH:MM. Manual water levels are taken only once a month, with the time of measurement recorded (all other dates are blank). Transducer data are daily, always read at 12:00 AM of that day. Both are recorded in ft mean sea level (msl). I have provided a sample df below.

   Date_Time             Manual                Transducer
   09/30/2022 12:00                            431.54
   10/01/2022 12:00                            431.59  
   10/01/2022 09:40      431.51             
   10/02/2022 12:00                            431.45            
   10/03/2022 12:00                            431.41

As requested, I have provided this data using dput below:

   dput(df)

   structure(list(Date_Time = c("09/30/2022 12:00", "10/01/2022 12:00", "10/01/2022 09:40", "10/02/2022 12:00", "10/03/2022 12:00"), Manual = c(NA, NA, 431.51, NA, NA), Transducer = c(431.54, 431.59, NA, 431.45, 431.41)), row.names = c(NA, -5L), class = "data.frame")

I have a single year of this data (the above section is a selected portion).

What I want to do with this data is calculate the transducer WL at the specific time that the manual WL was collected (in the above example, at 09:40).

TDX a = Transducer value directly before Manual WL; TDX b = Transducer value directly after Manual WL; y = calculated Transducer value at the specific time of manual WL collection (i.e. 09:40)

x = (TDX a - TDX b)/24 ; y = TDX a - (x * (hours elapsed from the TDX a reading to the time of manual WL collection))

The problem I am facing is that, since there is no transducer data at that time, I have to enter the time in the calculation manually. Since I have about a hundred locations to do, I am hoping to automate the process.

My goal is to create a new column with this calculated value at that specific time so that I can extract it quickly.

Any help is appreciated. Thanks!

CodePudding user response：

It sounds to me as though you want to carry out linear interpolation on the missing values in your transducer column. In R we can do this with approx. However, you need to get your date-times into actual date-time format, since it appears from your example that they are currently in character format.
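As a quick hand check of the formula in the question, assuming it is meant as a straight line between the two transducer readings on either side of the manual measurement, the arithmetic can be sketched in base R. The variable names (`t_a`, `tdx_a`, and so on) and the UTC time zone are my own assumptions for illustration. Note that the time differences are only available once the timestamps are real date-times rather than character strings.

```r
# Transducer readings on either side of the manual measurement (from the sample data)
t_a <- as.POSIXct("09/30/2022 12:00", format = "%m/%d/%Y %H:%M", tz = "UTC")
t_b <- as.POSIXct("10/01/2022 12:00", format = "%m/%d/%Y %H:%M", tz = "UTC")
t_m <- as.POSIXct("10/01/2022 09:40", format = "%m/%d/%Y %H:%M", tz = "UTC")
tdx_a <- 431.54
tdx_b <- 431.59

# Hours elapsed between the earlier transducer reading and the manual reading
elapsed <- as.numeric(difftime(t_m, t_a, units = "hours"))
# Hours between the two transducer readings (24 for daily data)
span <- as.numeric(difftime(t_b, t_a, units = "hours"))

# Linear interpolation: start at tdx_a and move along the slope for `elapsed` hours
y <- tdx_a + (tdx_b - tdx_a) / span * elapsed
round(y, 4)
#> [1] 431.5851
```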

In the absence of a reproducible example, I have read in the table in your question and posted it in reproducible format here:

df <- data.frame(Date_Time = c("09/30/2022 12:00", "10/01/2022 12:00", 
                               "10/01/2022 09:40", "10/02/2022 12:00", 
                               "10/03/2022 12:00"), 
                 `Manual WL (ft msl)` = c(NA, NA, 431.51, NA, NA), 
                 `Transducer WL (ft msl)` = c(431.54, 431.59, NA, 
                                              431.45, 431.41),
                 check.names = FALSE)

This gives us a data frame called df which looks like this:

df
#>          Date_Time Manual WL (ft msl) Transducer WL (ft msl)
#> 1 09/30/2022 12:00                 NA                 431.54
#> 2 10/01/2022 12:00                 NA                 431.59
#> 3 10/01/2022 09:40             431.51                     NA
#> 4 10/02/2022 12:00                 NA                 431.45
#> 5 10/03/2022 12:00                 NA                 431.41

We can use mdy_hm from the lubridate package to convert your date-times to the appropriate format, then use approx to fill in the missing values like so:

library(tidyverse)
library(lubridate)

df <- df %>%
  mutate(Date_Time = mdy_hm(Date_Time)) %>%
  mutate(`Transducer WL (ft msl)` = 
           approx(Date_Time[!is.na(`Transducer WL (ft msl)`)], 
                  na.omit(`Transducer WL (ft msl)`),
                  xout = Date_Time)$y)

And now df looks like this:

df
#>             Date_Time Manual WL (ft msl) Transducer WL (ft msl)
#> 1 2022-09-30 12:00:00                 NA               431.5400
#> 2 2022-10-01 12:00:00                 NA               431.5900
#> 3 2022-10-01 09:40:00             431.51               431.5851
#> 4 2022-10-02 12:00:00                 NA               431.4500
#> 5 2022-10-03 12:00:00                 NA               431.4100

Created on 2022-12-13 with reprex v2.0.2
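If only the interpolated values at the manual measurement times are needed, and the same steps have to be repeated for many locations, one option is to wrap the interpolation in a small function. The sketch below uses base R only. The function name `interp_at_manual`, the output column `Transducer_at_Manual`, and the data frame name `df_raw` are hypothetical, and the code assumes the column names used above and a fixed UTC time zone.

```r
# Hypothetical helper: interpolate the transducer series at the manual reading times
interp_at_manual <- function(d) {
  # Parse the character timestamps; a fixed time zone avoids daylight-saving jumps
  tm <- as.POSIXct(d$Date_Time, format = "%m/%d/%Y %H:%M", tz = "UTC")
  has_tdx <- !is.na(d[["Transducer WL (ft msl)"]])
  has_man <- !is.na(d[["Manual WL (ft msl)"]])

  # New column: NA everywhere except at the manual reading times
  d$Transducer_at_Manual <- NA_real_
  d$Transducer_at_Manual[has_man] <- approx(
    x = as.numeric(tm[has_tdx]),
    y = d[["Transducer WL (ft msl)"]][has_tdx],
    xout = as.numeric(tm[has_man])
  )$y

  # Return only the rows that have a manual reading
  d[has_man, ]
}

# Sample data from the question, with timestamps still stored as character
df_raw <- data.frame(Date_Time = c("09/30/2022 12:00", "10/01/2022 12:00",
                                   "10/01/2022 09:40", "10/02/2022 12:00",
                                   "10/03/2022 12:00"),
                     `Manual WL (ft msl)` = c(NA, NA, 431.51, NA, NA),
                     `Transducer WL (ft msl)` = c(431.54, 431.59, NA,
                                                  431.45, 431.41),
                     check.names = FALSE)

res <- interp_at_manual(df_raw)
res
```

With one data frame per location stored in a list (say, a hypothetical `list_of_dfs`), `lapply(list_of_dfs, interp_at_manual)` would apply the same calculation to each location.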

CodePudding user response：

Using DF, shown reproducibly in the Note at the end, use na.approx to fill in the NA's by linear interpolation. The first alternative uses zoo and base R, and the second uses zoo and dplyr. Another possibility is to use na.spline in place of na.approx. na.rm = FALSE tells na.approx to keep any NA's that appear at the ends; if there are never NA's at the ends, it can be omitted.

library(zoo)
within(DF, {
  Date_Time <- as.POSIXct(Date_Time, format = "%m/%d/%Y %H:%M")
  Transducer <- na.approx(Transducer, x = Date_Time, na.rm = FALSE)
})
##             Date_Time Manual Transducer
## 1 2022-09-30 12:00:00     NA   431.5400
## 2 2022-10-01 12:00:00     NA   431.5900
## 3 2022-10-01 09:40:00 431.51   431.5851
## 4 2022-10-02 12:00:00     NA   431.4500
## 5 2022-10-03 12:00:00     NA   431.4100

library(dplyr)
library(zoo)
DF %>%
  mutate(Date_Time = as.POSIXct(Date_Time, format = "%m/%d/%Y %H:%M"),
         Transducer = na.approx(Transducer, x = Date_Time, na.rm = FALSE))
##             Date_Time Manual Transducer
## 1 2022-09-30 12:00:00     NA   431.5400
## 2 2022-10-01 12:00:00     NA   431.5900
## 3 2022-10-01 09:40:00 431.51   431.5851
## 4 2022-10-02 12:00:00     NA   431.4500
## 5 2022-10-03 12:00:00     NA   431.4100
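To illustrate what na.rm = FALSE changes, here is a small sketch on a made-up vector (not the question's data) that has NA's at both ends. The names `v`, `trimmed`, and `kept` are only for illustration.

```r
library(zoo)

# Made-up vector with NA's at both ends and one in the middle
v <- c(NA, 1, NA, 3, NA)

# Default (na.rm = TRUE): the leading and trailing NA's are dropped,
# so the result is shorter than the input
trimmed <- na.approx(v)

# na.rm = FALSE: the end NA's are kept, so the length matches the input
kept <- na.approx(v, na.rm = FALSE)

length(trimmed)
length(kept)
kept
```

Keeping the length equal to the input matters inside within or mutate, where the new column must have as many values as there are rows.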

Note

DF <- structure(list(Date_Time = c("09/30/2022 12:00", "10/01/2022 12:00", 
"10/01/2022 09:40", "10/02/2022 12:00", "10/03/2022 12:00"), 
    Manual = c(NA, NA, 431.51, NA, NA), Transducer = c(431.54, 
    431.59, NA, 431.45, 431.41)), row.names = c(NA, -5L), cla