Calculating values at specific times

Time:12-14

I have a dataset made up of a date-time column, manual water levels, and transducer water levels, as described below. Date and time share one column in MM/DD/YYYY HH:MM format. Manual water levels are taken only once a month, and the time of each measurement is recorded (the Manual column is blank on all other rows). Transducer readings are daily and always taken at 12:00 AM of that day. Both are in feet above mean sea level (ft msl). I have provided a sample df below.

   Date_Time             Manual                Transducer
   09/30/2022 12:00                            431.54
   10/01/2022 12:00                            431.59  
   10/01/2022 09:40      431.51             
   10/02/2022 12:00                            431.45            
   10/03/2022 12:00                            431.41

As requested, I have provided this data using dput below:

   dput(df)

   structure(list(Date_Time = c("09/30/2022 12:00", "10/01/2022 12:00", "10/01/2022 09:40", "10/02/2022 12:00", "10/03/2022 12:00"), Manual = c(NA, NA, 431.51, NA, NA), Transducer = c(431.54, 431.59, NA, 431.45, 431.41)), class = "data.frame", row.names = c(NA, -5L))

I have a single year of this data (the above section is a selected portion).

What I want to do is calculate what the transducer WL was at the specific time the manual WL was collected (in the above example, 09:40).

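TDX a = transducer value directly before the manual WL; TDX b = transducer value directly after the manual WL; t = clock time of manual WL collection in hours, i.e. hours after the 12:00 AM TDX a reading (for 09:40, t = 9 + 40/60 ≈ 9.67); y = calculated transducer value at that time. The hand calculation is then

$$x = \frac{\mathrm{TDX}_a - \mathrm{TDX}_b}{24}, \qquad y = \mathrm{TDX}_a - x\,t$$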
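As a quick check of the arithmetic, here is the formula written as a small R function. The name interp_manual is a hypothetical helper, and the example assumes the transducer stamps are midnight readings, so TDX a = 431.59 (10/01), TDX b = 431.45 (10/02) and t = 9 + 40/60 hours.

```r
# Hypothetical helper (not from the question): the hand formula, vectorised.
# tdx_a: transducer reading at the midnight before the manual reading
# tdx_b: transducer reading at the following midnight
# hours: hours elapsed since tdx_a was read (09:40 -> 9 + 40/60)
interp_manual <- function(tdx_a, tdx_b, hours) {
  x <- (tdx_a - tdx_b) / 24  # change per hour (positive when the level falls)
  tdx_a - x * hours          # y = TDX_a - x * t
}

# Example values from the sample data, read as midnight stamps
y <- interp_manual(tdx_a = 431.59, tdx_b = 431.45, hours = 9 + 40 / 60)
round(y, 4)
#> [1] 431.5336
```

Because every operation is element-wise, the same function accepts vectors of TDX a, TDX b and t values; the hard part is finding the bracketing readings for each manual row, which is what approx() and na.approx() do automatically.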

The problem is that there is no transducer reading at that time, so I have to enter the time into the calculation by hand. Since I have about a hundred locations to process, I am hoping to automate this.

My goal is to create a new column holding this calculated value so that I can extract it quickly.
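A minimal base-R sketch of that automation, assuming all sites are stacked in one data frame with a hypothetical Location column (not in the sample data). The function name add_tdx_at_manual and the column Transducer_at_Manual are made up for illustration, and timestamps are parsed exactly as written, so "12:00" is read as noon.

```r
# Sketch: add a Transducer_at_Manual column for many sites at once.
# Assumes a hypothetical `Location` column identifying each site.
add_tdx_at_manual <- function(d) {
  secs <- as.numeric(as.POSIXct(d$Date_Time, format = "%m/%d/%Y %H:%M",
                                tz = "UTC"))
  ok <- !is.na(d$Transducer)
  # Linear interpolation of the daily series at every row's timestamp
  est <- approx(secs[ok], d$Transducer[ok], xout = secs)$y
  # Keep the estimate only on rows that hold a manual reading
  d$Transducer_at_Manual <- ifelse(is.na(d$Manual), NA, est)
  d
}

# The question's sample, tagged with a made-up site name
df <- data.frame(
  Location   = "Site_A",
  Date_Time  = c("09/30/2022 12:00", "10/01/2022 12:00", "10/01/2022 09:40",
                 "10/02/2022 12:00", "10/03/2022 12:00"),
  Manual     = c(NA, NA, 431.51, NA, NA),
  Transducer = c(431.54, 431.59, NA, 431.45, 431.41)
)

# Apply the function to each site separately, then stack the results
res <- do.call(rbind, lapply(split(df, df$Location), add_tdx_at_manual))
res[!is.na(res$Manual), c("Location", "Manual", "Transducer_at_Manual")]
```

Splitting by site keeps each location's series separate, so the interpolation never bridges readings from two different locations; dplyr's group_by() followed by mutate() would do the same job.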

Any help is appreciated. Thanks!

Answer:

It sounds to me as though you want to carry out linear interpolation on the missing values in your transducer column. In R we can do this with approx. However, you need to get your date-times into actual date-time format, since it appears from your example that they are currently in character format.
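To see why the conversion matters: character strings cannot be subtracted, but parsed date-times can, and approx() then works on their underlying numeric value. A minimal base-R sketch, where parse_mdy_hm is a hypothetical helper and UTC is assumed to avoid daylight-saving shifts:

```r
# Character strings cannot be subtracted, so parse them first
parse_mdy_hm <- function(s) as.POSIXct(s, format = "%m/%d/%Y %H:%M", tz = "UTC")

t_known  <- parse_mdy_hm(c("09/30/2022 12:00", "10/01/2022 12:00"))
t_manual <- parse_mdy_hm("10/01/2022 09:40")

# Elapsed time between the earlier transducer reading and the manual reading
as.numeric(difftime(t_manual, t_known[1], units = "hours"))
#> [1] 21.66667

# approx() interpolates on the underlying numeric time (seconds since 1970)
est <- approx(as.numeric(t_known), c(431.54, 431.59),
              xout = as.numeric(t_manual))$y
round(est, 4)
#> [1] 431.5851
```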

In the absence of a reproducible example, I have read in the table in your question and posted it in reproducible format here:

df <- data.frame(Date_Time = c("09/30/2022 12:00", "10/01/2022 12:00", 
                               "10/01/2022 09:40", "10/02/2022 12:00", 
                               "10/03/2022 12:00"), 
                 `Manual WL (ft msl)` = c(NA, NA, 431.51, NA, NA), 
                 `Transducer WL (ft msl)` = c(431.54, 431.59, NA, 
                                              431.45, 431.41),
                 check.names = FALSE)

This gives us a data frame called df which looks like this:

df
#>          Date_Time Manual WL (ft msl) Transducer WL (ft msl)
#> 1 09/30/2022 12:00                 NA                 431.54
#> 2 10/01/2022 12:00                 NA                 431.59
#> 3 10/01/2022 09:40             431.51                     NA
#> 4 10/02/2022 12:00                 NA                 431.45
#> 5 10/03/2022 12:00                 NA                 431.41

We can use mdy_hm from the lubridate package to convert your date-times to the appropriate format, then use approx to fill in the missing values like so:

library(tidyverse)
library(lubridate)

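df <- df %>%
  # parse "MM/DD/YYYY HH:MM" strings into POSIXct date-times
  mutate(Date_Time = mdy_hm(Date_Time)) %>%
  # interpolate linearly between the non-missing transducer readings,
  # evaluated at every row's date-time (including the manual rows)
  mutate(`Transducer WL (ft msl)` = 
           approx(Date_Time[!is.na(`Transducer WL (ft msl)`)], 
                  na.omit(`Transducer WL (ft msl)`),
                  xout = Date_Time)$y)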

And now df looks like this:

df
#>             Date_Time Manual WL (ft msl) Transducer WL (ft msl)
#> 1 2022-09-30 12:00:00                 NA               431.5400
#> 2 2022-10-01 12:00:00                 NA               431.5900
#> 3 2022-10-01 09:40:00             431.51               431.5851
#> 4 2022-10-02 12:00:00                 NA               431.4500
#> 5 2022-10-03 12:00:00                 NA               431.4100

Created on 2022-12-13 with reprex v2.0.2
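One caveat, assuming the question's description is accurate: the transducer readings are said to be taken at 12:00 AM, but a %H:%M parser such as mdy_hm() reads "12:00" as noon. If those stamps really mean midnight, the readings bracketing 10/01 09:40 are the 10/01 and 10/02 values rather than the 09/30 and 10/01 values, and the interpolated level changes. A minimal base-R sketch that snaps the transducer stamps to 00:00 before interpolating (UTC assumed):

```r
# Assumption: transducer stamps written "12:00" are really 12:00 AM
# (midnight), as the question states, while manual times are correct.
DF <- data.frame(
  Date_Time  = c("09/30/2022 12:00", "10/01/2022 12:00", "10/01/2022 09:40",
                 "10/02/2022 12:00", "10/03/2022 12:00"),
  Manual     = c(NA, NA, 431.51, NA, NA),
  Transducer = c(431.54, 431.59, NA, 431.45, 431.41)
)
DF$Date_Time <- as.POSIXct(DF$Date_Time, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Snap every transducer timestamp to the start of its day (00:00)
is_tdx <- !is.na(DF$Transducer)
DF$Date_Time[is_tdx] <- as.POSIXct(trunc(DF$Date_Time[is_tdx], units = "days"))

# Interpolate, now between 10/01 00:00 and 10/02 00:00 for the manual row
secs <- as.numeric(DF$Date_Time)
DF$Transducer <- approx(secs[is_tdx], DF$Transducer[is_tdx], xout = secs)$y
round(DF$Transducer[3], 4)
#> [1] 431.5336
```

If the stamps really are noon readings, the value of 431.5851 shown above stands.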

Answer:

Using DF, shown reproducibly in the Note at the end, use na.approx to fill in the NAs by linear interpolation. Passing x = Date_Time makes the interpolation proportional to elapsed time rather than to row position. The first alternative uses zoo and base R; the second uses zoo and dplyr. Another possibility is to use na.spline in place of na.approx. na.rm = FALSE says to leave any NAs that appear at the ends; if there are never NAs at the ends, it can be omitted.
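Why the ends are special: a missing value at the start or end of the series has a reading on only one side, so there is nothing to interpolate between, and dropping those positions would leave a vector shorter than the column it replaces. Base R's approx() shows the same behavior; the numbers below are made up for illustration.

```r
# Made-up series with a gap in the middle and NAs at both ends;
# `hours` gives each reading's time (hypothetical values)
hours <- c(0, 24, 36, 48, 72)
level <- c(NA, 431.59, NA, 431.45, NA)
ok <- !is.na(level)

# Interior gap: bracketed on both sides, so it can be interpolated.
# End NAs: nothing on one side, so approx() returns NA there (rule = 1)
a <- approx(hours[ok], level[ok], xout = hours)$y
a
#> [1]     NA 431.59 431.52 431.45     NA

# rule = 2 instead copies the nearest observed value into the ends
b <- approx(hours[ok], level[ok], xout = hours, rule = 2)$y
b
#> [1] 431.59 431.59 431.52 431.45 431.45
```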

library(zoo)
within(DF, {
  Date_Time <- as.POSIXct(Date_Time, format = "%m/%d/%Y %H:%M")
  Transducer <- na.approx(Transducer, x = Date_Time, na.rm = FALSE)
})
##             Date_Time Manual Transducer
## 1 2022-09-30 12:00:00     NA   431.5400
## 2 2022-10-01 12:00:00     NA   431.5900
## 3 2022-10-01 09:40:00 431.51   431.5851
## 4 2022-10-02 12:00:00     NA   431.4500
## 5 2022-10-03 12:00:00     NA   431.4100

library(dplyr)
library(zoo)
DF %>%
  mutate(Date_Time = as.POSIXct(Date_Time, format = "%m/%d/%Y %H:%M"),
         Transducer = na.approx(Transducer, x = Date_Time, na.rm = FALSE))
##             Date_Time Manual Transducer
## 1 2022-09-30 12:00:00     NA   431.5400
## 2 2022-10-01 12:00:00     NA   431.5900
## 3 2022-10-01 09:40:00 431.51   431.5851
## 4 2022-10-02 12:00:00     NA   431.4500
## 5 2022-10-03 12:00:00     NA   431.4100

Note

DF <- structure(list(Date_Time = c("09/30/2022 12:00", "10/01/2022 12:00", 
"10/01/2022 09:40", "10/02/2022 12:00", "10/03/2022 12:00"), 
    Manual = c(NA, NA, 431.51, NA, NA), Transducer = c(431.54, 
    431.59, NA, 431.45, 431.41)), row.names = c(NA, -5L), class = "data.frame")