I have a dataset consisting of date and time, manual water levels, and transducer water levels, as described below. Date and time share a single column formatted as MM/DD/YYYY HH:MM. Manual water levels are measured only once a month, and the time of measurement is recorded (all other dates are blank). Transducer data are recorded daily, always at 12:00 AM of that day. Both water levels are in ft mean sea level (msl). I have provided a sample df below.
Date_Time         Manual  Transducer
09/30/2022 12:00          431.54
10/01/2022 12:00          431.59
10/01/2022 09:40  431.51
10/02/2022 12:00          431.45
10/03/2022 12:00          431.41
As requested, I have provided this data using dput below:
dput(df)
structure(list(Date_Time = c("09/30/2022 12:00", "10/01/2022 12:00", "10/01/2022 09:40", "10/02/2022 12:00", "10/03/2022 12:00"), Manual = c(NA, NA, 431.51, NA, NA), Transducer = c(431.54, 431.59, NA, 431.45, 431.41)), row.names = c(NA, -5L), class = "data.frame")
I have a single year of this data (the above section is a selected portion).
What I want to do with this data is calculate what the transducer WL would be at the specific time that the manual WL was collected (in the above example, at 09:40).
TDX a = Transducer value directly before the manual WL
TDX b = Transducer value directly after the manual WL
y = calculated Transducer value at the specific time of manual WL collection (i.e. 09:40)
x = (TDX a - TDX b) / 24
y = TDX a - (x * (time of manual WL collection, in hours after the TDX a reading))
The problem I am facing is that, since there is no Transducer data at that time, I have to manually enter the time in the calculation. I have about a hundred locations to do, so I am hoping to automate the process.
My goal is to create a new column with this calculated value at that specific time, so that I can extract it quickly.
Any help is appreciated. Thanks!
CodePudding user response:
It sounds to me as though you want to carry out linear interpolation on the missing values in your transducer column. In R we can do this with approx. However, you need to get your date-times into actual date-time format, since it appears from your example that they are currently in character format.
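To see that approx performs the same calculation as the by-hand formula in the question, here is a minimal sketch using plain numeric hours instead of date-times. The variable names (t_hours, wl, t_manual) are made up for illustration, and the readings are assumed to be exactly 24 hours apart:

```r
# Two transducer readings taken 24 hours apart (values from the sample data)
t_hours <- c(0, 24)            # hours since the first reading
wl      <- c(431.54, 431.59)   # transducer water levels (ft msl)

# Manual reading taken 21 h 40 min after the first transducer reading
t_manual <- 21 + 40 / 60

# Linear interpolation by hand: start value plus slope times elapsed time
slope  <- (wl[2] - wl[1]) / (t_hours[2] - t_hours[1])
y_hand <- wl[1] + slope * t_manual

# The same interpolation using approx()
y_approx <- approx(x = t_hours, y = wl, xout = t_manual)$y

round(c(by_hand = y_hand, approx = y_approx), 4)
```

Both values should agree, which is why approx can replace the manual calculation once the time axis is numeric or a proper date-time.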
In the absence of a reproducible example, I have read in the table in your question and posted it in reproducible format here:
df <- data.frame(Date_Time = c("09/30/2022 12:00", "10/01/2022 12:00",
                               "10/01/2022 09:40", "10/02/2022 12:00",
                               "10/03/2022 12:00"),
                 `Manual WL (ft msl)` = c(NA, NA, 431.51, NA, NA),
                 `Transducer WL (ft msl)` = c(431.54, 431.59, NA,
                                              431.45, 431.41),
                 check.names = FALSE)
This gives us a data frame called df which looks like this:
df
#> Date_Time Manual WL (ft msl) Transducer WL (ft msl)
#> 1 09/30/2022 12:00 NA 431.54
#> 2 10/01/2022 12:00 NA 431.59
#> 3 10/01/2022 09:40 431.51 NA
#> 4 10/02/2022 12:00 NA 431.45
#> 5 10/03/2022 12:00 NA 431.41
We can use mdy_hm from the lubridate package to convert your date-times to the appropriate format, then use approx to fill in the missing values like so:
library(tidyverse)
library(lubridate)
df <- df %>%
  mutate(Date_Time = mdy_hm(Date_Time)) %>%
  mutate(`Transducer WL (ft msl)` =
           approx(Date_Time[!is.na(`Transducer WL (ft msl)`)],
                  na.omit(`Transducer WL (ft msl)`),
                  xout = Date_Time)$y)
And now df looks like this:
df
#> Date_Time Manual WL (ft msl) Transducer WL (ft msl)
#> 1 2022-09-30 12:00:00 NA 431.5400
#> 2 2022-10-01 12:00:00 NA 431.5900
#> 3 2022-10-01 09:40:00 431.51 431.5851
#> 4 2022-10-02 12:00:00 NA 431.4500
#> 5 2022-10-03 12:00:00 NA 431.4100
Created on 2022-12-13 with reprex v2.0.2
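Since the question mentions about a hundred locations, the same idea could be applied per location and then reduced to just the rows with a manual reading. The following is only a sketch under assumptions: the Location column, the wells data frame, the second location's values, and the Transducer_interp and Difference column names are all hypothetical and would need to be adapted to the real data layout.

```r
library(dplyr)
library(lubridate)

# Hypothetical two-location data set; `Location` is an assumed column name
wells <- data.frame(
  Location   = rep(c("A", "B"), each = 3),
  Date_Time  = rep(c("09/30/2022 12:00", "10/01/2022 12:00",
                     "10/01/2022 09:40"), times = 2),
  Manual     = c(NA, NA, 431.51, NA, NA, 420.10),
  Transducer = c(431.54, 431.59, NA, 420.00, 420.24, NA)
)

result <- wells %>%
  mutate(Date_Time = mdy_hm(Date_Time)) %>%
  group_by(Location) %>%
  # Interpolate within each location, keeping the original column intact
  mutate(Transducer_interp = approx(Date_Time[!is.na(Transducer)],
                                    Transducer[!is.na(Transducer)],
                                    xout = Date_Time)$y) %>%
  ungroup() %>%
  # Keep only the rows where a manual reading exists
  filter(!is.na(Manual)) %>%
  mutate(Difference = Manual - Transducer_interp)

result
```

Writing the interpolated values to a new column rather than overwriting the transducer column keeps the raw readings available for checking. Note that approx needs at least two non-missing transducer readings in each group.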
CodePudding user response:
Using DF shown reproducibly in the Note at the end, use na.approx to fill in the NA's using linear interpolation. The first alternative uses zoo and base R and the second alternative uses zoo and dplyr. Another possibility is to use na.spline in place of na.approx. na.rm=FALSE says to leave any NA's that appear on the ends. If there are never NA's on the ends it could be omitted.
library(zoo)
within(DF, {
  Date_Time <- as.POSIXct(Date_Time, format = "%m/%d/%Y %H:%M")
  Transducer <- na.approx(Transducer, x = Date_Time, na.rm = FALSE)
})
## Date_Time Manual Transducer
## 1 2022-09-30 12:00:00 NA 431.5400
## 2 2022-10-01 12:00:00 NA 431.5900
## 3 2022-10-01 09:40:00 431.51 431.5851
## 4 2022-10-02 12:00:00 NA 431.4500
## 5 2022-10-03 12:00:00 NA 431.4100
library(dplyr)
library(zoo)
DF %>%
  mutate(Date_Time = as.POSIXct(Date_Time, format = "%m/%d/%Y %H:%M"),
         Transducer = na.approx(Transducer, x = Date_Time, na.rm = FALSE))
## Date_Time Manual Transducer
## 1 2022-09-30 12:00:00 NA 431.5400
## 2 2022-10-01 12:00:00 NA 431.5900
## 3 2022-10-01 09:40:00 431.51 431.5851
## 4 2022-10-02 12:00:00 NA 431.4500
## 5 2022-10-03 12:00:00 NA 431.4100
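To illustrate what na.rm = FALSE changes, here is a small sketch on a made-up vector (x, trimmed and kept are names chosen only for this example). With the default, leading and trailing NA's that cannot be interpolated are dropped, so the result can be shorter than the input, which would not fit back into a data frame column:

```r
library(zoo)

# Hypothetical series with an NA at the start and one in the interior
x <- c(NA, 1, NA, 3)

# Default (na.rm = TRUE): the leading NA is dropped, result has 3 elements
trimmed <- na.approx(x)

# na.rm = FALSE: the leading NA is kept, result has the original length
kept <- na.approx(x, na.rm = FALSE)

length(trimmed)
length(kept)
```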
Note
DF <- structure(list(Date_Time = c("09/30/2022 12:00", "10/01/2022 12:00",
"10/01/2022 09:40", "10/02/2022 12:00", "10/03/2022 12:00"),
Manual = c(NA, NA, 431.51, NA, NA), Transducer = c(431.54,
431.59, NA, 431.45, 431.41)), row.names = c(NA, -5L), cla