I have sensor data and want to create the missing timestamps and interpolate the feature values. At the moment the measurements are irregularly spaced, and I would like a denser, regular series. A simplified data frame is below:
df1 <- data.frame(
  value = c(5, 15, 10, 10, 5, 15),  # numeric, so the values can be interpolated
  time = as.POSIXct(c('2018-06-03 19:40:00', '2018-06-03 19:42:00',
                      '2018-06-03 19:43:00', '2018-06-03 19:45:00',
                      '2018-06-03 19:46:00', '2018-06-03 19:48:00')))
This works if I pad to one-second intervals with the padr package and then interpolate:
df2 <- df1 %>% pad(start_val = df1$time[1], end_val = df1$time[6], interval = "sec")
df2$value <- na.approx(df2$value)
However, I would like timestamps every 0.1 or 0.2 seconds. padr can handle an interval of 2 seconds, but intervals shorter than 1 second do not seem to work; I get this message:
Error: The specified interval is invalid for the datetime variable. Not all original observation are in the padding. If you want to pad at this interval, aggregate the data first with thicken.
Is there a possibility to create timestamps in a smaller interval than 1 second? I tried this
seq.POSIXt(as.POSIXct(df1$time[1]), as.POSIXct(df1$time[6]), units = "seconds", by = .2)
But that only creates a vector of timestamps.
Answer:
If you want a step of 0.1 or 0.2 seconds while keeping the date-time structure and interpolating the values, the following works (change the step via the by argument of the seq.POSIXt call).
library(tidyverse)
library(padr)
library(zoo)
df1 <- data.frame(
  value = c(5, 15, 10, 10, 5, 15),  # numeric, so the values can be interpolated
  time = as.POSIXct(c('2018-06-03 19:40:00', '2018-06-03 19:42:00',
                      '2018-06-03 19:43:00', '2018-06-03 19:45:00',
                      '2018-06-03 19:46:00', '2018-06-03 19:48:00')))
# format decimal seconds so that it can be used to compare to the new date range
df1$time <- format(df1$time, "%Y-%m-%d %H:%M:%OS2")
# create the 0.2-second grid (seq.POSIXt takes a numeric `by` in seconds; it has no `units` argument)
y <- seq.POSIXt(as.POSIXct(df1$time[1]), as.POSIXct(df1$time[6]), by = .2)
# set up new data frame for original values and interpolation
dy <- data.frame(time = y,
                 value = NA) %>%
  mutate(time = format(y, "%Y-%m-%d %H:%M:%OS2"))
# obtain row numbers of those that already have values
wh <- sapply(df1$time, function(x) which(dy$time == x))
# return the original values to the dataset
dy[wh, ]$value <- unlist(df1$value)
# interpolate
dy$value <- na.approx(dy$value)
# take a look
head(dy)
#                     time    value
# 1 2018-06-03 19:40:00.00 5.000000
# 2 2018-06-03 19:40:00.20 5.016667
# 3 2018-06-03 19:40:00.40 5.033333
# 4 2018-06-03 19:40:00.59 5.050000
# 5 2018-06-03 19:40:00.79 5.066667
# 6 2018-06-03 19:40:01.00 5.083333
# (.59 and .79 are .60 and .80: %OS2 truncates rather than rounds the stored floating-point value)
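An alternative that avoids matching formatted timestamp strings is to interpolate directly on the numeric time axis with base R's approx(). This is only a sketch under a few assumptions: value is stored as numbers, the time zone is pinned to UTC for reproducibility, and the grid is built from integer step counts. The names step, span, n_steps, grid and out are illustrative and do not come from the question.

```r
# Sketch: linear interpolation onto a 0.2-second grid using base R only.
df1 <- data.frame(
  value = c(5, 15, 10, 10, 5, 15),
  time  = as.POSIXct(c("2018-06-03 19:40:00", "2018-06-03 19:42:00",
                       "2018-06-03 19:43:00", "2018-06-03 19:45:00",
                       "2018-06-03 19:46:00", "2018-06-03 19:48:00"),
                     tz = "UTC")  # assumption: UTC, to keep the example reproducible
)

step <- 0.2  # grid spacing in seconds; change to 0.1 if needed

# Number of steps between the first and last observation.
span    <- as.numeric(difftime(max(df1$time), min(df1$time), units = "secs"))
n_steps <- round(span / step)

# POSIXct + numeric adds seconds, so this yields a sub-second grid.
grid <- min(df1$time) + (0:n_steps) * step

# approx() interpolates linearly between the known (time, value) pairs.
out <- data.frame(
  time  = grid,
  value = approx(x = as.numeric(df1$time), y = df1$value,
                 xout = as.numeric(grid))$y
)

head(out)
```

Because the original observations enter approx() as numbers, they do not have to coincide exactly with grid points, which the string-matching step above relies on.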
Answer:
I'm not familiar with padr, but that last call should give you a sequence with sub-second differences; it might just not be printed as such. The section "Sub-second Accuracy" in ?DateTimeClasses explains:
Classes "POSIXct" and "POSIXlt" are able to express fractions of a second. (Conversion of fractions between the two forms may not be exact, but will have better than microsecond accuracy.)
Fractional seconds are printed only if options("digits.secs") is set: see strftime.
Thus,
options("digits.secs"=TRUE)
st <- Sys.time()
st.s <- seq(st, st + 1, 0.1)
diff(st.s)
# Time differences in secs
# [1] 0.099999905 0.100000143 0.099999905 0.100000143 0.099999905 0.099999905
# [7] 0.100000143 0.099999905 0.100000143 0.099999905
st
# [1] "2021-12-29 20:00:33.0 CET"
dput(st)
# structure(1640804433.02173, class = c("POSIXct", "POSIXt"))
options("digits.secs"=FALSE)
st
# [1] "2021-12-29 20:00:33 CET"
dput(st)
# structure(1640804433.02173, class = c("POSIXct", "POSIXt"))
Same underlying value, just printed with a different number of digits.
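If changing a global option is undesirable, the fraction can also be requested for a single format() call with the "%OSn" conversion. The following is a small sketch rather than a definitive recipe; start and ts are illustrative names, and UTC is an assumption made only to keep the example reproducible.

```r
# Sketch: show fractional seconds for one call without changing global options.
start <- as.POSIXct("2018-06-03 19:40:00", tz = "UTC")  # assumption: UTC
ts <- seq(start, by = 0.2, length.out = 6)

# Default printing may hide the fraction, but the stored values differ.
as.numeric(ts) - as.numeric(start)

# "%OSn" (n digits) requests fractional seconds for this format() call only.
# Digits are truncated, not rounded, so a value stored just below .6 can print as .59.
format(ts, "%H:%M:%OS3")
```

The same vector can be placed in a data frame column as-is; only its printed form depends on digits.secs or the format string.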