I have a data frame with an incomplete datetime variable. The data are recorded at 0.5-second intervals. I want to find which time points are missing and insert a row for each of them. For example, the data look like this:
df <- structure(list(time = structure(c(1652182189, 1652182189, 1652182190, 1652182191
), class = c("POSIXct", "POSIXt"), tzone = ""), positionx = 3:6),
row.names = c(NA, -4L), class = "data.frame")
I tried to solve it with pad().
pad(df, interval = " .5 sec")
However, this produces the error below, because the smallest interval unit that pad() accepts is the second.
Error: interval is not valid.
I want to pad the data frame as shown below. How can I solve this problem?
time positionx
<dttm> <int>
1 2022-05-10 19:29:49 3
2 2022-05-10 19:29:49 4
3 2022-05-10 19:29:50 5
4 2022-05-10 19:29:50 NA
5 2022-05-10 19:29:51 6
I am new to R, so any suggestions would be appreciated. Thank you very much!
Answer:
library(tidyverse)
library(lubridate)
tibble(time=seq(min(df$time), max(df$time), by=seconds(0.5))) %>%
left_join(df, by="time")
# A tibble: 6 × 2
time positionx
<dttm> <int>
1 2022-05-10 12:29:49 3
2 2022-05-10 12:29:49 4
3 2022-05-10 12:29:49 NA
4 2022-05-10 12:29:50 5
5 2022-05-10 12:29:50 NA
6 2022-05-10 12:29:51 6
Edit: In response to OP's question in the comments. (And I accept that I should have spotted the difference between my output and OP's desired result before posting.)
I believe the problem is not in my algorithm, but in OP's input data.
Compare
strftime(df$time,'%Y-%m-%d %H:%M:%OS3')
"2022-05-10 12:29:49.000" "2022-05-10 12:29:49.000" "2022-05-10 12:29:50.000" "2022-05-10 12:29:51.000"
with
strftime(seq(min(df$time), max(df$time), by=seconds(0.5)),'%Y-%m-%d %H:%M:%OS3')
[1] "2022-05-10 12:29:49.000" "2022-05-10 12:29:49.500" "2022-05-10 12:29:50.000" "2022-05-10 12:29:50.500" "2022-05-10 12:29:51.000"
In the input data, rows 1 and 2 represent the same time (at 12:29:49.000). The algorithm correctly adds a third row at 12:29:49.500. All three times appear identical using the default format used to display datetime objects in tibbles.
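A quick way to check for this kind of hidden duplicate is sketched below in base R. It rebuilds the question's timestamps with an explicit UTC time zone (an assumption made here so the result does not depend on the local time zone), prints them with millisecond precision, and flags repeated values.

```r
# Timestamps from the question's original input; tz = "UTC" is an assumption
# made for reproducibility (the original uses the local time zone).
time <- as.POSIXct(c(1652182189, 1652182189, 1652182190, 1652182191),
                   origin = "1970-01-01", tz = "UTC")

# %OS3 formats the seconds with three decimal places,
# so fractional seconds are no longer hidden.
shown <- format(time, "%H:%M:%OS3")
print(shown)

# duplicated() marks every repeat of an earlier value as TRUE.
dup <- duplicated(time)
print(dup)
```

If any element of dup is TRUE, the input contains a repeated timestamp, and a join against a regular 0.5-second grid will return more rows than the grid has.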
If OP's input data is instead,
df <- structure(list(time = structure(c(1652182189.0, 1652182189.5, 1652182190.0, 1652182191.0
), class = c("POSIXct", "POSIXt"), tzone = ""), positionx = 3:6),
row.names = c(NA, -4L), class = "data.frame")
then the output is
tibble(time=seq(min(df$time), max(df$time), by=seconds(0.5))) %>%
left_join(df, by="time")
# A tibble: 5 × 2
time positionx
<dttm> <int>
1 2022-05-10 12:29:49 3
2 2022-05-10 12:29:49 4
3 2022-05-10 12:29:50 5
4 2022-05-10 12:29:50 NA
5 2022-05-10 12:29:51 6
As expected.
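For reference, the same idea (build the full 0.5-second grid, then left-join the data onto it) can also be sketched in base R without the tidyverse. This is only an illustration under stated assumptions: it uses the corrected input, sets tz = "UTC" explicitly, and the names grid and padded are made up for the example.

```r
# Corrected input: the second reading is at .5 s rather than a repeat.
# tz = "UTC" is an assumption; the original data use the local time zone.
df <- data.frame(
  time = as.POSIXct(c(1652182189.0, 1652182189.5, 1652182190.0, 1652182191.0),
                    origin = "1970-01-01", tz = "UTC"),
  positionx = 3:6
)

# Full grid of time points from the first to the last reading, 0.5 s apart.
# A plain number passed to `by` is interpreted as seconds.
grid <- data.frame(time = seq(min(df$time), max(df$time), by = 0.5))

# Left join: keep every grid time; unmatched times get NA in positionx.
padded <- merge(grid, df, by = "time", all.x = TRUE)
print(padded)
```

The join relies on exact equality of the timestamps. That holds here because multiples of 0.5 are represented exactly as doubles; for intervals such as 0.1 seconds, rounding the times before joining may be necessary.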