How to pad the datetime column of a data frame?

Time:11-04

I have a data frame with an incomplete datetime variable. The data are recorded at 0.5-second intervals. I want to find which time points are missing and insert a record for each of them. For example, the data are shown below:

df <- structure(list(time = structure(c(1652182189, 1652182189, 1652182190, 1652182191
), class = c("POSIXct", "POSIXt"), tzone = ""), positionx = 3:6),
row.names = c(NA, -4L), class = "data.frame")

I tried to solve it with pad() from the padr package:

pad(df, interval = " .5 sec")

However, this gives the error below, because the smallest interval unit that pad() accepts is one second:

Error: interval is not valid.

I want to pad the data frame as shown below. How can I solve this problem? Any advice would be appreciated.

  time                positionx
  <dttm>                  <int>
1 2022-05-10 19:29:49         3
2 2022-05-10 19:29:49         4
3 2022-05-10 19:29:50         5
4 2022-05-10 19:29:50        NA
5 2022-05-10 19:29:51         6

I am a new R user, so any suggestions are welcome. Thank you very much!

Answer:

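library(tidyverse)
library(lubridate)

# Build a regular 0.5-second grid from the first to the last observation,
# then left-join the observations onto it; grid points with no data get NA
tibble(time=seq(min(df$time), max(df$time), by=seconds(0.5))) %>% 
  left_join(df, by="time")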
# A tibble: 6 × 2
  time                positionx
  <dttm>                  <int>
1 2022-05-10 12:29:49         3
2 2022-05-10 12:29:49         4
3 2022-05-10 12:29:49        NA
4 2022-05-10 12:29:50         5
5 2022-05-10 12:29:50        NA
6 2022-05-10 12:29:51         6
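If you would rather avoid the tidyverse and lubridate, the same grid-and-join idea can be sketched in base R. This is an alternative, not the code above: it assumes one row per timestamp (observations at 0.0, 0.5, 1.0 and 2.0 seconds after the first one), uses tz = "UTC" only to make the example reproducible, and compares times as plain numeric seconds. It also lists the missing time points themselves, which the question asked for.

```r
# Illustrative data with one row per timestamp; tz = "UTC" is an assumption
# made only so the example behaves the same in every session
df <- data.frame(
  time = .POSIXct(1652182189 + c(0, 0.5, 1, 2), tz = "UTC"),
  positionx = 3:6
)

# Regular grid from the first to the last observation, 0.5 s apart
# (a numeric `by` in seq() on POSIXct values is interpreted as seconds)
grid <- seq(min(df$time), max(df$time), by = 0.5)

# Grid points that have no observation: the "missing instances"
missing_times <- grid[!(as.numeric(grid) %in% as.numeric(df$time))]

# Look up each grid point among the observed times (NA where absent);
# comparing plain numeric seconds avoids any class-specific matching
idx <- match(as.numeric(grid), as.numeric(df$time))
padded <- data.frame(time = grid, positionx = df$positionx[idx])

format(missing_times, "%H:%M:%OS1")
padded
```

Unlike left_join(), match() returns only the first matching row, so repeated timestamps would collapse to a single row; check for duplicates first if that matters.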

Edit: in response to OP's question in the comments. (I accept that I should have spotted the difference between my output and OP's desired result before posting.)

I believe the problem is not in my algorithm, but in OP's input data.

Compare

strftime(df$time,'%Y-%m-%d %H:%M:%OS3')
[1] "2022-05-10 12:29:49.000" "2022-05-10 12:29:49.000" "2022-05-10 12:29:50.000" "2022-05-10 12:29:51.000"

with

strftime(seq(min(df$time), max(df$time), by=seconds(0.5)),'%Y-%m-%d %H:%M:%OS3')
[1] "2022-05-10 12:29:49.000" "2022-05-10 12:29:49.500" "2022-05-10 12:29:50.000" "2022-05-10 12:29:50.500" "2022-05-10 12:29:51.000"

In OP's input data, rows 1 and 2 have exactly the same time (12:29:49.000). The join keeps both of them and correctly adds a third row for the missing 12:29:49.500. All three rows look identical only because the default format used to display datetimes in tibbles does not show fractional seconds.
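A quick way to confirm this kind of problem is to check for repeated timestamps and print the times with fractional seconds. A minimal sketch using OP's original numeric values, with tz = "UTC" assumed so the printed clock times are reproducible:

```r
# OP's original values: the first two rows share exactly the same second
times <- .POSIXct(c(1652182189, 1652182189, 1652182190, 1652182191), tz = "UTC")

# duplicated() flags any timestamp that already appeared in an earlier row
dup <- duplicated(times)
dup  # FALSE TRUE FALSE FALSE: row 2 repeats row 1

# %OS3 prints seconds with three decimal places, exposing sub-second parts
shown <- format(times, "%Y-%m-%d %H:%M:%OS3")
shown
```

In base R, options(digits.secs = 3) makes print() show fractional seconds in the same way.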

If OP's input data instead has the second observation half a second after the first,

df <- structure(list(time = structure(c(1652182189.0, 1652182189.5, 1652182190.0, 1652182191.0
), class = c("POSIXct", "POSIXt"), tzone = ""), positionx = 3:6),
row.names = c(NA, -4L), class = "data.frame")

Then the output is

tibble(time=seq(min(df$time), max(df$time), by=seconds(0.5))) %>% 
  left_join(df, by="time")
# A tibble: 5 × 2
  time                positionx
  <dttm>                  <int>
1 2022-05-10 12:29:49         3
2 2022-05-10 12:29:49         4
3 2022-05-10 12:29:50         5
4 2022-05-10 12:29:50        NA
5 2022-05-10 12:29:51         6

This matches OP's desired result.
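The exact join relies on the observed times sitting exactly on the 0.5-second grid. If real timestamps carry small offsets (for example 12:29:49.4999 from a logger clock or from parsing text), they will not match any grid point and those rows come out as NA. One option, not part of the original answer, is to snap the times to the nearest half second before joining. A minimal sketch with a hypothetical helper snap_to_step() and made-up jittered values:

```r
# Hypothetical helper: round POSIXct times to the nearest multiple of `step`
# seconds, keeping the original time zone attribute (NULL = session's zone)
snap_to_step <- function(x, step = 0.5) {
  secs <- round(as.numeric(x) / step) * step
  .POSIXct(secs, tz = attr(x, "tzone"))
}

# Illustrative data: rows 2 and 3 are slightly off the 0.5-second grid
obs <- data.frame(
  time = .POSIXct(c(1652182189, 1652182189.4999, 1652182190.0001, 1652182191),
                  tz = "UTC"),
  positionx = 3:6
)

obs$time <- snap_to_step(obs$time)

# Build the 0.5-second grid and look up each point among the snapped times
grid <- seq(min(obs$time), max(obs$time), by = 0.5)
idx <- match(as.numeric(grid), as.numeric(obs$time))
padded <- data.frame(time = grid, positionx = obs$positionx[idx])
padded
```

The snapped time column can equally be passed to the left_join() used in the answer.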
