Home > OS >  Splitting a dateTime vector if time is greater than x between vector components
Splitting a dateTime vector if time is greater than x between vector components

Time:04-15

I have the following data:

df <- data.frame(index = 1:85,
                 times = c(seq(as.POSIXct("2020-10-03 21:31:00 UTC"),
                               as.POSIXct("2020-10-03 22:25:00 UTC")
                               "min"),
                           seq(as.POSIXct("2020-11-03 10:10:00 UTC"),
                               as.POSIXct("2020-11-03 10:39:00 UTC"),
                               "min")
                           ))

if we look at row 55 and 56 there is a clear divide in times:

> df[55:56, ]
   index               times
55    55 2020-10-03 22:25:00
56    56 2020-11-03 10:10:00

I would like to add a third categorical column split based on the splits,

e.g. row df$split[55, ] = A and row df$split[56, ] = B

logic like

If time gap between rows is greater than 5 mins start new category for subsequent rows until the next instance where time gap > 5 mins.

thanks

CodePudding user response:

You could use

library(dplyr)

df %>% 
  mutate(cat = 1   cumsum(c(0, diff(times)) > 5))

which returns

   index               times cat
1      1 2020-10-03 21:31:00   1
2      2 2020-10-03 21:32:00   1
3      3 2020-10-03 21:33:00   1
4      4 2020-10-03 21:34:00   1
5      5 2020-10-03 21:35:00   1
6      6 2020-10-03 21:36:00   1
7      7 2020-10-03 21:37:00   1
8      8 2020-10-03 21:38:00   1
...
53    53 2020-10-03 22:23:00   1
54    54 2020-10-03 22:24:00   1
55    55 2020-10-03 22:25:00   1
56    56 2020-11-03 10:10:00   2
57    57 2020-11-03 10:11:00   2
58    58 2020-11-03 10:12:00   2
59    59 2020-11-03 10:13:00   2

If you need letters or something else, you could for example use

df %>% 
  mutate(cat = LETTERS[1   cumsum(c(0, diff(times)) > 5)])

to convert the categories 1 and 2 into A and B.

  • Related