Home > OS >  Iterative labelling of a contiguous binary value arranged by date
Iterative labelling of a contiguous binary value arranged by date

Time:02-15

I'm looking to generate a column in a table that iterates each time there is a state change. It's important this is arranged by date

For example, take the following input data

dummy_d = data.frame(date = seq.POSIXt(from = as.POSIXct("2022-02-14 09:00:00"),
                                   to=as.POSIXct("2022-02-14 09:06:00"),
                                   by = "1 min"),
                 valid = c(1,1,0,0,1,0,0))

The calculated output should look like

dummy_d %>% mutate(group = c(1,1,2,2,3,4,4))

That is to say, arranging by time, each time valid switches from 0 to 1 vice versa, group cumulatively builds.

Any help appreciated

CodePudding user response:

This solution uses the data.table package.

library(data.table)

setDT(dummy_d)

dummy_d[, group := .GRP, by = rleid(as.Date(date), valid)]

dummy_d

#                   date valid group
# 1: 2022-02-14 09:00:00     1     1
# 2: 2022-02-14 09:01:00     1     1
# 3: 2022-02-14 09:02:00     0     2
# 4: 2022-02-14 09:03:00     0     2
# 5: 2022-02-14 09:04:00     1     3
# 6: 2022-02-14 09:05:00     0     4
# 7: 2022-02-14 09:06:00     0     4

CodePudding user response:

With dplyr alone you could try the following:

library(dplyr)

dummy_d %>%
  mutate(group = cumsum(valid != lag(valid, default = first(valid)))   1)

Output

                 date valid group
1 2022-02-14 09:00:00     1     1
2 2022-02-14 09:01:00     1     1
3 2022-02-14 09:02:00     0     2
4 2022-02-14 09:03:00     0     2
5 2022-02-14 09:04:00     1     3
6 2022-02-14 09:05:00     0     4
7 2022-02-14 09:06:00     0     4
  • Related