Home > Software engineering >  In R datatable, how to identify block of rows of variable length?
In R datatable, how to identify block of rows of variable length?

Time:11-24

I have a large datatable with a 'status' column. Status could be 0, 1, or 2. Zero or more rows of status 0 or 2 can precede a row with status of 1. E.g.

dt <- data.table(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), status = c(1, 0, 1, 0, 2, 1, 0, 0, 0, 1))

I want to identify the groups 1 (id = 1), 2 (id = 2:3), 3 (id = 4:6), and 4 (id = 7:10). Is there an efficient way to do this instead of running through a loop?

Please let me know. Thanks!

CodePudding user response:

library(data.table)
dt <- data.table(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), status = c(1, 0, 1, 0, 2, 1, 0, 0, 0, 1))
dt[,.(group=rleid(rev(cumsum(rev(status==1)))), id, status)]
#>     group id status
#>  1:     1  1      1
#>  2:     2  2      0
#>  3:     2  3      1
#>  4:     3  4      0
#>  5:     3  5      2
#>  6:     3  6      1
#>  7:     4  7      0
#>  8:     4  8      0
#>  9:     4  9      0
#> 10:     4 10      1

Created on 2022-11-23 with reprex v2.0.2

  • Related