Subset data.table conditionally by group

Time:05-29

I have the following data.table:

   id time event
1:  1    1     1
2:  1    2     0
3:  2    1     0
4:  2    2     1

# Generate data
library(data.table)

data <- data.table(
    id = c(1,1,2,2),
    time = c(1,2,1,2),
    event = c(1,0,0,1)
)

Within each id, I want to retain all rows up to and including the first row where event == 1, such that the data looks like

   id time event
1:  1    1     1
2:  2    1     0
3:  2    2     1

I'm looking for a data.table solution.

Answer:

data[, .SD[cumsum(shift(event, type = "lag", fill = 0) > 0) < 1,], by = id]
#       id  time event
#    <num> <num> <num>
# 1:     1     1     1
# 2:     2     1     0
# 3:     2     2     1

The use of cumsum(.) < 1 effectively means "keep rows until the condition first becomes true": each TRUE adds 1 to the running sum, so the sum stays at 0 only while every value so far is FALSE. Because the row where event == 1 first appears must itself be kept, the test is applied to the lagged value, shift(event, type = "lag"), rather than to event directly. By default, shift fills the first row of each group with NA, which propagates through cumsum and breaks the comparison, so fill = 0 is used to mimic a preceding row with no event.
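To make the lag-then-cumsum logic easier to follow, here is a minimal sketch that writes each intermediate step into its own column. The column names lagged, seen_before and keep are illustrative and not part of the original answer. The last two lines show one possible variant that collects row indices with .I instead of subsetting .SD; it should give the same rows on this data.

```r
library(data.table)

data <- data.table(
    id = c(1, 1, 2, 2),
    time = c(1, 2, 1, 2),
    event = c(1, 0, 0, 1)
)

# Work on a copy so the original table is left untouched.
steps <- copy(data)

# 1. Lag event within each id; the first row of each group gets 0 instead of NA.
steps[, lagged := shift(event, type = "lag", fill = 0), by = id]

# 2. Running count of earlier rows (within the group) that had an event.
steps[, seen_before := cumsum(lagged > 0), by = id]

# 3. Keep a row only while no earlier row in its group had an event.
steps[, keep := seen_before < 1]
# keep is TRUE, FALSE, TRUE, TRUE for the four rows

# Variant (an assumption, not the answer's code): gather the row numbers
# to keep with .I, then subset the table once.
keep_idx <- data[, .I[cumsum(shift(event, type = "lag", fill = 0) > 0) < 1], by = id]$V1
result <- data[keep_idx]
```

Here result holds rows 1, 3 and 4 of data, matching the output shown in the answer. On large tables the .I form is often suggested as a way to avoid materialising .SD for every group, though whether it is faster depends on the data.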
