I have a dataset as follow which I want to restrict if the column value in one column exceeds 120 days. For example,
ticket date timelag
1 2009-01-01
1 2010-01-01 360
1 2010-02-02 30
2 2010-02-02
2 2010-02-03 1
2 2010-02-04 1
3 2009-01-01
3 2009-01-02 1
3 2009-01-03 1
3 2009-08-06 130
3 2009-08-07 1
3 2010-02-01 140
So in this case ticket 1 exceeded time lag 120 days in the second event so I want to discard the subsequent rows of ticket 1 after that. In case 2 no ticket event duration exceeded 120 days so all the rows need to be intact for that ticket. So in case 3 the ticket had two successive time periods where it exceeded 120 days but I want to restrict the rows as soon as it encounters the first 120 day period i.e., the event day before the 120 days. So for ticket 3 only the days the first three events should be reported and the rest discarded.
Any advise on how to restrict those rows? Thanks
CodePudding user response:
You may use -
library(dplyr)
df %>%
mutate(date = as.Date(date)) %>%
group_by(ticket) %>%
slice({
tmp <- diff(date) > 30
if(any(tmp)) 1:which.max(tmp) else row_number()
}) %>% ungroup
# ticket date
# <int> <date>
#1 1 2009-01-01
#2 2 2010-02-02
#3 2 2010-02-03
#4 2 2010-02-04
#5 3 2009-01-01
#6 3 2009-01-02
#7 3 2009-01-03