Home > database >  restricting rows based on first encounter of maximum value
restricting rows based on first encounter of maximum value

Time:09-30

I have a dataset as follow which I want to restrict if the column value in one column exceeds 120 days. For example,

      ticket date        timelag
       1     2009-01-01 
       1     2010-01-01     360
       1     2010-02-02      30
       2     2010-02-02       
       2     2010-02-03       1
       2     2010-02-04       1
       3     2009-01-01      
       3     2009-01-02       1
       3     2009-01-03       1
       3     2009-08-06      130
       3     2009-08-07       1
       3     2010-02-01      140

So in this case ticket 1 exceeded time lag 120 days in the second event so I want to discard the subsequent rows of ticket 1 after that. In case 2 no ticket event duration exceeded 120 days so all the rows need to be intact for that ticket. So in case 3 the ticket had two successive time periods where it exceeded 120 days but I want to restrict the rows as soon as it encounters the first 120 day period i.e., the event day before the 120 days. So for ticket 3 only the days the first three events should be reported and the rest discarded.

Any advise on how to restrict those rows? Thanks

CodePudding user response:

You may use -

library(dplyr)

df %>%
  mutate(date = as.Date(date)) %>%
  group_by(ticket) %>%
  slice({
    tmp <-  diff(date) > 30
    if(any(tmp)) 1:which.max(tmp) else row_number()
  }) %>% ungroup

#  ticket date      
#   <int> <date>    
#1      1 2009-01-01
#2      2 2010-02-02
#3      2 2010-02-03
#4      2 2010-02-04
#5      3 2009-01-01
#6      3 2009-01-02
#7      3 2009-01-03
  • Related