R: getting the second occurrence of a sequence

Time:12-27

df1 <- tibble::tribble(
    ~SUBJID, ~TP.DATE, ~TPR_ar,
    '2617001', '2019-04-11', 'Undefined',
    '2617001', '2019-07-09', 'PD',       
    '2617001', '2019-09-07', 'PD',       
    '2617001', '2019-10-19', 'PD',      
    '2617001', '2019-11-12', 'PD',      
    '2617001', '2020-01-13', 'PR',      
    '2617001', '2020-02-24', 'PD',
    '2617001', '2020-03-24', 'PD',
)

Hi, Stack Overflow! I would like to extract one specific date from the data above. In this data, the sequence of TPR_ar values goes: 'Undefined', 'PD', 'PR', 'PD'. What I would like to get is the first date of the second run of 'PD' (2020-02-24). Thanks in advance!

CodePudding user response:

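We could use rle (base R) or rleid (data.table) to give each run of adjacent identical values its own group id. Then keep the first row of every 'PD' run and take the second of those rows: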

library(dplyr)
library(data.table)
df1 %>%
   group_by(grp = rleid(TPR_ar)) %>% 
   filter(TPR_ar == 'PD', row_number() == 1) %>% 
   ungroup %>%
   slice(2) %>%
   pull(TP.DATE)
[1] "2020-02-24"
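
For comparison, here is a minimal base R sketch of the rle route mentioned above, which needs no packages. It assumes the rows are already sorted by date; the vector names (dates, status, run_start, pd_starts) are mine and not from the question.

```r
# Values in the order they appear in the question's data
dates  <- c("2019-04-11", "2019-07-09", "2019-09-07", "2019-10-19",
            "2019-11-12", "2020-01-13", "2020-02-24", "2020-03-24")
status <- c("Undefined", "PD", "PD", "PD", "PD", "PR", "PD", "PD")

# Run-length encoding of the status column
runs <- rle(status)
# runs$values : "Undefined" "PD" "PR" "PD"
# runs$lengths: 1 4 1 2

# Row index at which each run starts
run_start <- cumsum(c(1, head(runs$lengths, -1)))   # 1 2 6 7

# Keep the start rows of the 'PD' runs only, then take the second one
pd_starts <- run_start[runs$values == "PD"]          # 2 7
second_pd_date <- dates[pd_starts[2]]
second_pd_date
# [1] "2020-02-24"
```

If there are fewer than two 'PD' runs, pd_starts[2] is NA, so the result is NA rather than an error.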

If this needs to be done separately for each subject, add "SUBJID" to the grouping:

df1 %>%
   group_by(SUBJID, grp = rleid(TPR_ar)) %>% 
   filter(TPR_ar == 'PD', row_number() == 1) %>%
   group_by(SUBJID) %>% 
   slice(2) %>%
   pull(TP.DATE)
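
Note that slice(2) returns no row for a subject with fewer than two 'PD' runs, and pull() drops the subject ids. If you want one labelled value per subject, something like the following base R sketch may help. The helper nth_run_start, the data frame visits, and the second subject '2617002' are all made up for illustration, and rows are assumed to be sorted by date within each subject.

```r
# Hypothetical helper: date at which the n-th run of `target` starts
nth_run_start <- function(dates, status, target = "PD", n = 2) {
  runs <- rle(status)
  run_start <- cumsum(c(1, head(runs$lengths, -1)))  # first row of each run
  starts <- run_start[runs$values == target]
  if (length(starts) < n) return(NA_character_)      # not enough runs
  dates[starts[n]]
}

# Question's subject plus an invented second subject with a single 'PD' run
visits <- data.frame(
  SUBJID  = c(rep("2617001", 8), rep("2617002", 3)),
  TP.DATE = c("2019-04-11", "2019-07-09", "2019-09-07", "2019-10-19",
              "2019-11-12", "2020-01-13", "2020-02-24", "2020-03-24",
              "2019-05-01", "2019-06-01", "2019-07-01"),
  TPR_ar  = c("Undefined", "PD", "PD", "PD", "PD", "PR", "PD", "PD",
              "PD", "PR", "PR"),
  stringsAsFactors = FALSE
)

# One value per subject, named by SUBJID
result <- sapply(split(visits, visits$SUBJID),
                 function(d) nth_run_start(d$TP.DATE, d$TPR_ar))
result
# "2617001" maps to "2020-02-24"; "2617002" maps to NA
```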

CodePudding user response:

Here is one without rleid:

library(dplyr)

df1 %>% 
  # start a new group id every time TPR_ar differs from the previous row
  group_by(x = cumsum(TPR_ar != lag(TPR_ar, default = first(TPR_ar))) + 1) %>%
  slice(1) %>% 
  filter(TPR_ar == "PD") %>% 
  ungroup() %>% 
  slice(2) %>% 
  pull(TP.DATE)
[1] "2020-02-24"
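
To see why the cumsum() expression works as a run id, here is a small base R sketch of the same idea on the question's TPR_ar values. The names previous, changed and run_id are only for illustration; the first row is compared with itself, which is what lag() with the first value as its default does.

```r
status <- c("Undefined", "PD", "PD", "PD", "PD", "PR", "PD", "PD")

# Previous value for each row; row 1 is paired with its own value
previous <- c(status[1], head(status, -1))

# TRUE wherever the value differs from the row before it
changed <- status != previous
# FALSE TRUE FALSE FALSE FALSE TRUE TRUE FALSE

# Counting the changes so far gives one id per run of identical values
run_id <- cumsum(changed) + 1
run_id
# [1] 1 2 2 2 2 3 4 4
```

Runs 2 and 4 are the two 'PD' runs, and the first row of run 4 is the 2020-02-24 visit.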