Home > database >  How to filter repeated data based on the number of days between dates in R
How to filter repeated data based on the number of days between dates in R

Time:08-04

I have longitudinal patient data in R. I would like to subset patients in the patid column based on the observation_date where I only keep patients that have the second observation_date occur at least 48 days apart or more. Noting that observation_date may be more than 2 dates.

Table1:

patid observation_date
1 07/07/2016
1 07/07/2019
2 07/05/2015
2 02/12/2016
3 07/05/2015
3 07/06/2015
4 07/05/2015
4 02/12/2016

CodePudding user response:

Reports the diff in days and filtered out the ones that has less than 48 days diff

library(tidyverse)
library(lubridate)

df <- read_table("patid observation_date
1   07/07/2016
1   07/07/2019
1   07/07/2020
2   07/05/2015
2   02/12/2016
3   07/05/2015
3   07/06/2015
4   07/05/2015
4   02/12/2016") %>% 
  mutate(observation_date = observation_date %>% 
           as.Date("%m/%d/%Y"))

df %>%  
  group_by(patid) %>%  
  summarise(diff = interval(first(observation_date), 
                            nth(observation_date, 2)) %>% # Select the second observation
              as.numeric("days")) %>% 
  filter(diff >= 48)

# A tibble: 3 x 2
  patid  diff
  <dbl> <dbl>
1     1  1095
2     2   222
3     4   222
  • Related