I have longitudinal patient data in R. I would like to subset patients in the patid column based on the observation_date where I only keep patients that have the second observation_date occur at least 48 days apart or more. Noting that observation_date may be more than 2 dates.
Table1:
patid | observation_date |
---|---|
1 | 07/07/2016 |
1 | 07/07/2019 |
2 | 07/05/2015 |
2 | 02/12/2016 |
3 | 07/05/2015 |
3 | 07/06/2015 |
4 | 07/05/2015 |
4 | 02/12/2016 |
CodePudding user response:
Reports the diff
in days and filtered out the ones that has less than 48 days diff
library(tidyverse)
library(lubridate)
df <- read_table("patid observation_date
1 07/07/2016
1 07/07/2019
1 07/07/2020
2 07/05/2015
2 02/12/2016
3 07/05/2015
3 07/06/2015
4 07/05/2015
4 02/12/2016") %>%
mutate(observation_date = observation_date %>%
as.Date("%m/%d/%Y"))
df %>%
group_by(patid) %>%
summarise(diff = interval(first(observation_date),
nth(observation_date, 2)) %>% # Select the second observation
as.numeric("days")) %>%
filter(diff >= 48)
# A tibble: 3 x 2
patid diff
<dbl> <dbl>
1 1 1095
2 2 222
3 4 222