I have two different data frames. "df1" has the registrations of diseases' incidence
User_ID <- c("1", "2", "3", "4","2", "5", "6","1", "4", "1")
User_disease <- c("a", "b", "c", "a" , "c", "d", "a", "s", "n", "b")
Diagnosis_date <- c("2020-11-23", "2020-10-12", "2020-11-25", "2020-09-30", "2020-12-23", "2020-12-12", "2020-10-28", "2020-12-09", "2020-06-24", "2020-08-15")
df1 <- data.frame(User_ID, User_disease, Diagnosis_date)
and the second "df2" has records of hospital visits for the same patients
User_ID <- c("1", "2", "3", "4","2", "5", "6","1", "4", "1", "1", "3", "3", "5", "6", "2", "1", "4", "3", "5")
Visit_date <- c("2020-09-23", "2020-09-12", "2020-11-14", "2020-09-23", "2020-12-12", "2020-12-10", "2020-09-24", "2020-12-07", "2020-06-13", "2020-08-12","2020-06-23", "2020-07-11", "2020-12-25", "2020-05-30", "2020-11-29", "2020-12-10", "2020-10-25", "2020-12-21", "2020-09-24", "2020-11-15")
df2 <- data.frame(User_ID, Visit_date)
I want to add to "df1" a new column to indicate whether a patient has any hospital visit within 30 days prior to Diagnosis date.
CodePudding user response:
A lubridate
duration with some filtering of the visit data frame should do the trick:
library(dplyr)
library(lubridate)
User_ID <- c("1", "2", "3", "4","2", "5", "6","1", "4", "1")
User_disease <- c("a", "b", "c", "a" , "c", "d", "a", "s", "n", "b")
Diagnosis_date <- c("2020-11-23", "2020-10-12", "2020-11-25", "2020-09-30", "2020-12-23", "2020-12-12", "2020-10-28", "2020-12-09", "2020-06-24", "2020-08-15")
df1 <- data.frame(User_ID, User_disease, Diagnosis_date) %>%
mutate(Diagnosis_date = ymd(Diagnosis_date))
User_ID_Visit <- c("1", "2", "3", "4","2", "5", "6","1", "4", "1", "1", "3", "3", "5", "6", "2", "1", "4", "3", "5")
Visit_date <- c("2020-09-23", "2020-09-12", "2020-11-14", "2020-09-23", "2020-12-12", "2020-12-10", "2020-09-24", "2020-12-07", "2020-06-13", "2020-08-12","2020-06-23", "2020-07-11", "2020-12-25", "2020-05-30", "2020-11-29", "2020-12-10", "2020-10-25", "2020-12-21", "2020-09-24", "2020-11-15")
df2 <- data.frame(User_ID_Visit, Visit_date) %>%
mutate(Visit_date = ymd(Visit_date))
df1 %>%
rowwise() %>%
mutate(visited_last_month = any(abs(filter(df2, User_ID_Visit == User_ID)$Visit_date %--% Diagnosis_date / ddays(1)) <= 30)) %>%
ungroup()
# # A tibble: 10 × 4
# User_ID User_disease Diagnosis_date visited_last_month
# <chr> <chr> <date> <lgl>
# 1 1 a 2020-11-23 TRUE
# 2 2 b 2020-10-12 TRUE
# 3 3 c 2020-11-25 TRUE
# 4 4 a 2020-09-30 TRUE
# 5 2 c 2020-12-23 TRUE
# 6 5 d 2020-12-12 TRUE
# 7 6 a 2020-10-28 FALSE
# 8 1 s 2020-12-09 TRUE
# 9 4 n 2020-06-24 TRUE
# 10 1 b 2020-08-15 TRUE
Please notice that I made some changes to you input data.