I have two datasets with tests
(dates and results - long format) and meds
(medication dates) for a number of patients. There are two tests per patients, on two separate dates.
(tests <- structure(list(id = c(1, 1, 2, 2, 3, 3), test_date = structure(c(10957,
10963, 11001, 11035, 11091, 11230), class = "Date"), test_result = c(1,
1, 0, 1, 0, 0)), row.names = c(NA, -6L), class = "data.frame"))
#> id test_date test_result
#> 1 1 2000-01-01 1
#> 2 1 2000-01-07 1
#> 3 2 2000-02-14 0
#> 4 2 2000-03-19 1
#> 5 3 2000-05-14 0
#> 6 3 2000-09-30 0
(meds <- structure(list(id = c(1, 2, 3), med_date = structure(c(10959,
10956, NA), class = "Date")), row.names = c(NA, -3L), class = "data.frame"))
#> id med_date
#> 1 1 2000-01-03
#> 2 2 1999-12-31
#> 3 3 <NA>
I'm trying to create a new column in tests
that specifies whether that patient received the medication within the interval between the two test dates.
Intended output:
#> id test_date test_result received_med_within
#> 1 1 2000-01-01 1 TRUE
#> 2 1 2000-01-07 1 TRUE
#> 3 2 2000-02-14 0 FALSE
#> 4 2 2000-03-19 1 FALSE
#> 5 3 2000-05-14 0 FALSE
#> 6 3 2000-09-30 0 FALSE
I thought I could tackle this by
- pivotting
tests
wider, left_join
ingmeds
to it,- using
if_else(med_date %within% interval(test_date_1, test_date_2), TRUE, FALSE)
- pivotting it longer again
However this is rather convoluted and as the real datasets contain many more columns, pivotting might get a bit tricky.
Is there a cleaner way to check if a date falls between two dates in a long dataset?
CodePudding user response:
You can do the following:
library(dplyr)
tests %>%
left_join(meds) %>%
group_by(id) %>%
mutate(received_med_within = between(med_date, test_date[1], test_date[2])) %>%
tidyr::replace_na(list(received_med_within = FALSE)) %>%
dplyr::select(-4)
# A tibble: 6 x 4
# Groups: id [3]
# id test_date test_result received_med_within
# <dbl> <date> <dbl> <lgl>
# 1 1 2000-01-01 1 TRUE
# 2 1 2000-01-07 1 TRUE
# 3 2 2000-02-14 0 FALSE
# 4 2 2000-03-19 1 FALSE
# 5 3 2000-05-14 0 FALSE
# 6 3 2000-09-30 0 FALSE
CodePudding user response:
You can try to use match
.
tests <- tests[order(tests$id, tests$test_date),] #In case its not ordered
i <- match(tests$id[c(TRUE, FALSE)], meds$id)
cbind(tests, received_med_within =
rep(tests$test_date[c(TRUE, FALSE)] < meds$med_date[i] &
meds$med_date[i] < tests$test_date[c(FALSE, TRUE)], each = 2))
# id test_date test_result received_med_within
#1 1 2000-01-01 1 TRUE
#2 1 2000-01-07 1 TRUE
#3 2 2000-02-14 0 FALSE
#4 2 2000-03-19 1 FALSE
#5 3 2000-05-14 0 NA
#6 3 2000-09-30 0 NA