Home > Software design >  Checking if a date falls within an interval in a dataset in long format
Checking if a date falls within an interval in a dataset in long format

Time:06-29

I have two datasets with tests (dates and results - long format) and meds (medication dates) for a number of patients. There are two tests per patients, on two separate dates.

(tests <- structure(list(id = c(1, 1, 2, 2, 3, 3), test_date = structure(c(10957, 
10963, 11001, 11035, 11091, 11230), class = "Date"), test_result = c(1, 
1, 0, 1, 0, 0)), row.names = c(NA, -6L), class = "data.frame"))

#>   id  test_date test_result
#> 1  1 2000-01-01           1
#> 2  1 2000-01-07           1
#> 3  2 2000-02-14           0
#> 4  2 2000-03-19           1
#> 5  3 2000-05-14           0
#> 6  3 2000-09-30           0

(meds <- structure(list(id = c(1, 2, 3), med_date = structure(c(10959, 
10956, NA), class = "Date")), row.names = c(NA, -3L), class = "data.frame"))

#>   id   med_date
#> 1  1 2000-01-03
#> 2  2 1999-12-31
#> 3  3       <NA>

I'm trying to create a new column in tests that specifies whether that patient received the medication within the interval between the two test dates.

Intended output:

#>   id  test_date test_result   received_med_within
#> 1  1 2000-01-01           1   TRUE
#> 2  1 2000-01-07           1   TRUE
#> 3  2 2000-02-14           0   FALSE
#> 4  2 2000-03-19           1   FALSE
#> 5  3 2000-05-14           0   FALSE
#> 6  3 2000-09-30           0   FALSE

I thought I could tackle this by

  1. pivotting tests wider,
  2. left_joining meds to it,
  3. using if_else(med_date %within% interval(test_date_1, test_date_2), TRUE, FALSE)
  4. pivotting it longer again

However this is rather convoluted and as the real datasets contain many more columns, pivotting might get a bit tricky.

Is there a cleaner way to check if a date falls between two dates in a long dataset?

CodePudding user response:

You can do the following:

library(dplyr)

tests %>%
  left_join(meds) %>% 
  group_by(id) %>% 
  mutate(received_med_within = between(med_date, test_date[1], test_date[2])) %>% 
  tidyr::replace_na(list(received_med_within = FALSE)) %>% 
  dplyr::select(-4)

# A tibble: 6 x 4
# Groups:   id [3]
# id test_date  test_result received_med_within
# <dbl> <date>           <dbl> <lgl>              
# 1     1 2000-01-01           1 TRUE               
# 2     1 2000-01-07           1 TRUE               
# 3     2 2000-02-14           0 FALSE              
# 4     2 2000-03-19           1 FALSE              
# 5     3 2000-05-14           0 FALSE              
# 6     3 2000-09-30           0 FALSE  

CodePudding user response:

You can try to use match.

tests <- tests[order(tests$id, tests$test_date),] #In case its not ordered

i <- match(tests$id[c(TRUE, FALSE)], meds$id)
cbind(tests, received_med_within =
       rep(tests$test_date[c(TRUE, FALSE)] < meds$med_date[i] &
              meds$med_date[i] < tests$test_date[c(FALSE, TRUE)], each = 2))
#  id  test_date test_result received_med_within
#1  1 2000-01-01           1                TRUE
#2  1 2000-01-07           1                TRUE
#3  2 2000-02-14           0               FALSE
#4  2 2000-03-19           1               FALSE
#5  3 2000-05-14           0                  NA
#6  3 2000-09-30           0                  NA
  • Related