The topic might sound confusing. Here's an example:
ID | Index Date | Reference Date | Diagnosis |
---|---|---|---|
123 | 2004-07-15 | 2004-07-20 | Depression |
123 | 2004-07-15 | 1999-04-08 | Anxiety |
456 | 2000-01-15 | 1998-03-15 | Bipolar |
456 | 2000-01-15 | 1999-10-19 | Anxiety |
I would like to choose the row of reference date which is close to the index date according to each ID. But seriously I have no idea how I write the code. Does anyone have a good idea?
CodePudding user response:
Convert your date columns to date
objects, compute the diff
erence between the index and reference dates, group_by
ID and then keep the value with the minimum difference with slice_min
:
library(dplyr)
library(lubridate)
df %>%
mutate(across(contains("Date"), ymd),
diff = abs(Index_Date - Reference_Date)) %>%
group_by(ID) %>%
slice_min(diff)
# ID Index_Date Reference_Date Diagnosis diff
#1 123 2004-07-15 2004-07-20 Depression 5 days
#2 456 2000-01-15 1999-10-19 Anxiety 88 days
CodePudding user response:
Welcome to SO.
library(lubridate) # date
library(dplyr) # data wrangling
df <- data.frame(
ID = c("123","123","456","456"),
index_date = ymd(c("2004-07-15","2004-07-15","2000-01-15","2000-01-15")),
reference_date = ymd(c("2004-07-20","1999-04-08","1998-03-15","1999-10-19")),
diag = c("Depression", "Anxiety", "Bipolar", "Anxiety")
)
df %>%
mutate(diff_date = index_date - reference_date) %>%
group_by(ID) %>%
slice_min(diff_date)
ID index_date reference_date diag diff_date
<chr> <date> <date> <chr> <drtn>
1 123 2004-07-15 2004-07-20 Depression -5 days
2 456 2000-01-15 1999-10-19 Anxiety 88 days
The trick is to use the date difference to select the right row with slice_min