How to filter the closest date of one column that is close to the date of another column in R?-CodePudding

The topic might sound confusing. Here's an example:

ID	Index Date	Reference Date	Diagnosis
123	2004-07-15	2004-07-20	Depression
123	2004-07-15	1999-04-08	Anxiety
456	2000-01-15	1998-03-15	Bipolar
456	2000-01-15	1999-10-19	Anxiety

I would like to choose the row of reference date which is close to the index date according to each ID. But seriously I have no idea how I write the code. Does anyone have a good idea?

CodePudding user response：

Convert your date columns to date objects, compute the difference between the index and reference dates, group_by ID and then keep the value with the minimum difference with slice_min:

library(dplyr)
library(lubridate)

df %>% 
  mutate(across(contains("Date"), ymd),
         diff = abs(Index_Date - Reference_Date)) %>% 
  group_by(ID) %>% 
  slice_min(diff)

#     ID Index_Date Reference_Date Diagnosis  diff   
#1   123 2004-07-15 2004-07-20     Depression  5 days
#2   456 2000-01-15 1999-10-19     Anxiety    88 days

CodePudding user response：

Welcome to SO.


library(lubridate)  # date
library(dplyr)      # data wrangling

df <- data.frame(
  ID = c("123","123","456","456"),
  index_date = ymd(c("2004-07-15","2004-07-15","2000-01-15","2000-01-15")),
  reference_date = ymd(c("2004-07-20","1999-04-08","1998-03-15","1999-10-19")),
  diag = c("Depression", "Anxiety", "Bipolar", "Anxiety")
)

df %>% 
  mutate(diff_date = index_date - reference_date) %>% 
  group_by(ID) %>% 
  slice_min(diff_date)


  ID    index_date reference_date diag       diff_date
  <chr> <date>     <date>         <chr>      <drtn>   
1 123   2004-07-15 2004-07-20     Depression -5 days  
2 456   2000-01-15 1999-10-19     Anxiety    88 days

The trick is to use the date difference to select the right row with slice_min