Home > Net >  How to filter the closest date of one column that is close to the date of another column in R?
How to filter the closest date of one column that is close to the date of another column in R?

Time:10-04

The topic might sound confusing. Here's an example:

ID Index Date Reference Date Diagnosis
123 2004-07-15 2004-07-20 Depression
123 2004-07-15 1999-04-08 Anxiety
456 2000-01-15 1998-03-15 Bipolar
456 2000-01-15 1999-10-19 Anxiety

I would like to choose the row of reference date which is close to the index date according to each ID. But seriously I have no idea how I write the code. Does anyone have a good idea?

CodePudding user response:

Convert your date columns to date objects, compute the difference between the index and reference dates, group_by ID and then keep the value with the minimum difference with slice_min:

library(dplyr)
library(lubridate)

df %>% 
  mutate(across(contains("Date"), ymd),
         diff = abs(Index_Date - Reference_Date)) %>% 
  group_by(ID) %>% 
  slice_min(diff)

#     ID Index_Date Reference_Date Diagnosis  diff   
#1   123 2004-07-15 2004-07-20     Depression  5 days
#2   456 2000-01-15 1999-10-19     Anxiety    88 days

CodePudding user response:

Welcome to SO.


library(lubridate)  # date
library(dplyr)      # data wrangling

df <- data.frame(
  ID = c("123","123","456","456"),
  index_date = ymd(c("2004-07-15","2004-07-15","2000-01-15","2000-01-15")),
  reference_date = ymd(c("2004-07-20","1999-04-08","1998-03-15","1999-10-19")),
  diag = c("Depression", "Anxiety", "Bipolar", "Anxiety")
)

df %>% 
  mutate(diff_date = index_date - reference_date) %>% 
  group_by(ID) %>% 
  slice_min(diff_date)


  ID    index_date reference_date diag       diff_date
  <chr> <date>     <date>         <chr>      <drtn>   
1 123   2004-07-15 2004-07-20     Depression -5 days  
2 456   2000-01-15 1999-10-19     Anxiety    88 days

The trick is to use the date difference to select the right row with slice_min

  •  Tags:  
  • r
  • Related