Home > Enterprise >  r flag rows in 1st dataframe if the date is between two dates in second dataframe
r flag rows in 1st dataframe if the date is between two dates in second dataframe

Time:10-06

I have two datasets. First dataset has two date columns (Start, Stop) for each ID. Sometimes it can contain multiple Start-Stop Dates for each ID.

Dataset1

  Id    Code    Start        Stop
  431   279     2017-11-15   2019-08-15
  431   578     2019-09-15   2021-01-15
  832   590     2008-04-15   2020-05-15
  832   519     2020-06-15   2021-04-15

The second dataset has Id and many time stamped rows per ID, like this below

  Id   Weight   Date
  431  12.23    2018
  832  15.12    2020
  832   6.78    2020
  832   4.27    2007

My goal is to create a column InBetween that indicates "Yes" if the row in 2nd dataset is between the two dates in the first dataset or "No" if the rows in the 2nd dataset is not between the two dates in first dataset, with other columns from 1st dataset like this.

    Id   Weight   Date   Between  Code  Start        Stop
    431  12.23    2018   Yes      279   2017-11-15   2019-08-15  
    832  15.12    2020   Yes      590   2008-04-15   2020-05-15
    832   6.78    2020   Yes      590   2008-04-15   2020-05-15
    832   4.27    2007   No       NA    NA           NA

I could do this with for loops, but I prefer any solution using dplyr, innerjoin or other options without forloops. Thanks in advance.

CodePudding user response:

It's pretty messy but you may try,

df2 %>%
  full_join(df1, by = "Id") %>%
  mutate(Date = as.Date(ISOdate(Date,1,1)),
         Start = as.Date(Start),
         Stop = as.Date(Stop)) %>%
  rowwise %>%
  mutate(Between = between(Date, Start, Stop)) %>%
  group_by(Id, Date) %>%
  mutate(check = any(Between)) %>%
  filter(!(Between == FALSE& check == TRUE)) %>%
  mutate(Start = ifelse(check, Start, NA), 
         Stop = ifelse(check, Stop, NA),
         Code = ifelse(check, Code, NA)) %>%
  distinct() %>% select(-check)


     Id Weight Date        Code Start  Stop Between
  <int>  <dbl> <date>     <int> <dbl> <dbl> <lgl>  
1   431  12.2  2018-01-01   279 17485 18123 TRUE   
2   832  15.1  2020-01-01   590 13984 18397 TRUE   
3   832   6.78 2020-01-01   590 13984 18397 TRUE   
4   832   4.27 2007-01-01    NA    NA    NA FALSE
  • Related