I have two datasets. First dataset has two date columns (Start, Stop) for each ID. Sometimes it can contain multiple Start-Stop Dates for each ID.
Dataset1
Id Code Start Stop
431 279 2017-11-15 2019-08-15
431 578 2019-09-15 2021-01-15
832 590 2008-04-15 2020-05-15
832 519 2020-06-15 2021-04-15
The second dataset has Id and many time stamped rows per ID, like this below
Id Weight Date
431 12.23 2018
832 15.12 2020
832 6.78 2020
832 4.27 2007
My goal is to create a column InBetween
that indicates "Yes" if the row in 2nd dataset is between the two dates in the first dataset or "No" if the rows in the 2nd dataset is not between the two dates in first dataset, with other columns from 1st dataset like this.
Id Weight Date Between Code Start Stop
431 12.23 2018 Yes 279 2017-11-15 2019-08-15
832 15.12 2020 Yes 590 2008-04-15 2020-05-15
832 6.78 2020 Yes 590 2008-04-15 2020-05-15
832 4.27 2007 No NA NA NA
I could do this with for loops, but I prefer any solution using dplyr, innerjoin or other options without forloops. Thanks in advance.
CodePudding user response:
It's pretty messy but you may try,
df2 %>%
full_join(df1, by = "Id") %>%
mutate(Date = as.Date(ISOdate(Date,1,1)),
Start = as.Date(Start),
Stop = as.Date(Stop)) %>%
rowwise %>%
mutate(Between = between(Date, Start, Stop)) %>%
group_by(Id, Date) %>%
mutate(check = any(Between)) %>%
filter(!(Between == FALSE& check == TRUE)) %>%
mutate(Start = ifelse(check, Start, NA),
Stop = ifelse(check, Stop, NA),
Code = ifelse(check, Code, NA)) %>%
distinct() %>% select(-check)
Id Weight Date Code Start Stop Between
<int> <dbl> <date> <int> <dbl> <dbl> <lgl>
1 431 12.2 2018-01-01 279 17485 18123 TRUE
2 832 15.1 2020-01-01 590 13984 18397 TRUE
3 832 6.78 2020-01-01 590 13984 18397 TRUE
4 832 4.27 2007-01-01 NA NA NA FALSE