I'm trying to assign variables in a dataframe using a loop and referencing another table with dates. The loop would create a new variable (YRTR) in df1 using df2 as a reference.
The problem I'm running into is that some observations need to be assigned multiple YRTRs depending on the begin/end dates. So one observation may turn into multiple observations.
If an END_DATE is 9999-12-31 then the observation is current to today's date.
For example obs. 1 in df1 would turn into 11 observations, 1 for each YRTR since 2021.
Obs. 2 in df1 would turn into 2 observations, 1 with a YRTR of 20221, and 1 with a YRTR of 20223.
Obs. 3 in df1 would turn into 5 observations, 1 for each YRTR since 20221.
df1 looks like this:
|ID| BEGIN_DATE | END_DATE |
|---------------------|---------------------|------------------|
|1| 2019-05-18 | 9999-12-31 |
|2| 2021-05-15 | 2021-12-17 |
|3| 2021-05-15 | 9999-12-31 |
|4| 2018-12-22 | 2019-05-18 |
The reference data frame (df2) looks like this:
|YRTR| BEGIN_DATE | END_DATE |
|---------------------|---------------------|------------------|
|20193| 8/27/2018 | 12/21/2018 |
|20195| 1/14/2019 | 5/17/2019 |
|20201| 6/3/2019 | 8/8/2019 |
|20203| 8/26/2019 | 12/20/2019 |
|20205| 1/13/2020 | 5/15/2020 |
|20211| 6/1/2020 | 8/6/2020 |
|20213| 8/24/2020 | 12/18/2020 |
|20215| 1/11/2021 | 5/14/2021 |
|20221| 6/1/2021 | 8/5/2021 |
|20223| 8/23/2021 | 12/17/2021 |
|20225| 1/10/2022 | 5/13/2022 |
|20231| 5/31/2022 | 8/5/2022 |
|20233| 8/22/2022 | 12/16/2022 |
I'm trying to utilize for loops in R to solve this problem.
CodePudding user response:
Ended up using a much simpler data.table method:
setDT(df2)
setDT(df1)
setkey(df2, BEGIN_DATE, END_DATE)
warn_long <- foverlaps(df1, df2, nomatch=NULL)