Home > Software design >  R loop to assign variable based on dates in ref table
R loop to assign variable based on dates in ref table

Time:12-02

I'm trying to assign variables in a dataframe using a loop and referencing another table with dates. The loop would create a new variable (YRTR) in df1 using df2 as a reference.

The problem I'm running into is that some observations need to be assigned multiple YRTRs depending on the begin/end dates. So one observation may turn into multiple observations.

If an END_DATE is 9999-12-31 then the observation is current to today's date.

For example obs. 1 in df1 would turn into 11 observations, 1 for each YRTR since 2021.

Obs. 2 in df1 would turn into 2 observations, 1 with a YRTR of 20221, and 1 with a YRTR of 20223.

Obs. 3 in df1 would turn into 5 observations, 1 for each YRTR since 20221.

df1 looks like this:

|ID|      BEGIN_DATE     |     END_DATE  |
|---------------------|---------------------|------------------|
|1|          2019-05-18        |         9999-12-31       |
|2|          2021-05-15         |         2021-12-17       |
|3|          2021-05-15         |         9999-12-31       |
|4|          2018-12-22         |         2019-05-18       |

The reference data frame (df2) looks like this:

|YRTR|      BEGIN_DATE     |     END_DATE  |
|---------------------|---------------------|------------------|
|20193|          8/27/2018  |         12/21/2018       |
|20195|          1/14/2019   |         5/17/2019      |
|20201|         6/3/2019         |         8/8/2019      |
|20203|          8/26/2019        |         12/20/2019      |
|20205|          1/13/2020        |         5/15/2020       |
|20211|          6/1/2020         |         8/6/2020      |
|20213|          8/24/2020        |         12/18/2020       |
|20215|          1/11/2021         |         5/14/2021       |
|20221|          6/1/2021         |         8/5/2021      |
|20223|          8/23/2021        |         12/17/2021      |
|20225|          1/10/2022         |         5/13/2022       |
|20231|          5/31/2022         |         8/5/2022       |
|20233|          8/22/2022         |         12/16/2022      |

I'm trying to utilize for loops in R to solve this problem.

CodePudding user response:

Ended up using a much simpler data.table method:

setDT(df2)
setDT(df1)

setkey(df2, BEGIN_DATE, END_DATE)
warn_long <- foverlaps(df1, df2, nomatch=NULL)

  • Related