Calculate date lag

Time:10-21

I have a dataframe with this shape:

  date       date_lag test_date 
  <date>        <dbl> <date>    
1 2018-12-01       NA 2018-12-01
2 2019-03-01       90 2019-03-01
3 2019-05-01       61 2019-03-01
4 2020-03-10      314 2020-03-10
5 2020-03-16        6 2020-03-10
6 2020-03-23        7 2020-03-16
7 2020-03-24        1 2020-03-23

In order to create date_lag & test_date, I applied this code:

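library(dplyr)

lag <- lag %>%
  # days between each date and the date in the previous row
  mutate(date_lag = as.numeric(date - lag(date), units="days")) %>%
  # a gap under 69 days steps back by that gap, i.e. to the previous row's date
  mutate(test_date = case_when(
    is.na(date_lag) ~ date,
    date_lag < 69 ~ date-date_lag,
    TRUE ~ date)) 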

If dates are less than 69 days apart, I want them to share the same test_date. The problem with my code shows in row 6: I don't want it to get the date of row 5 but the date of row 4, because its date_lag is still less than 69 days from the previous row. My desired data looks like this:

  date       date_lag test_date 
  <date>        <dbl> <date>    
1 2018-12-01       NA 2018-12-01
2 2019-03-01       90 2019-03-01
3 2019-05-01       61 2019-03-01
4 2020-03-10      314 2020-03-10
5 2020-03-16        6 2020-03-10
6 2020-03-23        7 2020-03-10
7 2020-03-24        1 2020-03-10

Thanks in advance.

CodePudding user response:

Iterate over the dates. For each, compute the difference with all other dates. Using these differences, find the earliest date that's fewer than 69 days before the index date. Like so:

library(tibble)     # tibble()
library(purrr)      # map() and %>%
library(lubridate)  # ymd(), as_date()

# example data
date_df <- tibble(
  date = ymd("2018-12-01", "2019-03-01", "2019-05-01", "2020-03-10", 
             "2020-03-16", "2020-03-23", "2020-03-24")
)

dates <- date_df$date
date_df$test_date <- map(dates, ~ min(dates[.x - dates < 69])) %>% 
  unlist() %>% 
  as_date()

date_df
#> # A tibble: 7 × 2
#>   date       test_date 
#>   <date>     <date>    
#> 1 2018-12-01 2018-12-01
#> 2 2019-03-01 2019-03-01
#> 3 2019-05-01 2019-03-01
#> 4 2020-03-10 2020-03-10
#> 5 2020-03-16 2020-03-10
#> 6 2020-03-23 2020-03-10
#> 7 2020-03-24 2020-03-10

Created on 2022-10-20 with reprex v2.0.2

PS - I assume date_lag was just a helper column you no longer need. If you do want it, it will depend on how you want to define it. You can either use the same method as your question (if you want days since previous date), or subtract test_date from date (if you want days since test_date).
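
The two definitions of date_lag mentioned in the PS can be sketched as follows. This is a minimal base R illustration using the example dates and the test_date values from the output above; the names example_df, days_since_prev and days_since_test are hypothetical labels chosen for this sketch.

```r
# Example dates with the test_date values from the output above
example_df <- data.frame(
  date = as.Date(c("2018-12-01", "2019-03-01", "2019-05-01", "2020-03-10",
                   "2020-03-16", "2020-03-23", "2020-03-24")),
  test_date = as.Date(c("2018-12-01", "2019-03-01", "2019-03-01", "2020-03-10",
                        "2020-03-10", "2020-03-10", "2020-03-10"))
)

# Definition 1: days since the previous row's date (NA for the first row)
example_df$days_since_prev <- c(NA, as.numeric(diff(example_df$date), units = "days"))

# Definition 2: days since the test_date assigned to the row
example_df$days_since_test <- as.numeric(example_df$date - example_df$test_date,
                                         units = "days")

example_df
```

The first definition reproduces the date_lag column from the question; the second measures how far each row sits from the date it was grouped under.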

CodePudding user response:

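A data.table option using a non-equi join, which might work well for bigger datasets:

library(data.table)
# For each row, join dat to itself on the window [date - 68, date]
# (i.e. fewer than 69 days back) and keep the first match, which is
# the earliest date when dat is sorted by date.
dat[, test_date := dat[
          dat[, .(date, datem68 = date-68)],
          on=.(date<=date, date>=datem68), x.date, mult="first"]
    ]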
##         date  test_date
##1: 2018-12-01 2018-12-01
##2: 2019-03-01 2019-03-01
##3: 2019-05-01 2019-03-01
##4: 2020-03-10 2020-03-10
##5: 2020-03-16 2020-03-10
##6: 2020-03-23 2020-03-10
##7: 2020-03-24 2020-03-10

Where dat was:

library(data.table)
dat <- fread("date
2018-12-01
2019-03-01
2019-05-01
2020-03-10
2020-03-16
2020-03-23
2020-03-24")
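
For comparison, the same windowed lookup can be sketched in base R without a join. This is only an illustration, and it assumes the dates are sorted in ascending order; the names d and first_in_window are made up for this sketch. findInterval() counts how many dates fall at or before the cutoff (date minus 69 days), so the next position holds the earliest date that is fewer than 69 days back.

```r
# Example dates, sorted ascending (findInterval() requires a sorted vector)
dates <- as.Date(c("2018-12-01", "2019-03-01", "2019-05-01", "2020-03-10",
                   "2020-03-16", "2020-03-23", "2020-03-24"))
d <- as.numeric(dates)

# Count of dates at or before (date - 69 days); the element right after
# them is the earliest date fewer than 69 days before the current one
first_in_window <- findInterval(d - 69, d) + 1

test_date <- dates[first_in_window]
data.frame(date = dates, test_date = test_date)
```

Because findInterval() uses a binary search over the sorted vector, this avoids comparing every date with every other date.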

CodePudding user response:

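I used data.table's shift() function inside if_else(). shift(date, 1) goes up the specified number of rows (here one) and grabs the value. This reproduces rows 1 to 5 of your desired output, but rows 6 and 7 still take the date of the row directly above them, because only one previous row is consulted. if_else() is used instead of base ifelse() because ifelse() drops the Date class and returns plain numbers.

library(dplyr)
library(data.table)
lag <- lag %>% 
    mutate(
        # keep the row's own date when the gap exceeds 68 days or is missing;
        # otherwise take the date from one row up
        test_date=if_else(date_lag > 68 | is.na(date_lag),
                          date,
                          shift(date,1))
    )
lag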

        date date_lag  test_date
1 2018-12-01       NA 2018-12-01
2 2019-03-01       90 2019-03-01
3 2019-05-01       61 2019-03-01
4 2020-03-10      314 2020-03-10
5 2020-03-16        6 2020-03-10
6 2020-03-23        7 2020-03-16
7 2020-03-24        1 2020-03-23
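
In the output above, rows 6 and 7 still take the date of the row immediately before them rather than 2020-03-10. One way to carry the first date of each run forward is sketched below in base R. It assumes that a new run starts whenever the gap to the previous row is missing or at least 69 days, and the names gap and starts_run are illustrative only.

```r
# Example dates from the question, sorted ascending
dates <- as.Date(c("2018-12-01", "2019-03-01", "2019-05-01", "2020-03-10",
                   "2020-03-16", "2020-03-23", "2020-03-24"))

# Days since the previous row (NA for the first row)
gap <- c(NA, as.numeric(diff(dates), units = "days"))

# A row starts a new run when it has no predecessor or the gap is 69+ days
starts_run <- is.na(gap) | gap >= 69

# cumsum() numbers the runs; index the run-start dates by run number
test_date <- dates[starts_run][cumsum(starts_run)]

data.frame(date = dates, date_lag = gap, test_date = test_date)
```

This treats the gaps as chained: a series of dates that are each fewer than 69 days from their predecessor collapses into one run, even if the last date ends up more than 69 days after the first. The windowed lookups in the other answers can give a different result in that situation, so the choice depends on which behavior is wanted.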