I have a dataframe with this shape:
  date       date_lag test_date
  <date>        <dbl> <date>
1 2018-12-01       NA 2018-12-01
2 2019-03-01       90 2019-03-01
3 2019-05-01       61 2019-03-01
4 2020-03-10      314 2020-03-10
5 2020-03-16        6 2020-03-10
6 2020-03-23        7 2020-03-16
7 2020-03-24        1 2020-03-23
To create date_lag and test_date, I applied this code:
library(dplyr)

lag <- lag %>%
  mutate(date_lag = as.numeric(date - lag(date), units = "days")) %>%
  mutate(test_date = case_when(
    is.na(date_lag) ~ date,
    date_lag < 69 ~ date - date_lag,
    TRUE ~ date))
If dates are less than 69 days apart, I want them to have the same test_date. The problem with my code shows in row 6: I don't want it to have the date of row 5 but the date of row 4, because its date_lag from the previous row is still less than 69 days. My desired data would look like this:
  date       date_lag test_date
  <date>        <dbl> <date>
1 2018-12-01       NA 2018-12-01
2 2019-03-01       90 2019-03-01
3 2019-05-01       61 2019-03-01
4 2020-03-10      314 2020-03-10
5 2020-03-16        6 2020-03-10
6 2020-03-23        7 2020-03-10
7 2020-03-24        1 2020-03-10
Thanks in advance.
CodePudding user response:
Iterate over the dates. For each, compute the difference with all other dates. Using these differences, find the earliest date that's fewer than 69 days before the index date. Like so:
library(tibble)     # tibble()
library(purrr)      # map(), %>%
library(lubridate)  # ymd(), as_date()

# example data
date_df <- tibble(
  date = ymd("2018-12-01", "2019-03-01", "2019-05-01", "2020-03-10",
             "2020-03-16", "2020-03-23", "2020-03-24")
)

dates <- date_df$date

# for each date, take the earliest date that is fewer than 69 days before it
date_df$test_date <- map(dates, ~ min(dates[.x - dates < 69])) %>%
  unlist() %>%
  as_date()

date_df
#> # A tibble: 7 × 2
#>   date       test_date
#>   <date>     <date>
#> 1 2018-12-01 2018-12-01
#> 2 2019-03-01 2019-03-01
#> 3 2019-05-01 2019-03-01
#> 4 2020-03-10 2020-03-10
#> 5 2020-03-16 2020-03-10
#> 6 2020-03-23 2020-03-10
#> 7 2020-03-24 2020-03-10
Created on 2022-10-20 with reprex v2.0.2
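To make the windowing step concrete, here is a minimal base R sketch of what happens for a single index date (row 6). The names x, diffs, in_window and test_date_6 are only for illustration. Note that later dates give negative differences, so they also pass the `< 69` test, but min() still returns the earliest date in the window.

```r
# Dates from the example above
dates <- as.Date(c("2018-12-01", "2019-03-01", "2019-05-01", "2020-03-10",
                   "2020-03-16", "2020-03-23", "2020-03-24"))

x <- dates[6]                         # index date: 2020-03-23
diffs <- as.numeric(x - dates)        # days from each date to the index date
in_window <- diffs < 69               # keeps rows 4-7 (row 7 has a negative diff)
test_date_6 <- min(dates[in_window])  # earliest date inside the window

print(diffs)
print(test_date_6)
```

The map() call in the answer repeats this step once per date.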
PS - I assume date_lag was just a helper column you no longer need. If you do want it, it depends on how you want to define it. You can either use the same method as in your question (if you want days since the previous date), or subtract test_date from date (if you want days since test_date).
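As a rough sketch of those two definitions, assuming test_date has already been computed: the column names days_since_prev and days_since_test are made up for this illustration, and the data frame is rebuilt here only so the snippet runs on its own.

```r
library(dplyr)

# Example data with test_date already filled in (values from the output above)
date_df <- data.frame(
  date = as.Date(c("2018-12-01", "2019-03-01", "2019-05-01", "2020-03-10",
                   "2020-03-16", "2020-03-23", "2020-03-24")),
  test_date = as.Date(c("2018-12-01", "2019-03-01", "2019-03-01", "2020-03-10",
                        "2020-03-10", "2020-03-10", "2020-03-10"))
)

date_df <- date_df %>%
  mutate(
    # option 1: days since the previous row's date (as in the question)
    days_since_prev = as.numeric(date - lag(date)),
    # option 2: days since the assigned test_date
    days_since_test = as.numeric(date - test_date)
  )

print(date_df)
```

The two columns agree only when the previous row is also the test_date row. For rows 6 and 7 they differ.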
CodePudding user response:
data.table option, which might work well for bigger datasets:
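library(data.table)

# non-equi self-join: for each row, take the first (earliest) date in dat
# that is on or before that row's date and fewer than 69 days before it
dat[, test_date := dat[
  dat[, .(date, datem69 = date - 69)],
  on = .(date <= date, date > datem69), x.date, mult = "first"]
]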
##         date  test_date
##1: 2018-12-01 2018-12-01
##2: 2019-03-01 2019-03-01
##3: 2019-05-01 2019-03-01
##4: 2020-03-10 2020-03-10
##5: 2020-03-16 2020-03-10
##6: 2020-03-23 2020-03-10
##7: 2020-03-24 2020-03-10
Where dat was:
library(data.table)
dat <- fread("date
2018-12-01
2019-03-01
2019-05-01
2020-03-10
2020-03-16
2020-03-23
2020-03-24")
CodePudding user response:
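I used the shift() function inside ifelse(). shift(date, 1) goes up the specified number of rows (here one) and grabs that value. Note that this matches your desired output only for rows 1 to 5: rows 6 and 7 still get the date of the immediately preceding row, as the output below shows.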
library(dplyr)
library(data.table)

lag <- lag %>%
  mutate(
    # if_else() keeps the Date class; base ifelse() would return plain numbers
    test_date = if_else(date_lag > 68 | is.na(date_lag),
                        date,
                        shift(date, 1))
  )
lag
        date date_lag  test_date
1 2018-12-01       NA 2018-12-01
2 2019-03-01       90 2019-03-01
3 2019-05-01       61 2019-03-01
4 2020-03-10      314 2020-03-10
5 2020-03-16        6 2020-03-10
6 2020-03-23        7 2020-03-16
7 2020-03-24        1 2020-03-23
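A one-row shift cannot carry a date through a run of several close rows, which is why rows 6 and 7 above still differ from the desired output. One possible alternative, sketched here under the assumption that "same date" means chaining (a new run starts only when the gap to the previous row is 69 days or more), is to number the runs with cumsum() and take the first date of each run. The names lag_df, run_id and result are made up for this example.

```r
library(dplyr)

# Example dates from the question
lag_df <- data.frame(
  date = as.Date(c("2018-12-01", "2019-03-01", "2019-05-01", "2020-03-10",
                   "2020-03-16", "2020-03-23", "2020-03-24"))
)

result <- lag_df %>%
  mutate(
    date_lag = as.numeric(date - lag(date)),
    # a new run starts at the first row or after a gap of 69 days or more
    run_id = cumsum(is.na(date_lag) | date_lag >= 69)
  ) %>%
  group_by(run_id) %>%
  mutate(test_date = first(date)) %>%  # every row in a run gets the run's first date
  ungroup()

print(result)
```

Under this chaining assumption, a run can span more than 69 days in total as long as each consecutive gap is below 69. That differs from a fixed 69-day look-back window, so which one fits depends on what you need. For the example data both give the same result.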