Home > Enterprise >  Lag dates on condition
Lag dates on condition

Time:05-19

I have the following data frame,

       Leaflet start_date end_date id
 1:   1       2022-01-05     2022-01-08 15
 2:   1       2022-01-05     2022-01-11 14
 3:   1       2022-01-09     2022-01-11 10
 4:  10       2022-03-09     2022-03-12 21
 5:  10       2022-03-09     2022-03-15 18
 6: 10       2022-03-13     2022-03-15 32
 7: 11       2022-03-16     2022-03-19 30
 8: 11       2022-03-16     2022-03-22  4
 9: 11       2022-03-20     2022-03-22 39

As you can see there are overlapping intervals. What I would like to do is to create a new_start_date_promo where it will roll the end date to the start date If the intervals overlap. In the end I want to get the following:

      Leaflet start_date_promo end_date_promo id new_start_promo
1:  1       2022-01-05     2022-01-08 15      2022-01-05
2:  1       2022-01-05     2022-01-11 14      2022-01-08
3:  1       2022-01-09     2022-01-11 10      2022-01-09
4: 10       2022-03-09     2022-03-12 21      2022-03-09
5: 10       2022-03-09     2022-03-15 18      2022-03-12
6: 10       2022-03-13     2022-03-15 32      2022-03-13
7:  11       2022-03-16     2022-03-19 30      2022-03-16
8:  11       2022-03-16     2022-03-22  4      2022-03-19
9: 11       2022-03-20     2022-03-22 39      2022-03-20

I tried this with dplyr but it looks very convoluted, messes up the date formats and I am not sure if it will follow my methodology. Nevertheless, this was my attempt:

library(dplyr)
leaflet_dim %>% 
 group_by(Leaflet) %>% 
 arrange(start_date, end_date) %>% 
 mutate(new_start = lag(ifelse(start_date == lead(start_date), end_date_promo, lead(start_date)))) %>% 
 mutate(new_start = anytime::anytime(ifelse(is.na(new_start),start_date_promo, new_start)), new_start = new_start - (24*60*60))

DATA

structure(list(Leaflet = c("1", "1", "1", 
"10", "10", "10", "11", "11", 
"11"), start_date_promo = structure(c(18997, 18997, 19001, 
19060, 19060, 19064, 19067, 19067, 19071), class = "Date"), end_date_promo = structure(c(19000, 
19003, 19003, 19063, 19066, 19066, 19070, 19073, 19073), class = "Date"), 
    id = c(15L, 14L, 10L, 21L, 18L, 32L, 30L, 4L, 39L)), row.names = c(NA, 
-9L), class = c("data.table", "data.frame"))

CodePudding user response:

Here's a tidy solution:

library(dplyr)

dat %>% 
  group_by(Leaflet) %>% 
  mutate(new_start = case_when(start_date_promo < lag(end_date_promo) & start_date_promo == lag(start_date_promo) ~ lag(end_date_promo), 
                               TRUE ~ start_date_promo))
#> # A tibble: 9 × 5
#> # Groups:   Leaflet [3]
#>   Leaflet    start_date_promo end_date_promo    id new_start 
#>   <chr>      <date>           <date>         <int> <date>    
#> 1 Revista 1  2022-01-05       2022-01-08        15 2022-01-05
#> 2 Revista 1  2022-01-05       2022-01-11        14 2022-01-08
#> 3 Revista 1  2022-01-09       2022-01-11        10 2022-01-09
#> 4 Revista 10 2022-03-09       2022-03-12        21 2022-03-09
#> 5 Revista 10 2022-03-09       2022-03-15        18 2022-03-12
#> 6 Revista 10 2022-03-13       2022-03-15        32 2022-03-13
#> 7 Revista 11 2022-03-16       2022-03-19        30 2022-03-16
#> 8 Revista 11 2022-03-16       2022-03-22         4 2022-03-19
#> 9 Revista 11 2022-03-20       2022-03-22        39 2022-03-20

CodePudding user response:

If you group by the start date, you can use pmax:

data.table

library(data.table)
dat[, new_start_promo := pmax(start_date_promo, shift(end_date_promo), na.rm = TRUE), by = start_date_promo]
dat
#       Leaflet start_date_promo end_date_promo    id new_start_promo
#        <char>           <Date>         <Date> <int>          <Date>
# 1:  Revista 1       2022-01-05     2022-01-08    15      2022-01-05
# 2:  Revista 1       2022-01-05     2022-01-11    14      2022-01-08
# 3:  Revista 1       2022-01-09     2022-01-11    10      2022-01-09
# 4: Revista 10       2022-03-09     2022-03-12    21      2022-03-09
# 5: Revista 10       2022-03-09     2022-03-15    18      2022-03-12
# 6: Revista 10       2022-03-13     2022-03-15    32      2022-03-13
# 7: Revista 11       2022-03-16     2022-03-19    30      2022-03-16
# 8: Revista 11       2022-03-16     2022-03-22     4      2022-03-19
# 9: Revista 11       2022-03-20     2022-03-22    39      2022-03-20

dplyr

library(dplyr)
dat %>%
  group_by(start_date_promo) %>%
  mutate(new_start_promo = pmax(start_date_promo, lag(end_date_promo), na.rm = TRUE)) %>%
  ungroup()
# # A tibble: 9 x 5
#   Leaflet    start_date_promo end_date_promo    id new_start_promo
#   <chr>      <date>           <date>         <int> <date>         
# 1 Revista 1  2022-01-05       2022-01-08        15 2022-01-05     
# 2 Revista 1  2022-01-05       2022-01-11        14 2022-01-08     
# 3 Revista 1  2022-01-09       2022-01-11        10 2022-01-09     
# 4 Revista 10 2022-03-09       2022-03-12        21 2022-03-09     
# 5 Revista 10 2022-03-09       2022-03-15        18 2022-03-12     
# 6 Revista 10 2022-03-13       2022-03-15        32 2022-03-13     
# 7 Revista 11 2022-03-16       2022-03-19        30 2022-03-16     
# 8 Revista 11 2022-03-16       2022-03-22         4 2022-03-19     
# 9 Revista 11 2022-03-20       2022-03-22        39 2022-03-20     
  • Related