I have the following data frame,
Leaflet start_date end_date id
1: 1 2022-01-05 2022-01-08 15
2: 1 2022-01-05 2022-01-11 14
3: 1 2022-01-09 2022-01-11 10
4: 10 2022-03-09 2022-03-12 21
5: 10 2022-03-09 2022-03-15 18
6: 10 2022-03-13 2022-03-15 32
7: 11 2022-03-16 2022-03-19 30
8: 11 2022-03-16 2022-03-22 4
9: 11 2022-03-20 2022-03-22 39
As you can see there are overlapping intervals. What I would like to do is to create a new_start_date_promo where it will roll the end date to the start date If the intervals overlap. In the end I want to get the following:
Leaflet start_date_promo end_date_promo id new_start_promo
1: 1 2022-01-05 2022-01-08 15 2022-01-05
2: 1 2022-01-05 2022-01-11 14 2022-01-08
3: 1 2022-01-09 2022-01-11 10 2022-01-09
4: 10 2022-03-09 2022-03-12 21 2022-03-09
5: 10 2022-03-09 2022-03-15 18 2022-03-12
6: 10 2022-03-13 2022-03-15 32 2022-03-13
7: 11 2022-03-16 2022-03-19 30 2022-03-16
8: 11 2022-03-16 2022-03-22 4 2022-03-19
9: 11 2022-03-20 2022-03-22 39 2022-03-20
I tried this with dplyr
but it looks very convoluted, messes up the date formats and I am not sure if it will follow my methodology. Nevertheless, this was my attempt:
library(dplyr)
leaflet_dim %>%
group_by(Leaflet) %>%
arrange(start_date, end_date) %>%
mutate(new_start = lag(ifelse(start_date == lead(start_date), end_date_promo, lead(start_date)))) %>%
mutate(new_start = anytime::anytime(ifelse(is.na(new_start),start_date_promo, new_start)), new_start = new_start - (24*60*60))
DATA
structure(list(Leaflet = c("1", "1", "1",
"10", "10", "10", "11", "11",
"11"), start_date_promo = structure(c(18997, 18997, 19001,
19060, 19060, 19064, 19067, 19067, 19071), class = "Date"), end_date_promo = structure(c(19000,
19003, 19003, 19063, 19066, 19066, 19070, 19073, 19073), class = "Date"),
id = c(15L, 14L, 10L, 21L, 18L, 32L, 30L, 4L, 39L)), row.names = c(NA,
-9L), class = c("data.table", "data.frame"))
CodePudding user response:
Here's a tidy solution:
library(dplyr)
dat %>%
group_by(Leaflet) %>%
mutate(new_start = case_when(start_date_promo < lag(end_date_promo) & start_date_promo == lag(start_date_promo) ~ lag(end_date_promo),
TRUE ~ start_date_promo))
#> # A tibble: 9 × 5
#> # Groups: Leaflet [3]
#> Leaflet start_date_promo end_date_promo id new_start
#> <chr> <date> <date> <int> <date>
#> 1 Revista 1 2022-01-05 2022-01-08 15 2022-01-05
#> 2 Revista 1 2022-01-05 2022-01-11 14 2022-01-08
#> 3 Revista 1 2022-01-09 2022-01-11 10 2022-01-09
#> 4 Revista 10 2022-03-09 2022-03-12 21 2022-03-09
#> 5 Revista 10 2022-03-09 2022-03-15 18 2022-03-12
#> 6 Revista 10 2022-03-13 2022-03-15 32 2022-03-13
#> 7 Revista 11 2022-03-16 2022-03-19 30 2022-03-16
#> 8 Revista 11 2022-03-16 2022-03-22 4 2022-03-19
#> 9 Revista 11 2022-03-20 2022-03-22 39 2022-03-20
CodePudding user response:
If you group by the start date, you can use pmax
:
data.table
library(data.table)
dat[, new_start_promo := pmax(start_date_promo, shift(end_date_promo), na.rm = TRUE), by = start_date_promo]
dat
# Leaflet start_date_promo end_date_promo id new_start_promo
# <char> <Date> <Date> <int> <Date>
# 1: Revista 1 2022-01-05 2022-01-08 15 2022-01-05
# 2: Revista 1 2022-01-05 2022-01-11 14 2022-01-08
# 3: Revista 1 2022-01-09 2022-01-11 10 2022-01-09
# 4: Revista 10 2022-03-09 2022-03-12 21 2022-03-09
# 5: Revista 10 2022-03-09 2022-03-15 18 2022-03-12
# 6: Revista 10 2022-03-13 2022-03-15 32 2022-03-13
# 7: Revista 11 2022-03-16 2022-03-19 30 2022-03-16
# 8: Revista 11 2022-03-16 2022-03-22 4 2022-03-19
# 9: Revista 11 2022-03-20 2022-03-22 39 2022-03-20
dplyr
library(dplyr)
dat %>%
group_by(start_date_promo) %>%
mutate(new_start_promo = pmax(start_date_promo, lag(end_date_promo), na.rm = TRUE)) %>%
ungroup()
# # A tibble: 9 x 5
# Leaflet start_date_promo end_date_promo id new_start_promo
# <chr> <date> <date> <int> <date>
# 1 Revista 1 2022-01-05 2022-01-08 15 2022-01-05
# 2 Revista 1 2022-01-05 2022-01-11 14 2022-01-08
# 3 Revista 1 2022-01-09 2022-01-11 10 2022-01-09
# 4 Revista 10 2022-03-09 2022-03-12 21 2022-03-09
# 5 Revista 10 2022-03-09 2022-03-15 18 2022-03-12
# 6 Revista 10 2022-03-13 2022-03-15 32 2022-03-13
# 7 Revista 11 2022-03-16 2022-03-19 30 2022-03-16
# 8 Revista 11 2022-03-16 2022-03-22 4 2022-03-19
# 9 Revista 11 2022-03-20 2022-03-22 39 2022-03-20