Part 2: Check if any dates within an Interval are within any of the dates in another Interval

Time:11-27

This is a follow-up to my previous question.

As a reminder, I have two vectors of date intervals, and I want to check whether any of the dates in interval_A fall within interval_B. Ideally I am looking for a dplyr solution.

The original solution worked until I added 12 months to the end of each interval in interval_A and re-ran the code.

The result was unexpected. Lengthening interval_A can only increase the number of dates that might fall into interval_B, so I expected more TRUE values, not fewer: the goal is to see whether any date in interval_A falls into interval_B.

Reproducible example

Let's create 3 intervals this time: interval_A, interval_A_plus12, and interval_B, where interval_A_plus12 extends the end of interval_A by 12 months.

library(lubridate)
library(tibble)  # tibble()
library(purrr)   # map_lgl()

interval_A <- new(
  "Interval",
  .Data = c(20822400, 10454400, 42508800, 18662400, 12355200, 16243200,
            10195200, 14774400, 37324800, 31276800, 27734400, 62985600,
            15724800, 32054400, 21427200),
  start = structure(
    c(94953600, 131328000, 240451200, 294278400, 334454400, 449193600,
      493344000, 546739200, 575596800, 760320000, 930700800, 1088553600,
      1481673600, 1513123200, 1647388800),
    tzone = "UTC", class = c("POSIXct", "POSIXt")
  ),
  tzone = "UTC"
)


interval_B <- new(
  "Interval",
  .Data = c(41904000, 15724800, 42163200, 20995200, 21168000, 47347200,
            5184000),
  start = structure(
    c(120960000, 315532800, 362793600, 646790400, 983404800, 1196467200,
      1580515200),
    tzone = "UTC", class = c("POSIXct", "POSIXt")
  ),
  tzone = "UTC"
)



interval_A_plus12 <- new(
  "Interval",
  .Data = c(52358400, 41990400, 74044800, 50284800, 43891200, 47779200,
            41731200, 46396800, 68860800, 62812800, 59270400, 94521600,
            47260800, 63590400, 53568000),
  start = structure(
    c(94953600, 131328000, 240451200, 294278400, 334454400, 449193600,
      493344000, 546739200, 575596800, 760320000, 930700800, 1088553600,
      1481673600, 1513123200, 1647388800),
    tzone = "UTC", class = c("POSIXct", "POSIXt")
  ),
  tzone = "UTC"
)

To see if any of the dates in interval_A were in any of the intervals in interval_B, the solution was:

tibble(
  interval_A,
  in_interval_B = map_lgl(interval_A, ~ any(.x %within% interval_B))
)

That yields:

   interval_A                     in_interval_B
   <Interval>                     <lgl>        
 1 1973-01-04 UTC--1973-09-02 UTC FALSE        
 2 1974-03-01 UTC--1974-06-30 UTC TRUE         
 3 1977-08-15 UTC--1978-12-20 UTC FALSE        
 4 1979-04-30 UTC--1979-12-02 UTC FALSE        
 5 1980-08-07 UTC--1980-12-28 UTC FALSE        
 6 1984-03-27 UTC--1984-10-01 UTC FALSE        
 7 1985-08-20 UTC--1985-12-16 UTC FALSE        
 8 1987-04-30 UTC--1987-10-18 UTC FALSE        
 9 1988-03-29 UTC--1989-06-04 UTC FALSE        
10 1994-02-04 UTC--1995-02-01 UTC FALSE        
11 1999-06-30 UTC--2000-05-16 UTC FALSE        
12 2004-06-30 UTC--2006-06-29 UTC FALSE        
13 2016-12-14 UTC--2017-06-14 UTC FALSE        
14 2017-12-13 UTC--2018-12-19 UTC FALSE        
15 2022-03-16 UTC--2022-11-19 UTC FALSE  

If we now use the extended interval, interval_A_plus12, every value comes back FALSE instead of showing more TRUE values. At least the first 2 rows should be TRUE: 1974-01-01 lies in the first interval of interval_A_plus12 and 1974-03-01 is the start of the second, and both dates are within the first interval in interval_B.

tibble(
  interval_A_plus12,
  in_interval_B = map_lgl(interval_A_plus12, ~ any(.x %within% interval_B))
)



   interval_A_plus12              in_interval_B
   <Interval>                     <lgl>        
 1 1973-01-04 UTC--1974-09-02 UTC FALSE        
 2 1974-03-01 UTC--1975-06-30 UTC FALSE        
 3 1977-08-15 UTC--1979-12-20 UTC FALSE        
 4 1979-04-30 UTC--1980-12-02 UTC FALSE        
 5 1980-08-07 UTC--1981-12-28 UTC FALSE        
 6 1984-03-27 UTC--1985-10-01 UTC FALSE        
 7 1985-08-20 UTC--1986-12-16 UTC FALSE        
 8 1987-04-30 UTC--1988-10-18 UTC FALSE        
 9 1988-03-29 UTC--1990-06-04 UTC FALSE        
10 1994-02-04 UTC--1996-02-01 UTC FALSE        
11 1999-06-30 UTC--2001-05-16 UTC FALSE        
12 2004-06-30 UTC--2007-06-29 UTC FALSE        
13 2016-12-14 UTC--2018-06-14 UTC FALSE        
14 2017-12-13 UTC--2019-12-19 UTC FALSE        
15 2022-03-16 UTC--2023-11-26 UTC FALSE

Am I missing something obvious here?

Thanks!

CodePudding user response:

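%within% returns TRUE for an interval only when it lies entirely inside the other interval, so lengthening interval_A makes a match less likely, not more. Use lubridate::int_overlaps() instead of %within%; it tests whether two intervals share any time at all: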

library(lubridate)
library(tibble)
library(purrr)

tibble(
  interval_A,
  in_interval_B = map_lgl(interval_A, ~ any(int_overlaps(.x, interval_B)))
)
#> # A tibble: 15 × 2
#>    interval_A                     in_interval_B
#>    <Interval>                     <lgl>        
#>  1 1973-01-04 UTC--1973-09-02 UTC FALSE        
#>  2 1974-03-01 UTC--1974-06-30 UTC TRUE         
#>  3 1977-08-15 UTC--1978-12-20 UTC FALSE        
#>  4 1979-04-30 UTC--1979-12-02 UTC FALSE        
#>  5 1980-08-07 UTC--1980-12-28 UTC FALSE        
#>  6 1984-03-27 UTC--1984-10-01 UTC FALSE        
#>  7 1985-08-20 UTC--1985-12-16 UTC FALSE        
#>  8 1987-04-30 UTC--1987-10-18 UTC FALSE        
#>  9 1988-03-29 UTC--1989-06-04 UTC FALSE        
#> 10 1994-02-04 UTC--1995-02-01 UTC FALSE        
#> 11 1999-06-30 UTC--2000-05-16 UTC FALSE        
#> 12 2004-06-30 UTC--2006-06-29 UTC FALSE        
#> 13 2016-12-14 UTC--2017-06-14 UTC FALSE        
#> 14 2017-12-13 UTC--2018-12-19 UTC FALSE        
#> 15 2022-03-16 UTC--2022-11-19 UTC FALSE

tibble(
  interval_A_plus12,
  in_interval_B = map_lgl(interval_A_plus12, ~ any(int_overlaps(.x, interval_B)))
)
#> # A tibble: 15 × 2
#>    interval_A_plus12              in_interval_B
#>    <Interval>                     <lgl>        
#>  1 1973-01-04 UTC--1974-09-02 UTC TRUE         
#>  2 1974-03-01 UTC--1975-06-30 UTC TRUE         
#>  3 1977-08-15 UTC--1979-12-20 UTC FALSE        
#>  4 1979-04-30 UTC--1980-12-02 UTC TRUE         
#>  5 1980-08-07 UTC--1981-12-28 UTC TRUE         
#>  6 1984-03-27 UTC--1985-10-01 UTC FALSE        
#>  7 1985-08-20 UTC--1986-12-16 UTC FALSE        
#>  8 1987-04-30 UTC--1988-10-18 UTC FALSE        
#>  9 1988-03-29 UTC--1990-06-04 UTC FALSE        
#> 10 1994-02-04 UTC--1996-02-01 UTC FALSE        
#> 11 1999-06-30 UTC--2001-05-16 UTC TRUE         
#> 12 2004-06-30 UTC--2007-06-29 UTC FALSE        
#> 13 2016-12-14 UTC--2018-06-14 UTC FALSE        
#> 14 2017-12-13 UTC--2019-12-19 UTC FALSE        
#> 15 2022-03-16 UTC--2023-11-26 UTC FALSE

Created on 2022-11-26 with reprex v2.0.2
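
To make the difference between the two operators concrete, here is a minimal sketch on a single pair of intervals. The dates are hand-picked for illustration, modelled on row 1 of interval_A_plus12 and the first element of interval_B; the names a, b, contained and overlapping are hypothetical and not part of the question's data.

```r
library(lubridate)

# Illustrative pair: a starts before b and ends inside it (partial overlap)
a <- interval(ymd("1973-01-04", tz = "UTC"), ymd("1974-09-02", tz = "UTC"))
b <- interval(ymd("1973-11-01", tz = "UTC"), ymd("1975-03-01", tz = "UTC"))

# %within% asks: is all of a inside b? The start of a is earlier than b.
contained <- a %within% b
contained
#> [1] FALSE

# int_overlaps() asks: do a and b share any time at all?
overlapping <- int_overlaps(a, b)
overlapping
#> [1] TRUE
```

Because a extends outside b, containment fails while the overlap test succeeds, which matches the pattern seen after extending interval_A by 12 months.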

CodePudding user response:

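Perhaps check whether either endpoint of each interval falls within interval_B. Note that this misses an interval that completely contains one of the intervals in interval_B (row 4 here), so the result differs from int_overlaps() for that row: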

library(purrr)
library(lubridate)
map2_lgl(int_start(interval_A_plus12), int_end(interval_A_plus12),
         ~ any(.x %within% interval_B) | any(.y %within% interval_B))

Output:

[1]  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE

Or

rowSums(outer(int_start(interval_A_plus12), interval_B, FUN = `%within%`) |
        outer(int_end(interval_A_plus12), interval_B, FUN = `%within%`)) > 0
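
The endpoint checks above can be compared with int_overlaps() on a single pair where one interval fully contains the other. This is only a sketch: the dates are hand-picked to resemble row 4 of interval_A_plus12 and the second element of interval_B, and the names a, b, endpoint_hit and overlap_hit are hypothetical.

```r
library(lubridate)

# Illustrative pair: a completely contains b
a <- interval(ymd("1979-04-30", tz = "UTC"), ymd("1980-12-02", tz = "UTC"))
b <- interval(ymd("1980-01-01", tz = "UTC"), ymd("1980-07-01", tz = "UTC"))

# Endpoint test: neither the start nor the end of a lies inside b
endpoint_hit <- (int_start(a) %within% b) | (int_end(a) %within% b)
endpoint_hit
#> [1] FALSE

# Overlap test: a and b clearly share time
overlap_hit <- int_overlaps(a, b)
overlap_hit
#> [1] TRUE
```

If containing intervals should count as a match, int_overlaps() is the safer choice; the endpoint test is only equivalent when no interval in interval_A can span a whole interval in interval_B.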