Determine if the day of a month is in a date range, independent from its year

Time:07-22

Given time ranges with a start and an end date, I can easily determine whether a specific date falls in a range. How can I determine whether a specific month/day combination lies in a time range, independent of its year?

Example

I would like to know whether any first of July (07-01) lies in a time range.

2020-01-30 - 2020-06-15  --> NO
2020-06-16 - 2021-03-20  --> YES
2013-04-26 - 2019-02-13  --> YES (multiple)
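
To make the three cases above concrete, here is a brute-force sketch in base R. It simply enumerates every day in the range and looks for the month/day. The name `md_in_range_naive` is hypothetical, `md` is assumed to be a string in "mm-dd" form, and the function handles one range at a time, so it is only meant to illustrate the expected results, not to be efficient.

```r
# Hypothetical helper (illustration only): TRUE if any day between
# start and end (both inclusive) has the month/day given as "mm-dd".
md_in_range_naive <- function(md, start, end) {
  # every calendar day in the range
  days <- seq(as.Date(start), as.Date(end), by = "day")
  # compare only the month-day part of each day
  md %in% format(days, "%m-%d")
}

md_in_range_naive("07-01", "2020-01-30", "2020-06-15")
#> [1] FALSE
md_in_range_naive("07-01", "2020-06-16", "2021-03-20")
#> [1] TRUE
md_in_range_naive("07-01", "2013-04-26", "2019-02-13")
#> [1] TRUE
```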

R Code Example

# set seed for sampling
set.seed(1)

# number of time ranges
cases <- 10

# time gaps in days
gaps <- sort(sample(x = 1:5000, size = cases, replace = TRUE))

# data frame with time ranges
df <- data.frame(dates_start = rev(Sys.Date() - gaps[2:cases] + 1),
                 dates_end   = rev(Sys.Date() - gaps[1:(cases-1)]))
df
#>   dates_start  dates_end
#> 1  2009-06-26 2010-01-19
#> 2  2010-01-20 2011-06-05
#> 3  2011-06-06 2011-06-20
#> 4  2011-06-21 2013-04-21
#> 5  2013-04-22 2016-02-17
#> 6  2016-02-18 2016-08-05
#> 7  2016-08-06 2018-05-11
#> 8  2018-05-12 2019-10-09
#> 9  2019-10-10 2021-10-25

# Is specific date in date range
df$date_in_range <- df$dates_start <= lubridate::ymd("2019-07-01") & 
                                      lubridate::ymd("2019-07-01") < df$dates_end

# specific day of a month in date range
# pseudo code
data.table::between(x = month_day("07-01"),
                    lower = dates_start,
                    upper = dates_end)
#> Error in month_day("07-01"): could not find function "month_day"

# expected output
df$monthday_in_range <- c(T, T, F, T, T, T, T, T, T)
df
#>   dates_start  dates_end date_in_range monthday_in_range
#> 1  2009-06-26 2010-01-19         FALSE              TRUE
#> 2  2010-01-20 2011-06-05         FALSE              TRUE
#> 3  2011-06-06 2011-06-20         FALSE             FALSE
#> 4  2011-06-21 2013-04-21         FALSE              TRUE
#> 5  2013-04-22 2016-02-17         FALSE              TRUE
#> 6  2016-02-18 2016-08-05         FALSE              TRUE
#> 7  2016-08-06 2018-05-11         FALSE              TRUE
#> 8  2018-05-12 2019-10-09          TRUE              TRUE
#> 9  2019-10-10 2021-10-25         FALSE              TRUE

CodePudding user response:

Update 2

dplyr/data.table independent function

md_in_interval <- function(md, start, end) {
  # make lubridate's %within% available without attaching the package
  `%within%` <- lubridate::`%within%`
  
  # is there at least one full calendar year between start and end?
  # Then any month/day will fall in this interval and hence the result is TRUE
  helper <- (lubridate::year(end) - lubridate::year(start)) > 1
  
  # lubridate time interval built from the function arguments
  interval <- lubridate::interval(start, end)
  
  # helper dates with month/day combination and start year, e.g. "07-01-2009"
  my_date1 <- lubridate::mdy(paste0(md, "-", lubridate::year(start)))
  # helper dates with month/day combination and end year
  my_date2 <- lubridate::mdy(paste0(md, "-", lubridate::year(end)))
  
  # check if month/day combination falls within the interval
  out <- my_date1 %within% interval | 
    my_date2 %within% interval | 
    helper 
  
  return(out)
  
}

Usage with data.table

library(data.table)
dt <- data.table::as.data.table(df)
dt[, isin := md_in_interval("06-05", dates_start, dates_end)][]
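
For comparison, the same three-part check (month/day in the start year, month/day in the end year, or a full calendar year in between) can be sketched in base R without lubridate. This is only an illustration under a few assumptions: `md_in_interval_base` is a hypothetical name, `md` is a string in "mm-dd" form, both endpoints count as inside the range, and 02-29 is not handled specially (it would become `NA` in non-leap years).

```r
# Hypothetical base-R variant (illustration only), vectorised over start/end.
md_in_interval_base <- function(md, start, end) {
  start <- as.Date(start)
  end   <- as.Date(end)
  y_start <- as.integer(format(start, "%Y"))
  y_end   <- as.integer(format(end, "%Y"))

  # the month/day placed in the start year and in the end year
  cand_start <- as.Date(paste0(y_start, "-", md), format = "%Y-%m-%d")
  cand_end   <- as.Date(paste0(y_end, "-", md), format = "%Y-%m-%d")

  # TRUE if either candidate lies in the range, or if at least one
  # full calendar year sits between the start year and the end year
  (cand_start >= start & cand_start <= end) |
    (cand_end >= start & cand_end <= end) |
    (y_end - y_start > 1)
}

# the nine ranges from the question
starts <- c("2009-06-26", "2010-01-20", "2011-06-06", "2011-06-21", "2013-04-22",
            "2016-02-18", "2016-08-06", "2018-05-12", "2019-10-10")
ends   <- c("2010-01-19", "2011-06-05", "2011-06-20", "2013-04-21", "2016-02-17",
            "2016-08-05", "2018-05-11", "2019-10-09", "2021-10-25")
md_in_interval_base("07-01", starts, ends)
#> [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
```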

Update

To handle ranges that span more than one year, we can use a helper column:

df %>% 
  mutate(across(c(dates_start, dates_end), ymd),
         helper = ifelse(year(dates_end) - year(dates_start) > 1, 1, 0),
         interval = interval(dates_start, dates_end)) %>% 
  mutate(my_date1 = mdy(paste0("07-01-", year(dates_start))),
         my_date2 = mdy(paste0("07-01-", year(dates_end)))) %>% 
  mutate(check = my_date1 %within% interval | my_date2 %within% interval | helper == 1) %>% 
  select(dates_start, dates_end, check)
  dates_start  dates_end check
1  2009-06-26 2010-01-19  TRUE
2  2010-01-20 2011-06-05  TRUE
3  2011-06-06 2011-06-20 FALSE
4  2011-06-21 2013-04-21  TRUE
5  2013-04-22 2016-02-17  TRUE
6  2016-02-18 2016-08-05  TRUE
7  2016-08-06 2018-05-11  TRUE
8  2018-05-12 2019-10-09  TRUE
9  2019-10-10 2021-10-25  TRUE

First answer:

We could use lubridate for this.

  1. First, we build a full date from the month/day 07-01 and the start year. We do this with mdy(paste0("07-01-", year(dates_start))).

  2. Then we create an interval from the start and end dates with interval().

  3. Finally, we check with %within% whether that date lies in the interval.

library(dplyr)
library(lubridate)

df %>% 
  mutate(across(c(dates_start, dates_end), ymd),
         interval = interval(dates_start, dates_end)) %>% 
  mutate(my_date = mdy(paste0("07-01-", year(dates_start)))) %>% 
  mutate(check = my_date %within% interval)
  dates_start  dates_end                       interval    my_date check
1  2009-06-26 2010-01-19 2009-06-26 UTC--2010-01-19 UTC 2009-07-01  TRUE
2  2010-01-20 2011-06-05 2010-01-20 UTC--2011-06-05 UTC 2010-07-01  TRUE
3  2011-06-06 2011-06-20 2011-06-06 UTC--2011-06-20 UTC 2011-07-01 FALSE
4  2011-06-21 2013-04-21 2011-06-21 UTC--2013-04-21 UTC 2011-07-01  TRUE
5  2013-04-22 2016-02-17 2013-04-22 UTC--2016-02-17 UTC 2013-07-01  TRUE
6  2016-02-18 2016-08-05 2016-02-18 UTC--2016-08-05 UTC 2016-07-01  TRUE
7  2016-08-06 2018-05-11 2016-08-06 UTC--2018-05-11 UTC 2016-07-01 FALSE
8  2018-05-12 2019-10-09 2018-05-12 UTC--2019-10-09 UTC 2018-07-01  TRUE
9  2019-10-10 2021-10-25 2019-10-10 UTC--2021-10-25 UTC 2019-07-01 FALSE

CodePudding user response:

You may try the following, which checks whether July is among the months the range touches:

library(lubridate)
library(dplyr)
df %>%
  rowwise %>%
  mutate(monthday_in_range = 7 %in% month(seq(floor_date(dates_start, "month"), dates_end, by = "month")))

  dates_start dates_end  monthday_in_range
  <date>      <date>     <lgl>            
1 2009-06-26  2010-01-19 TRUE             
2 2010-01-20  2011-06-05 TRUE             
3 2011-06-06  2011-06-20 FALSE            
4 2011-06-21  2013-04-21 TRUE             
5 2013-04-22  2016-02-17 TRUE             
6 2016-02-18  2016-08-05 TRUE             
7 2016-08-06  2018-05-11 TRUE             
8 2018-05-12  2019-10-09 TRUE             
9 2019-10-10  2021-10-25 TRUE 

Addition:

df %>%
  rowwise %>%
  mutate(monthday_in_range = 7 %in% month(seq(ymd(paste0(substr(dates_start, 1, 8), "13")), dates_end, by = "month")))