Home > Net >  How to aggregate dates by "Week starting July 01" "Week starting July 08" etc ov
How to aggregate dates by "Week starting July 01" "Week starting July 08" etc ov

Time:09-03

Data

I'm working with the following data from 2019 to 2022

library(tidyverse)
library(lubridate)

set.seed(123)

df <- 
  tibble(
    date = sample(
      seq(as.Date('2019/01/01'), as.Date('2022/09/01'), by = "day"), 
      size = 100, 
      replace = T))

Desired output

I would like to create a new variable that summarises each week into the same month-day range across 2019-2022. The result should be the week staring the same month-day regardless of what weekday or week of the year that date happens to be each year.

For example

  • 2019-07-03 should become "July 01" (Short for "Week starting July 01)
  • 2021-07-03 should become "July 01" (New year, same month-day)
  • 2021-07-04 should become "July 01"
  • 2021-07-05 should become "July 01"
  • 2021-07-06 should become "July 01"
  • 2021-07-07 should become "July 01"
  • 2021-07-08 should become "July 08" (New week has started, increment the month-day by 7)
  • 2021-07-14 should become "July 08"
  • 2021-07-15 should become "July 15" (New week has started, increment the month-day by 7)
  • 2021-07-16 should become "July 15"
  • 2022-07-15 should become "July 15" (New year, same month-day)
  • Etc.

Failed attempt

I've tried using cat.Date() and lubridate::floor_date() but these return the floor date for that particular year. For example

df %>% 
  mutate(
    date_month_week = lubridate::floor_date(date, unit = "week", week_start = 7),
    date_month_week = format(as.Date(date_month_week), "%m-%d")) %>% 
  arrange(desc(date_month_week)) %>% 
  distinct(date_month_week)

Yields

# A tibble: 77 × 1
   date_month_week
   <chr>          
 1 12-20          
 2 12-15          
 3 12-12          
 4 12-08          
 5 12-06          
 6 12-01          
 7 11-28          
 8 11-24          
 9 11-22          
10 11-21 

Many thanks for your help.

CodePudding user response:

You could group_by week numbers format = "%V". The thing is that weeks starts at different days depending on the year, so you could summarise by using the first day of a week for a given year instead of week numbers (but you'll have to group by week numbers):

df %>% 
  mutate(date_month_week = floor_date(date, unit = "week", week_start = 7),
         week_number = format(date_month_week, "%V"),
         string = format(date_month_week, "Week of %b %d")) %>% 
  group_by(week_number) %>% 
  summarise(string = min(string),
            date_month_week = toString(date_month_week)) %>% 
  ungroup() %>% 
  select(-week_number)
# A tibble: 46 × 2
   string         date_month_week                               
   <chr>          <chr>                                         
 1 Week of Jan 05 2020-01-05, 2021-01-10, 2020-01-05, 2022-01-09
 2 Week of Jan 13 2022-01-16, 2021-01-17, 2019-01-13, 2022-01-16
 3 Week of Jan 20 2019-01-20, 2021-01-24                        
 4 Week of Feb 03 2021-02-07, 2019-02-03, 2022-02-06            
 5 Week of Feb 09 2022-02-13, 2019-02-10, 2022-02-13, 2020-02-09
 6 Week of Feb 16 2020-02-16                                    
 7 Week of Feb 27 2022-02-27                                    
 8 Week of Mar 01 2022-03-06, 2020-03-01, 2019-03-03            
 9 Week of Mar 10 2019-03-10, 2019-03-10                        
10 Week of Mar 27 2022-03-27, 2022-03-27                        
# … with 36 more rows

CodePudding user response:

I've developed a kludge solution, but would love hear a more elegant alternative.

We can create a separate df that matches every month-day to a corresponding "Week of %b-%d"

ls_groups <- 
  tibble(
    date = seq(
      as.Date('2018/01/01'), # 2018 starts on a Monday
      as.Date('2018/12/31'), # Dec 31 is week 53, need to fix later
      by = "day"),
    month_day = format(date, "%Y-%m-%d"), 
    group = sort(c(rep(seq(1,52), 7), 53))) %>% # Add week 53 manually
  group_by(group) %>% 
  mutate(
    group = min(month_day), # Take the minimum date so it's always starting on Monday
    group = format(as.Date(group), "Week start %b-%d"), # Convert to text format
    month_day = format(date, "%m-%d")) %>%   # Create a join key
  select(-date)

Then we join it into the original data frame

df %>% 
  mutate(month_day = format(date, "%m-%d")) %>% # Create a join key
  left_join(
    x = ., 
    y = ls_groups, 
    by = "month_day") %>% 
  select(-month_day) %>% 
  arrange(group) # Arrange to check that diff. years have same month-day

Which gives us

# A tibble: 100 × 2
   date       group            
   <date>     <chr>            
 1 2020-04-07 Week start Apr-02
 2 2021-04-19 Week start Apr-16
 3 2022-04-16 Week start Apr-16
 4 2021-04-23 Week start Apr-23
 5 2020-04-24 Week start Apr-23
 6 2020-05-04 Week start Apr-30
 7 2019-05-01 Week start Apr-30
 8 2021-08-10 Week start Aug-06
 9 2022-08-06 Week start Aug-06
10 2020-08-10 Week start Aug-06
# … with 90 more rows
# ℹ Use `print(n = ...)` to see more rows
  • Related