Data
I'm working with the following data from 2019 to 2022
library(tidyverse)
library(lubridate)
set.seed(123)
df <-
tibble(
date = sample(
seq(as.Date('2019/01/01'), as.Date('2022/09/01'), by = "day"),
size = 100,
replace = T))
Desired output
I would like to create a new variable that summarises each week into the same month-day range across 2019-2022. The result should be the week staring the same month-day regardless of what weekday or week of the year that date happens to be each year.
For example
- 2019-07-03 should become "July 01" (Short for "Week starting July 01)
- 2021-07-03 should become "July 01" (New year, same month-day)
- 2021-07-04 should become "July 01"
- 2021-07-05 should become "July 01"
- 2021-07-06 should become "July 01"
- 2021-07-07 should become "July 01"
- 2021-07-08 should become "July 08" (New week has started, increment the month-day by 7)
- 2021-07-14 should become "July 08"
- 2021-07-15 should become "July 15" (New week has started, increment the month-day by 7)
- 2021-07-16 should become "July 15"
- 2022-07-15 should become "July 15" (New year, same month-day)
- Etc.
Failed attempt
I've tried using cat.Date()
and lubridate::floor_date()
but these return the floor date for that particular year. For example
df %>%
mutate(
date_month_week = lubridate::floor_date(date, unit = "week", week_start = 7),
date_month_week = format(as.Date(date_month_week), "%m-%d")) %>%
arrange(desc(date_month_week)) %>%
distinct(date_month_week)
Yields
# A tibble: 77 × 1
date_month_week
<chr>
1 12-20
2 12-15
3 12-12
4 12-08
5 12-06
6 12-01
7 11-28
8 11-24
9 11-22
10 11-21
Many thanks for your help.
CodePudding user response:
You could group_by week numbers format = "%V"
. The thing is that weeks starts at different days depending on the year, so you could summarise by using the first day of a week for a given year instead of week numbers (but you'll have to group by week numbers):
df %>%
mutate(date_month_week = floor_date(date, unit = "week", week_start = 7),
week_number = format(date_month_week, "%V"),
string = format(date_month_week, "Week of %b %d")) %>%
group_by(week_number) %>%
summarise(string = min(string),
date_month_week = toString(date_month_week)) %>%
ungroup() %>%
select(-week_number)
# A tibble: 46 × 2
string date_month_week
<chr> <chr>
1 Week of Jan 05 2020-01-05, 2021-01-10, 2020-01-05, 2022-01-09
2 Week of Jan 13 2022-01-16, 2021-01-17, 2019-01-13, 2022-01-16
3 Week of Jan 20 2019-01-20, 2021-01-24
4 Week of Feb 03 2021-02-07, 2019-02-03, 2022-02-06
5 Week of Feb 09 2022-02-13, 2019-02-10, 2022-02-13, 2020-02-09
6 Week of Feb 16 2020-02-16
7 Week of Feb 27 2022-02-27
8 Week of Mar 01 2022-03-06, 2020-03-01, 2019-03-03
9 Week of Mar 10 2019-03-10, 2019-03-10
10 Week of Mar 27 2022-03-27, 2022-03-27
# … with 36 more rows
CodePudding user response:
I've developed a kludge solution, but would love hear a more elegant alternative.
We can create a separate df that matches every month-day to a corresponding "Week of %b-%d"
ls_groups <-
tibble(
date = seq(
as.Date('2018/01/01'), # 2018 starts on a Monday
as.Date('2018/12/31'), # Dec 31 is week 53, need to fix later
by = "day"),
month_day = format(date, "%Y-%m-%d"),
group = sort(c(rep(seq(1,52), 7), 53))) %>% # Add week 53 manually
group_by(group) %>%
mutate(
group = min(month_day), # Take the minimum date so it's always starting on Monday
group = format(as.Date(group), "Week start %b-%d"), # Convert to text format
month_day = format(date, "%m-%d")) %>% # Create a join key
select(-date)
Then we join it into the original data frame
df %>%
mutate(month_day = format(date, "%m-%d")) %>% # Create a join key
left_join(
x = .,
y = ls_groups,
by = "month_day") %>%
select(-month_day) %>%
arrange(group) # Arrange to check that diff. years have same month-day
Which gives us
# A tibble: 100 × 2
date group
<date> <chr>
1 2020-04-07 Week start Apr-02
2 2021-04-19 Week start Apr-16
3 2022-04-16 Week start Apr-16
4 2021-04-23 Week start Apr-23
5 2020-04-24 Week start Apr-23
6 2020-05-04 Week start Apr-30
7 2019-05-01 Week start Apr-30
8 2021-08-10 Week start Aug-06
9 2022-08-06 Week start Aug-06
10 2020-08-10 Week start Aug-06
# … with 90 more rows
# ℹ Use `print(n = ...)` to see more rows