I have a dataframe that consist of the Month, Week of the Month and the Days of the Week (i.e Monday, Tuesday, etc)
I would like to convert them into Date Month format (e.g 02-Dec). How do I retrieve the date with just the week of the month and the days?
dput
structure(list(Month = c("Dec", "Jan", "Oct", "Jun", "Jan", "Oct",
"Feb", "Nov", "Dec", "Apr", "Mar", "Mar", "Jan", "Jan", "Jan",
"Aug", "Apr", "Jul", "May", "Apr"), WeekOfMonth = c(5, 3, 5,
2, 5, 4, 1, 1, 4, 3, 2, 5, 3, 5, 5, 4, 4, 5, 4, 4), DayOfWeek = c("Wednesday",
"Wednesday", "Friday", "Saturday", "Monday", "Friday", "Saturday",
"Friday", "Saturday", "Tuesday", "Sunday", "Monday", "Friday",
"Friday", "Monday", "Tuesday", "Thursday", "Sunday", "Thursday",
"Monday"), Year = c(1994, 1994, 1994, 1994, 1994, 1994, 1994,
1994, 1994, 1994, 1994, 1994, 1994, 1994, 1994, 1994, 1994, 1994,
1994, 1994)), row.names = c(NA, -20L), class = c("tbl_df", "tbl",
"data.frame"))
CodePudding user response:
Try this:
dat |>
transform(
DayOfWeekNum = match(DayOfWeek, wdays),
date01 = as.Date(paste(Year, Month, "01", sep = "-"), format = "%Y-%b-%d")
) |>
transform(weekday01 = as.integer(format(date01, format = "%w")) 1L) |>
transform(realdate = date01 (WeekOfMonth - 1L) * 7L (DayOfWeekNum - weekday01))
# Month WeekOfMonth DayOfWeek Year DayOfWeekNum date01 weekday01 realdate
# 1 Dec 5 Wednesday 1994 4 1994-12-01 5 1994-12-28
# 2 Jan 3 Wednesday 1994 4 1994-01-01 7 1994-01-12
# 3 Oct 5 Friday 1994 6 1994-10-01 7 1994-10-28
# 4 Jun 2 Saturday 1994 7 1994-06-01 4 1994-06-11
# 5 Jan 5 Monday 1994 2 1994-01-01 7 1994-01-24
# 6 Oct 4 Friday 1994 6 1994-10-01 7 1994-10-21
# 7 Feb 1 Saturday 1994 7 1994-02-01 3 1994-02-05
# 8 Nov 1 Friday 1994 6 1994-11-01 3 1994-11-04
# 9 Dec 4 Saturday 1994 7 1994-12-01 5 1994-12-24
# 10 Apr 3 Tuesday 1994 3 1994-04-01 6 1994-04-12
# 11 Mar 2 Sunday 1994 1 1994-03-01 3 1994-03-06
# 12 Mar 5 Monday 1994 2 1994-03-01 3 1994-03-28
# 13 Jan 3 Friday 1994 6 1994-01-01 7 1994-01-14
# 14 Jan 5 Friday 1994 6 1994-01-01 7 1994-01-28
# 15 Jan 5 Monday 1994 2 1994-01-01 7 1994-01-24
# 16 Aug 4 Tuesday 1994 3 1994-08-01 2 1994-08-23
# 17 Apr 4 Thursday 1994 5 1994-04-01 6 1994-04-21
# 18 Jul 5 Sunday 1994 1 1994-07-01 6 1994-07-24
# 19 May 4 Thursday 1994 5 1994-05-01 1 1994-05-26
# 20 Apr 4 Monday 1994 2 1994-04-01 6 1994-04-18
where realdate
is (I believe) the date you're looking for.
(Since you have a tibble
, I'm inferring dplyr
, to which this can be translated trivially, a naive start would replace all transform(.)
to mutate(.)
, and then reduced to a single mutate
call.)
Verification
Using the calendar below (in linux, cal 1994
), I'll randomly verify rows 5, 14, and 17 (chosen by sort(sample(nrow(dat), size = 3))
).
- Row 5:
Jan, week 5, Monday, 1994
: the 5th week starts on Jan 23 (Sunday-based), that Monday is1994-01-24
, check. - Row 14:
Jan, week 5, Friday, 1994
: same start, Friday is1994-01-28
, check. - Row 17:
Apr, week 4, Thursday, 1994
: the fourth week starts on Apr 17, so Thursday is1994-04-21
, check.
I'll leave it as an exercise to verify the remaining rows.
Disclaimer: while I didn't find anything here, there may be room for off-by-one (seven, sire) errors if the locale used to determine WeekOfMonth
is different (i.e., not Sunday-based), in which case we might need to adjust ...
1994
January February March
Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa
1 1 2 3 4 5 1 2 3 4 5
2 3 4 5 6 7 8 6 7 8 9 10 11 12 6 7 8 9 10 11 12
9 10 11 12 13 14 15 13 14 15 16 17 18 19 13 14 15 16 17 18 19
16 17 18 19 20 21 22 20 21 22 23 24 25 26 20 21 22 23 24 25 26
23 24 25 26 27 28 29 27 28 27 28 29 30 31
30 31
April May June
Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa
1 2 1 2 3 4 5 6 7 1 2 3 4
3 4 5 6 7 8 9 8 9 10 11 12 13 14 5 6 7 8 9 10 11
10 11 12 13 14 15 16 15 16 17 18 19 20 21 12 13 14 15 16 17 18
17 18 19 20 21 22 23 22 23 24 25 26 27 28 19 20 21 22 23 24 25
24 25 26 27 28 29 30 29 30 31 26 27 28 29 30
July August September
Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa
1 2 1 2 3 4 5 6 1 2 3
3 4 5 6 7 8 9 7 8 9 10 11 12 13 4 5 6 7 8 9 10
10 11 12 13 14 15 16 14 15 16 17 18 19 20 11 12 13 14 15 16 17
17 18 19 20 21 22 23 21 22 23 24 25 26 27 18 19 20 21 22 23 24
24 25 26 27 28 29 30 28 29 30 31 25 26 27 28 29 30
31
October November December
Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa
1 1 2 3 4 5 1 2 3
2 3 4 5 6 7 8 6 7 8 9 10 11 12 4 5 6 7 8 9 10
9 10 11 12 13 14 15 13 14 15 16 17 18 19 11 12 13 14 15 16 17
16 17 18 19 20 21 22 20 21 22 23 24 25 26 18 19 20 21 22 23 24
23 24 25 26 27 28 29 27 28 29 30 25 26 27 28 29 30 31
30 31
CodePudding user response:
Some definitions:
- assuming year
"2022""1994" - counting weeks within months by start of the week day (if first day in a month is Sunday it counts as the first week)
- choosing "Monday" as start of the week, so this might be off by a day.
A simple validity check would be line 9:
- December - Week 4 - Saturday:
24-Dec
library(dplyr)
library(lubridate)
df %>%
rowwise() %>%
mutate(Year = 1994, start = ym(paste(Year, Month)),
end_date = start (days_in_month(start) - 1),
weekdays = list(format(seq(start, end_date, by="day"), "%A")),
week_seg = list(cumsum(c(T, (weekdays == "Monday")[2:length(weekdays)]))),
first = grep(WeekOfMonth, week_seg)[1],
act = grep(DayOfWeek, weekdays[week_seg == WeekOfMonth]),
date = format(ymd(paste(Year, Month, (first act) - 1)), "%d-%b")) %>%
ungroup() %>%
select(1:4, date)
# A tibble: 20 × 5
Month WeekOfMonth DayOfWeek Year date
<chr> <dbl> <chr> <dbl> <chr>
1 Dec 5 Wednesday 1994 28-Dec
2 Jan 3 Wednesday 1994 12-Jan
3 Oct 5 Friday 1994 28-Oct
4 Jun 2 Saturday 1994 11-Jun
5 Jan 5 Monday 1994 24-Jan
6 Oct 4 Friday 1994 21-Oct
7 Feb 1 Saturday 1994 05-Feb
8 Nov 1 Friday 1994 04-Nov
9 Dec 4 Saturday 1994 24-Dec
10 Apr 3 Tuesday 1994 12-Apr
11 Mar 2 Sunday 1994 13-Mar
12 Mar 5 Monday 1994 28-Mar
13 Jan 3 Friday 1994 14-Jan
14 Jan 5 Friday 1994 28-Jan
15 Jan 5 Monday 1994 24-Jan
16 Aug 4 Tuesday 1994 23-Aug
17 Apr 4 Thursday 1994 21-Apr
18 Jul 5 Sunday 1994 31-Jul
19 May 4 Thursday 1994 19-May
20 Apr 4 Monday 1994 18-Apr