I have a dataset like so
data <- data.table::data.table(id = seq(1:5),
t0_year = c(2000,2001,2002,2001,2000),
t0_week = c(4,32,44,15,11),
t1_date = as.Date(c("2010-07-31", "2010-12-31", "2010-02-08", "2005-07-05", "2000-07-20")))
I would like to calculate the time difference in weeks between t1 and t0, but for t0, I only have the year and week number, while for t1, I have the exact date.
I have tried different things with lubridate
, but without much luck. Does anyone have a solution?
CodePudding user response:
One cannot calculate the difference in a fixed unit between a date and a week, they are different. The best one might do is determine a range, where the difference might be a number plus-or-minus-3.5 to indicate that you don't know which of the values is correct. If your "week" is assumed to start on a Sunday or a Monday or whichever day you choose, then ... change t0
to a Date
(permanently, as a new column, or on-the-fly) and do a real difference calculation using -
or difftime
.
As a coarse example, what is (precisely) the difference between 2021-10-19
and 2021-Jan
? I've generalized this to be a month and not a week, but the point is the same. The value is somewhere between 261 and 291 (inclusive), and given what we know it could be any of those with equal probability. If we assume that somebody is most likely to want the difference from the first or last day of said month, then it is one of 261 or 291, but to know which we need to know intent and/or context of the original question.
For the discussion of "how to convert year-month into a Date
object", see Transform year/week to date object.
Here are two options, depending on Sunday- or Monday-based weeks. (I chose those two days because those appear to be the most frequent week-start-days; however, whichever you use is up to you and the context or your data, I make no assertion that these are better than others.)
library(data.table)
data[, t0_sun := as.Date(sprintf("%4ii7", t0_year, t0_week), format = "%Y%U%u")
][, t0_mon := as.Date(sprintf("%4ii1", t0_year, t0_week), format = "%Y%U%u")
][, d := as.numeric(t1_date - t0_sun, "weeks")]
# id t0_year t0_week t1_date t0_mon t0_sun d
# <int> <num> <num> <Date> <Date> <Date> <num>
# 1: 1 2000 4 2010-07-31 2000-01-24 2000-01-23 548.85714
# 2: 2 2001 32 2010-12-31 2001-08-13 2001-08-12 489.71429
# 3: 3 2002 44 2010-02-08 2002-11-04 2002-11-03 379.14286
# 4: 4 2001 15 2005-07-05 2001-04-16 2001-04-15 220.28571
# 5: 5 2000 11 2000-07-20 2000-03-13 2000-03-12 18.57143
Note that from ?strptime
(and the linked question/answer about year-week to Date
), "%U"
is using a US-based standard; one might instead choose "%W"
for UK-based weeks.
data[, t0_sun := as.Date(sprintf("%4ii0", t0_year, t0_week), format = "%Y%W%w")
][, t0_mon := as.Date(sprintf("%4ii1", t0_year, t0_week), format = "%Y%W%2")
][, d := as.numeric(t1_date - t0_sun, "weeks")]
CodePudding user response:
Does this work:
library(dplyr)
library(lubridate)
data %>% mutate(t0_date = as.Date(paste(t0_year, round(day(weeks(t0_week))/30), '01', sep = '-'), format = '%Y-%m-%d'), diff_weeks = difftime(t1_date,t0_date, units = 'weeks'))
id t0_year t0_week t1_date t0_date diff_weeks
1: 1 2000 4 2010-07-31 2000-01-01 552.00000 weeks
2: 2 2001 32 2010-12-31 2001-07-01 495.71429 weeks
3: 3 2002 44 2010-02-08 2002-10-01 383.85714 weeks
4: 4 2001 15 2005-07-05 2001-04-01 222.28571 weeks
5: 5 2000 11 2000-07-20 2000-03-01 20.14286 weeks