Home > Software engineering >  calculate time difference in weeks
calculate time difference in weeks

Time:10-19

I have a dataset like so

data <- data.table::data.table(id = seq(1:5),
                               t0_year = c(2000,2001,2002,2001,2000),
                               t0_week = c(4,32,44,15,11),
                               t1_date = as.Date(c("2010-07-31", "2010-12-31", "2010-02-08", "2005-07-05", "2000-07-20")))

I would like to calculate the time difference in weeks between t1 and t0, but for t0, I only have the year and week number, while for t1, I have the exact date.

I have tried different things with lubridate, but without much luck. Does anyone have a solution?

CodePudding user response:

One cannot calculate the difference in a fixed unit between a date and a week, they are different. The best one might do is determine a range, where the difference might be a number plus-or-minus-3.5 to indicate that you don't know which of the values is correct. If your "week" is assumed to start on a Sunday or a Monday or whichever day you choose, then ... change t0 to a Date (permanently, as a new column, or on-the-fly) and do a real difference calculation using - or difftime.

As a coarse example, what is (precisely) the difference between 2021-10-19 and 2021-Jan? I've generalized this to be a month and not a week, but the point is the same. The value is somewhere between 261 and 291 (inclusive), and given what we know it could be any of those with equal probability. If we assume that somebody is most likely to want the difference from the first or last day of said month, then it is one of 261 or 291, but to know which we need to know intent and/or context of the original question.

For the discussion of "how to convert year-month into a Date object", see Transform year/week to date object.

Here are two options, depending on Sunday- or Monday-based weeks. (I chose those two days because those appear to be the most frequent week-start-days; however, whichever you use is up to you and the context or your data, I make no assertion that these are better than others.)

library(data.table)
data[, t0_sun := as.Date(sprintf("%4ii7", t0_year, t0_week), format = "%Y%U%u")
  ][, t0_mon := as.Date(sprintf("%4ii1", t0_year, t0_week), format = "%Y%U%u")
  ][, d := as.numeric(t1_date - t0_sun, "weeks")]
#       id t0_year t0_week    t1_date     t0_mon     t0_sun         d
#    <int>   <num>   <num>     <Date>     <Date>     <Date>     <num>
# 1:     1    2000       4 2010-07-31 2000-01-24 2000-01-23 548.85714
# 2:     2    2001      32 2010-12-31 2001-08-13 2001-08-12 489.71429
# 3:     3    2002      44 2010-02-08 2002-11-04 2002-11-03 379.14286
# 4:     4    2001      15 2005-07-05 2001-04-16 2001-04-15 220.28571
# 5:     5    2000      11 2000-07-20 2000-03-13 2000-03-12  18.57143

Note that from ?strptime (and the linked question/answer about year-week to Date), "%U" is using a US-based standard; one might instead choose "%W" for UK-based weeks.

data[, t0_sun := as.Date(sprintf("%4ii0", t0_year, t0_week), format = "%Y%W%w")
  ][, t0_mon := as.Date(sprintf("%4ii1", t0_year, t0_week), format = "%Y%W%2")
  ][, d := as.numeric(t1_date - t0_sun, "weeks")]

CodePudding user response:

Does this work:

library(dplyr)
library(lubridate)

data %>% mutate(t0_date = as.Date(paste(t0_year, round(day(weeks(t0_week))/30), '01', sep = '-'), format = '%Y-%m-%d'), diff_weeks = difftime(t1_date,t0_date, units = 'weeks'))
   id t0_year t0_week    t1_date    t0_date      diff_weeks
1:  1    2000       4 2010-07-31 2000-01-01 552.00000 weeks
2:  2    2001      32 2010-12-31 2001-07-01 495.71429 weeks
3:  3    2002      44 2010-02-08 2002-10-01 383.85714 weeks
4:  4    2001      15 2005-07-05 2001-04-01 222.28571 weeks
5:  5    2000      11 2000-07-20 2000-03-01  20.14286 weeks
  • Related