R functions to divide person-time by year of observation

Time:12-23

Been a little stuck on this for a couple days. Let's say I have a cohort of 2 people.

Person 1 was in cohort from 01/01/2000 to 01/03/2001. Person 2 was in cohort from 01/01/1999 to 31/12/2001.

This means person 1 was in the cohort for all of 2000 and 25% of 2001. Person 2 was in the cohort for all of 1999, all of 2000, and all of 2001.

Adding this together means that, in total, the cohort contributed 1 year of person-time in 1999, 2 years of person-time in 2000, and 1.25 years of person-time in 2001.

Does anyone know of any R functions that might help with dividing up/summing time elapsed between dates like this? I could write it all from scratch, but I'd like to use existing functions if they're out there, and Google has got me nowhere.

Thanks!

CodePudding user response:

Using data.table and lubridate:

library(data.table)
library(lubridate)

# Data is assumed to be a data.table with columns Person, Start, End (Start and End of class Date)
Data <- Data[, .(Start, Start2 = seq(Start, End, by="year"), End), by=.(Person)]
Data[, End2 := Start2 + years(1) - days(1)]
Data[year(Start2) != year(Start), Start := Start2]
Data[year(End2) != year(End), End := End2]
Data[, c("Year", "Contribution") := list(year(Start), (month(End) - month(Start) + 1)/12)]
Data <- Data[, .(Contribution = sum(Contribution)), by=.(Year)][order(Year)]

Which gives:

> Data
   Year Contribution
1: 1999         1.00
2: 2000         2.00
3: 2001         1.25
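
The 1.25 for 2001 comes from counting calendar months inclusively, so 1 Jan to 1 Mar is treated as 3 of 12 months. A minimal base R sketch of that rule in isolation (`month_fraction` is a hypothetical helper name, not part of the answer above):

```r
# Hypothetical helper: the inclusive month-counting rule on its own.
# Assumes start and end fall in the same calendar year.
month_fraction <- function(start, end) {
  start_month <- as.integer(format(start, "%m"))
  end_month <- as.integer(format(end, "%m"))
  (end_month - start_month + 1) / 12
}

month_fraction(as.Date("2001-01-01"), as.Date("2001-03-01"))  # 3 months counted: 0.25
month_fraction(as.Date("2000-01-01"), as.Date("2000-12-31"))  # 12 months counted: 1
```

Because only the month numbers are used, the day of the month is ignored, so an end date of 1 Mar and one of 31 Mar give the same contribution.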

CodePudding user response:

Here is a more general tidyverse approach, also using lubridate. It creates one row per person per calendar year, builds an interval for that calendar year, and intersects it with the person's own interval. The intersections are then summed by year to give each year's contribution. Note that 1 Jan to 1 Mar counts here as 2 months, or about 1/6 of a year (not 25%).

library(lubridate)
library(tidyverse)

df <- data.frame(
  person = c("Person 1", "Person 2"),
  start = c("01/01/2000", "01/01/1999"),
  end = c("01/03/2001", "31/12/2001")
)

# Parse the day/month/year strings into Date objects
df$start <- dmy(df$start)
df$end <- dmy(df$end)

df %>%
  mutate(date_int = interval(start, end),
         year = map2(year(start), year(end), seq)) %>%
  unnest(year) %>%
  mutate(
    year_int = interval(
      as.Date(paste0(year, '-01-01')), as.Date(paste0(year, '-12-31'))
      ),
    year_sect = intersect(date_int, year_int)
  ) %>%
  group_by(year) %>%
  summarise(contribute = signif(sum(as.numeric(year_sect, "years")), 2))

Output

   year contribute
  <int>      <dbl>
1  1999        1  
2  2000        2  
3  2001        1.2
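
If exact day counts are preferred over months or rounded durations, the same overlap idea can be sketched in base R. This is only an illustration under stated assumptions: `person_years` is a hypothetical helper, the end date is counted as a day in the cohort, and each year's total is divided by that year's own length (365 or 366 days).

```r
# Hypothetical helper: person-time per calendar year, measured in days.
# Assumption: the end date itself counts as a day in the cohort (inclusive).
person_years <- function(start, end) {
  years <- seq(as.integer(format(min(start), "%Y")),
               as.integer(format(max(end), "%Y")))
  contribution <- sapply(years, function(y) {
    year_start <- as.Date(sprintf("%d-01-01", y))
    year_end <- as.Date(sprintf("%d-12-31", y))
    # Days each person's interval overlaps this calendar year (negative means no overlap)
    overlap <- as.numeric(pmin(end, year_end) - pmax(start, year_start)) + 1
    days_in_year <- as.numeric(year_end - year_start) + 1
    sum(pmax(overlap, 0)) / days_in_year
  })
  data.frame(year = years, contribution = contribution)
}

start <- as.Date(c("2000-01-01", "1999-01-01"))
end <- as.Date(c("2001-03-01", "2001-12-31"))
person_years(start, end)
```

For the two people in the question this gives 1 for 1999 and 2 for 2000. For 2001 it gives 425/365, roughly 1.16, because Person 1 contributes 60 days (1 Jan to 1 Mar inclusive) rather than a quarter of the year.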