I've been a little stuck on this for a couple of days. Let's say I have a cohort of 2 people.
Person 1 was in the cohort from 01/01/2000 to 01/03/2001 (dd/mm/yyyy). Person 2 was in the cohort from 01/01/1999 to 31/12/2001.
This means person 1 was in the cohort for all of 2000 and 25% of 2001. Person 2 was in the cohort for all of 1999, all of 2000, and all of 2001.
Adding this together means that, in total, the cohort contributed 1 year of person-time in 1999, 2 years of person-time in 2000, and 1.25 years of person-time in 2001.
Does anyone know of any R functions that might help with dividing up/summing time elapsed between dates like this? I could write it all from scratch, but I'd like to use existing functions if they're out there, and Google has got me nowhere.
Thanks!
CodePudding user response:
Using data.table and lubridate:
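library(data.table)
library(lubridate)
# Data: a data.table with columns Person, Start, End (Start and End as Date)
# One row per person per year spent in the cohort
Data <- Data[, .(Start, Start2 = seq(Start, End, by="year"), End), by=.(Person)]
Data[, End2 := Start2 + years(1) - days(1)]
# Clip each row to its own calendar year
Data[year(Start2) != year(Start), Start := Start2]
Data[year(End2) != year(End), End := End2]
# Count the calendar months covered, as a fraction of a year
Data[, c("Year", "Contribution") := list(year(Start), (month(End) - month(Start) + 1)/12)]
Data <- Data[, .(Contribution = sum(Contribution)), by=.(Year)][order(Year)]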
Which gives:
> Data
   Year Contribution
1: 1999         1.00
2: 2000         2.00
3: 2001         1.25
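For comparison, the same month-counting rule can be sketched in base R without any packages. This is only a sketch under stated assumptions: `months_by_year` is a hypothetical helper name (not an existing function), the inputs are Date vectors, and every calendar month touched counts as a full 1/12 of a year, as in the code above.

```r
# Hypothetical helper: person-years per calendar year, counting each
# calendar month touched as 1/12 of a year (assumption: inputs are Dates).
months_by_year <- function(start, end) {
  yr <- function(d) as.integer(format(d, "%Y"))
  mo <- function(d) as.integer(format(d, "%m"))
  pieces <- lapply(seq_along(start), function(i) {
    s <- start[i]
    e <- end[i]
    years <- seq(yr(s), yr(e))
    first <- ifelse(years == yr(s), mo(s), 1L)   # first month in cohort that year
    last  <- ifelse(years == yr(e), mo(e), 12L)  # last month in cohort that year
    data.frame(Year = years, Contribution = (last - first + 1) / 12)
  })
  # Sum the per-person pieces within each calendar year
  aggregate(Contribution ~ Year, data = do.call(rbind, pieces), FUN = sum)
}

start <- as.Date(c("2000-01-01", "1999-01-01"))
end   <- as.Date(c("2001-03-01", "2001-12-31"))
res <- months_by_year(start, end)
print(res)  # Contribution is 1, 2 and 1.25 for 1999, 2000 and 2001
```

Unlike the `seq(Start, End, by="year")` step, this version clips each person to calendar years directly, so it does not rely on start dates falling on Jan 1.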
CodePudding user response:
This is a possible generalized tidyverse approach, also using lubridate. It creates one row per person per calendar year, with a time interval for each. The intersection between the calendar-year interval and the person's interval is that person's contribution for the year, and the contributions are summed at the end. Note that Jan 1 to Mar 1 is counted here as 2 months, or about 1/6 of a year (not 25%).
library(lubridate)
library(tidyverse)
df <- data.frame(
  person = c("Person 1", "Person 2"),
  start = c("01/01/2000", "01/01/1999"),
  end = c("01/03/2001", "31/12/2001")
)
# Parse the dd/mm/yyyy strings into Dates
df$start <- dmy(df$start)
df$end <- dmy(df$end)
df %>%
  mutate(date_int = interval(start, end),
         year = map2(year(start), year(end), seq)) %>%
  unnest(year) %>%
  mutate(
    year_int = interval(
      as.Date(paste0(year, '-01-01')), as.Date(paste0(year, '-12-31'))
    ),
    year_sect = intersect(date_int, year_int)
  ) %>%
  group_by(year) %>%
  summarise(contribute = signif(sum(as.numeric(year_sect, "years")), 2))
Output
   year contribute
  <int>      <dbl>
1  1999        1
2  2000        2
3  2001        1.2
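To see how much the counting convention matters for the Jan 1 to Mar 1 case noted above, here is a rough base R sketch that counts days instead of months. It rests on assumptions that are mine, not the answer's: `days_by_year` is a hypothetical helper name, both the start and end dates count as days in the cohort, and each year is divided by its own length (365 or 366 days).

```r
# Hypothetical helper: person-years per calendar year from day counts.
# Assumption: both endpoints are inclusive, and each year is divided by
# its own number of days.
days_by_year <- function(start, end) {
  yr <- function(d) as.integer(format(d, "%Y"))
  pieces <- lapply(seq_along(start), function(i) {
    years <- seq(yr(start[i]), yr(end[i]))
    # Work with day numbers so pmax/pmin operate on plain numerics
    jan1  <- as.numeric(as.Date(paste0(years, "-01-01")))
    dec31 <- as.numeric(as.Date(paste0(years, "-12-31")))
    from  <- pmax(as.numeric(start[i]), jan1)  # clip to the start of the year
    to    <- pmin(as.numeric(end[i]), dec31)   # clip to the end of the year
    data.frame(year = years,
               contribute = (to - from + 1) / (dec31 - jan1 + 1))
  })
  aggregate(contribute ~ year, data = do.call(rbind, pieces), FUN = sum)
}

start <- as.Date(c("2000-01-01", "1999-01-01"))
end   <- as.Date(c("2001-03-01", "2001-12-31"))
res_days <- days_by_year(start, end)
print(res_days)  # 1999: 1, 2000: 2, 2001: 1 + 60/365 (about 1.16)
```

Under this convention Person 1 contributes 60 days in 2001, so the 2001 total lands near 1.16 rather than 1.25. Which figure is right depends on whether 01/03/2001 means the first of March or the whole month of March.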