I have a longitudinal data set in which each person has a time-fixed 'start' and 'end' date of observation. It looks like this, for example:
# Populate data frame
person <- c(1, 1, 1, 1, 1)
row <- c(1, 2, 3, 4, 5)
start <- c('2011-01-01', '2011-01-01', '2011-01-01', '2011-01-01', '2011-01-01')
end <- c('2015-12-31', '2015-12-31', '2015-12-31', '2015-12-31', '2015-12-31')
# Bind columns together into a data frame
df <- as.data.frame(cbind(person, row, start, end))
# start and end are date variables
df$start <- as.Date(df$start)
df$end <- as.Date(df$end)
df
#> person row start end
#> 1 1 1 2011-01-01 2015-12-31
#> 2 1 2 2011-01-01 2015-12-31
#> 3 1 3 2011-01-01 2015-12-31
#> 4 1 4 2011-01-01 2015-12-31
#> 5 1 5 2011-01-01 2015-12-31
I want to create a time-varying interval that calculates 6-month intervals that span the duration of those start-end bounds. Question: How can you create this time-varying interval as a set of variables? I'm looking for a solution that doesn't use data.table. It would look like this:
# Populate data frame
person <- c(1, 1, 1, 1, 1)
row <- c(1, 2, 3, 4, 5)
start <- c('2011-01-01', '2011-01-01', '2011-01-01', '2011-01-01', '2011-01-01')
end <- c('2014-12-31', '2014-12-31', '2014-12-31', '2014-12-31', '2014-12-31')
IntervalStart <- c('2011-01-01', '2011-07-02', '2012-07-03', '2013-07-04', '2014-07-05')
IntervalEnd <- c('2011-07-01', '2012-01-02', '2013-01-03', '2014-01-04', '2015-01-05')
# Bind columns together into a data frame
df <- as.data.frame(cbind(person, row, start, end, IntervalStart, IntervalEnd))
# format date variables
df$start <- as.Date(df$start)
df$end <- as.Date(df$end)
df$IntervalStart <- as.Date(df$IntervalStart)
df$IntervalEnd <- as.Date(df$IntervalEnd)
df
#> person row start end IntervalStart IntervalEnd
#> 1 1 1 2011-01-01 2014-12-31 2011-01-01 2011-07-01
#> 2 1 2 2011-01-01 2014-12-31 2011-07-02 2012-01-02
#> 3 1 3 2011-01-01 2014-12-31 2012-07-03 2013-01-03
#> 4 1 4 2011-01-01 2014-12-31 2013-07-04 2014-01-04
#> 5 1 5 2011-01-01 2014-12-31 2014-07-05 2015-01-05
Created on 2022-10-13 by the reprex package (v2.0.1)
CodePudding user response:
- Create a nested list of start dates at six month intervals
- Unnest the list so that each start date has its own row
- Create the interval end date which is the day before the following start date; if the date in question is the last in its sequence, take the end date to be the overall end date
library(tidyverse)
df %>%
mutate(IntervalStart = map2(start, end, seq, by = '6 months')) %>%
unnest(IntervalStart) %>%
group_by(row) %>%
mutate(IntervalEnd = case_when(is.na(lead(IntervalStart)) ~ end,
TRUE ~ lead(IntervalStart) - 1))
#> # A tibble: 50 x 6
#> # Groups: row [5]
#> person row start end IntervalStart IntervalEnd
#> <fct> <fct> <date> <date> <date> <date>
#> 1 1 1 2011-01-01 2015-12-31 2011-01-01 2011-06-30
#> 2 1 1 2011-01-01 2015-12-31 2011-07-01 2011-12-31
#> 3 1 1 2011-01-01 2015-12-31 2012-01-01 2012-06-30
#> 4 1 1 2011-01-01 2015-12-31 2012-07-01 2012-12-31
#> 5 1 1 2011-01-01 2015-12-31 2013-01-01 2013-06-30
#> 6 1 1 2011-01-01 2015-12-31 2013-07-01 2013-12-31
#> 7 1 1 2011-01-01 2015-12-31 2014-01-01 2014-06-30
#> 8 1 1 2011-01-01 2015-12-31 2014-07-01 2014-12-31
#> 9 1 1 2011-01-01 2015-12-31 2015-01-01 2015-06-30
#> 10 1 1 2011-01-01 2015-12-31 2015-07-01 2015-12-31
#> # ... with 40 more rows