Home > other >  How to create a time-varying date interval variable bounded by a single start and end date
How to create a time-varying date interval variable bounded by a single start and end date

Time:10-15

I have a longitudinal data set in which each person has a time-fixed 'start' and 'end' date of observation. It looks like this, for example:

# Populate data frame 
person <- c(1, 1, 1, 1, 1)
row <- c(1, 2, 3, 4, 5)
start <- c('2011-01-01', '2011-01-01', '2011-01-01', '2011-01-01', '2011-01-01')
end <- c('2015-12-31', '2015-12-31', '2015-12-31', '2015-12-31', '2015-12-31')

# Bind columns together into a data frame
df <- as.data.frame(cbind(person, row, start, end))

# start and end are date variables
df$start <- as.Date(df$start)
df$end <- as.Date(df$end)
df
#>   person row      start        end
#> 1      1   1 2011-01-01 2015-12-31
#> 2      1   2 2011-01-01 2015-12-31
#> 3      1   3 2011-01-01 2015-12-31
#> 4      1   4 2011-01-01 2015-12-31
#> 5      1   5 2011-01-01 2015-12-31

I want to create a time-varying interval that calculates 6-month intervals that span the duration of those start-end bounds. Question: How can you create this time-varying interval as a set of variables? I'm looking for a solution that doesn't use data.table. It would look like this:

# Populate data frame 
person <- c(1, 1, 1, 1, 1)
row <- c(1, 2, 3, 4, 5)
start <- c('2011-01-01', '2011-01-01', '2011-01-01', '2011-01-01', '2011-01-01')
end <- c('2014-12-31', '2014-12-31', '2014-12-31', '2014-12-31', '2014-12-31')
IntervalStart <- c('2011-01-01', '2011-07-02', '2012-07-03', '2013-07-04', '2014-07-05')
IntervalEnd <- c('2011-07-01', '2012-01-02', '2013-01-03', '2014-01-04', '2015-01-05')

# Bind columns together into a data frame
df <- as.data.frame(cbind(person, row, start, end, IntervalStart, IntervalEnd))

# format date variables
df$start <- as.Date(df$start)
df$end <- as.Date(df$end)
df$IntervalStart <- as.Date(df$IntervalStart)
df$IntervalEnd <- as.Date(df$IntervalEnd)

df
#>   person row      start        end IntervalStart IntervalEnd
#> 1      1   1 2011-01-01 2014-12-31    2011-01-01  2011-07-01
#> 2      1   2 2011-01-01 2014-12-31    2011-07-02  2012-01-02
#> 3      1   3 2011-01-01 2014-12-31    2012-07-03  2013-01-03
#> 4      1   4 2011-01-01 2014-12-31    2013-07-04  2014-01-04
#> 5      1   5 2011-01-01 2014-12-31    2014-07-05  2015-01-05

Created on 2022-10-13 by the reprex package (v2.0.1)

CodePudding user response:

  1. Create a nested list of start dates at six month intervals
  2. Unnest the list so that each start date has its own row
  3. Create the interval end date which is the day before the following start date; if the date in question is the last in its sequence, take the end date to be the overall end date
library(tidyverse)

df %>% 
  mutate(IntervalStart = map2(start, end, seq, by = '6 months')) %>%
  unnest(IntervalStart) %>% 
  group_by(row) %>%
  mutate(IntervalEnd = case_when(is.na(lead(IntervalStart)) ~ end,
                                 TRUE ~ lead(IntervalStart) - 1))
#> # A tibble: 50 x 6
#> # Groups:   row [5]
#>    person row   start      end        IntervalStart IntervalEnd
#>    <fct>  <fct> <date>     <date>     <date>        <date>     
#>  1 1      1     2011-01-01 2015-12-31 2011-01-01    2011-06-30 
#>  2 1      1     2011-01-01 2015-12-31 2011-07-01    2011-12-31 
#>  3 1      1     2011-01-01 2015-12-31 2012-01-01    2012-06-30 
#>  4 1      1     2011-01-01 2015-12-31 2012-07-01    2012-12-31 
#>  5 1      1     2011-01-01 2015-12-31 2013-01-01    2013-06-30 
#>  6 1      1     2011-01-01 2015-12-31 2013-07-01    2013-12-31 
#>  7 1      1     2011-01-01 2015-12-31 2014-01-01    2014-06-30 
#>  8 1      1     2011-01-01 2015-12-31 2014-07-01    2014-12-31 
#>  9 1      1     2011-01-01 2015-12-31 2015-01-01    2015-06-30 
#> 10 1      1     2011-01-01 2015-12-31 2015-07-01    2015-12-31 
#> # ... with 40 more rows
  • Related