How to determine difference in days between two dates across two columns and two rows by group?-CodePudding

I am looking to determine the difference in days by groups across two columns and two rows. Essentially subtract from the End Day by the subsequent Start Day in the subsequent row and record the difference as new column in the data frame and start over when a new group (ID) has been identified.

Start_Date   End_Date     ID   
  
2014-05-09   2015-05-08   01
2015-05-09   2016-05-08   01 
2016-05-11   2017-05-10   01
2017-05-11   2018-05-10   01
2016-08-29   2017-08-28   02
2017-08-29   2018-08-28   02

The result should be something like table below.

Start_Date   End_Date     ID   Days_Difference 
  
2014-05-09   2015-05-08   01         NA
2015-05-09   2016-05-08   01         01
2016-05-11   2017-05-10   01         03
2017-05-11   2018-05-10   01         01
2016-08-29   2017-08-28   02         NA
2017-08-29   2018-08-28   02         01

Essentially I want to take the difference of the End Date and its left diagonal Start date across groups (ID). I am having a really hard time with this one. I don't think my code would be helpful. Any solution using tidyverse, data.table, or base R would be greatly appreciated!

CodePudding user response：

We may get the difference between the lead (next element) of 'Start_Date' and 'End_Date' after grouping

library(dplyr)
df1 <- df1 %>%
   mutate(across(ends_with("Date"), as.Date)) %>%
   group_by(ID) %>% 
   mutate(Days_Difference = as.numeric(lag(lead(Start_Date) - End_Date))) %>% 
   ungroup

-output

df1
# A tibble: 6 × 4
  Start_Date End_Date      ID Days_Difference
  <date>     <date>     <int>           <dbl>
1 2014-05-09 2015-05-08     1              NA
2 2015-05-09 2016-05-08     1               1
3 2016-05-11 2017-05-10     1               3
4 2017-05-11 2018-05-10     1               1
5 2016-08-29 2017-08-28     2              NA
6 2017-08-29 2018-08-28     2               1

Or a similar logic with data.table

library(data.table)
setDT(df1)[, Days_Difference := 
    as.numeric(shift(shift(as.IDate(Start_Date), type = "lead") - 
       as.IDate(End_Date))), ID]

-output

> df1
   Start_Date   End_Date    ID Days_Difference
       <char>     <char> <int>           <num>
1: 2014-05-09 2015-05-08     1              NA
2: 2015-05-09 2016-05-08     1               1
3: 2016-05-11 2017-05-10     1               3
4: 2017-05-11 2018-05-10     1               1
5: 2016-08-29 2017-08-28     2              NA
6: 2017-08-29 2018-08-28     2               1

data

df1 <- structure(list(Start_Date = c("2014-05-09", "2015-05-09", "2016-05-11", 
"2017-05-11", "2016-08-29", "2017-08-29"), End_Date = c("2015-05-08", 
"2016-05-08", "2017-05-10", "2018-05-10", "2017-08-28", "2018-08-28"
), ID = c(1L, 1L, 1L, 1L, 2L, 2L)), class = "data.frame", 
row.names = c(NA, 
-6L))

CodePudding user response：

Another data.table option

setDT(df)[
  ,
  c(lapply(.SD, as.IDate), .(ID = ID)),
  .SDcols = patterns("Date$")
][
  ,
  DayspDiff := Start_Date - shift(End_Date),
  ID
][]

yields

   Start_Date   End_Date ID DayspDiff
1: 2014-05-09 2015-05-08  1        NA
2: 2015-05-09 2016-05-08  1         1
3: 2016-05-11 2017-05-10  1         3
4: 2017-05-11 2018-05-10  1         1
5: 2016-08-29 2017-08-28  2        NA
6: 2017-08-29 2018-08-28  2         1