Home > OS >  Calculate number of days passed since previous date within each group
Calculate number of days passed since previous date within each group

Time:06-11

My data are as follows:

data <- structure(list(year = c(2019L, 2019L, 2019L, 2019L, 2019L, 2020L, 
2020L, 2020L, 2020L, 2020L), group = c("A", "A", "A", "B", "B", 
"A", "A", "B", "B", "B"), date = c("2019-07-15", "2019-07-25", 
"2019-08-01", "2019-07-15", "2019-07-30", "2020-08-01", "2020-08-03", 
"2020-08-01", "2020-08-20", "2020-08-25")), 
class = "data.frame", row.names = c(NA, 
-10L))

I would like to calculate the number of days passed since the previous date per year per group. When I attempt to do this, the output is not grouping the data. For example, the output will calculate the number of days between group B, 2019-07-15, and group A, 2019-08-21 as a negative value, rather than calculating days passed since each date WITHIN group.

I have tried the following code, which results in the issue I describe above:

library(dplyr)
data %>%
 arrange(year, group, date) %>%  # To ensure dates are in chronological order within each group 
 mutate(date = as.Date(date)) %>%
 group_by(year, group)%>% 
 mutate(Difference = difftime(date, lag(date), units = "days")) 

Thank you in advance!

CodePudding user response:

Try the following. Small changes to your code make a difference.

suppressPackageStartupMessages(library(dplyr))

data %>%
  mutate(date = as.Date(date)) %>%
  group_by(group)%>% 
  arrange(group, date) %>% 
  mutate(Difference = difftime(date, lag(date), units = "days")) 
#> # A tibble: 10 × 4
#> # Groups:   group [2]
#>     year group date       Difference
#>    <int> <chr> <date>     <drtn>    
#>  1  2019 A     2019-07-15  NA days  
#>  2  2019 A     2019-07-25  10 days  
#>  3  2019 A     2019-08-01   7 days  
#>  4  2020 A     2020-08-01 366 days  
#>  5  2020 A     2020-08-03   2 days  
#>  6  2019 B     2019-07-15  NA days  
#>  7  2019 B     2019-07-30  15 days  
#>  8  2020 B     2020-08-01 368 days  
#>  9  2020 B     2020-08-20  19 days  
#> 10  2020 B     2020-08-25   5 days

Created on 2022-06-10 by the reprex package (v2.0.1)

CodePudding user response:

Here is an alternative using interval function from lubridate package:

Note:

interval(lag(date,default = date[1]), date) %>% 
           as.numeric('days')

Code:

library(lubridate)
library(dplyr)
data %>% 
  mutate(date = ymd(date)) %>% 
  group_by(group) %>% 
  mutate(difference = interval(lag(date,default = date[1]), date) %>% as.numeric('days'))  
    year group date       difference
   <int> <chr> <date>          <dbl>
 1  2019 A     2019-07-15          0
 2  2019 A     2019-07-25         10
 3  2019 A     2019-08-01          7
 4  2019 B     2019-07-15          0
 5  2019 B     2019-07-30         15
 6  2020 A     2020-08-01        366
 7  2020 A     2020-08-03          2
 8  2020 B     2020-08-01        368
 9  2020 B     2020-08-20         19
10  2020 B     2020-08-25          5
  •  Tags:  
  • r
  • Related