My data are as follows:
data <- structure(list(year = c(2019L, 2019L, 2019L, 2019L, 2019L, 2020L,
2020L, 2020L, 2020L, 2020L), group = c("A", "A", "A", "B", "B",
"A", "A", "B", "B", "B"), date = c("2019-07-15", "2019-07-25",
"2019-08-01", "2019-07-15", "2019-07-30", "2020-08-01", "2020-08-03",
"2020-08-01", "2020-08-20", "2020-08-25")),
class = "data.frame", row.names = c(NA,
-10L))
I would like to calculate the number of days passed since the previous date per year per group. When I attempt to do this, the output is not grouping the data. For example, the output will calculate the number of days between group B, 2019-07-15, and group A, 2019-08-21 as a negative value, rather than calculating days passed since each date WITHIN group.
I have tried the following code, which results in the issue I describe above:
library(dplyr)
data %>%
arrange(year, group, date) %>% # To ensure dates are in chronological order within each group
mutate(date = as.Date(date)) %>%
group_by(year, group)%>%
mutate(Difference = difftime(date, lag(date), units = "days"))
Thank you in advance!
CodePudding user response:
Try the following. Small changes to your code make a difference.
suppressPackageStartupMessages(library(dplyr))
data %>%
mutate(date = as.Date(date)) %>%
group_by(group)%>%
arrange(group, date) %>%
mutate(Difference = difftime(date, lag(date), units = "days"))
#> # A tibble: 10 × 4
#> # Groups: group [2]
#> year group date Difference
#> <int> <chr> <date> <drtn>
#> 1 2019 A 2019-07-15 NA days
#> 2 2019 A 2019-07-25 10 days
#> 3 2019 A 2019-08-01 7 days
#> 4 2020 A 2020-08-01 366 days
#> 5 2020 A 2020-08-03 2 days
#> 6 2019 B 2019-07-15 NA days
#> 7 2019 B 2019-07-30 15 days
#> 8 2020 B 2020-08-01 368 days
#> 9 2020 B 2020-08-20 19 days
#> 10 2020 B 2020-08-25 5 days
Created on 2022-06-10 by the reprex package (v2.0.1)
CodePudding user response:
Here is an alternative using interval
function from lubridate
package:
Note:
interval(lag(date,default = date[1]), date) %>%
as.numeric('days')
Code:
library(lubridate)
library(dplyr)
data %>%
mutate(date = ymd(date)) %>%
group_by(group) %>%
mutate(difference = interval(lag(date,default = date[1]), date) %>% as.numeric('days'))
year group date difference
<int> <chr> <date> <dbl>
1 2019 A 2019-07-15 0
2 2019 A 2019-07-25 10
3 2019 A 2019-08-01 7
4 2019 B 2019-07-15 0
5 2019 B 2019-07-30 15
6 2020 A 2020-08-01 366
7 2020 A 2020-08-03 2
8 2020 B 2020-08-01 368
9 2020 B 2020-08-20 19
10 2020 B 2020-08-25 5