I have a df that contains one date per event, on separate rows, for several individuals (I have 2 rows per ID), like so:
ID | date |
---|---|
1 | 2022-04-26 |
1 | 2021-03-11 |
2 | 2022-01-24 |
2 | 2018-09-12 |
I am looking to calculate the length of time, in years, between the two dates for each ID, so my output should look like this:
ID | yrsbetweendates |
---|---|
1 | 1 |
2 | 4 |
I think something with dplyr and mutate might be the solution, but I am not sure.
Answer:
There is probably a simpler, more concise way to do it, but here is one approach. It uses functions from dplyr and lubridate as well as some base R functions.
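library(dplyr)      # %>%, group_by() and summarise()
library(lubridate)  # time_length()
df <- data.frame(ID = c(1, 1, 2, 2),
                 date = as.Date(c("2022-04-26", "2021-03-11", "2022-01-24", "2018-09-12")))
# Per ID: absolute gap between the two dates, expressed in years, then rounded up.
# Note that ceiling() turns any partial year into a whole one (ID 1, about 1.13 years, gives 2).
df <- df %>%
  dplyr::group_by(ID) %>%
  dplyr::summarise(yrsbetweendates = ceiling(lubridate::time_length(abs(diff(date)), "years")))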
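Whether the partial year should be rounded up, rounded, or dropped changes the answer, so it can help to compare the options side by side. This is a base R sketch, assuming a 365.25-day year (the length lubridate uses for duration-based years); `gap_years()` is a hypothetical helper, not part of any package.

```r
# Same example data as above
df <- data.frame(ID = c(1, 1, 2, 2),
                 date = as.Date(c("2022-04-26", "2021-03-11", "2022-01-24", "2018-09-12")))

# Hypothetical helper: absolute gap between the dates of one ID, in years,
# assuming a 365.25-day year
gap_years <- function(d) as.numeric(abs(diff(d)), units = "days") / 365.25

# One gap per ID (names are the ID values)
gaps <- sapply(split(df$date, df$ID), gap_years)

# Compare the three rounding choices side by side
res <- data.frame(ID = as.numeric(names(gaps)),
                  exact = round(gaps, 2),
                  floor = floor(gaps),
                  round = round(gaps),
                  ceiling = ceiling(gaps),
                  row.names = NULL)
print(res)
```

With these dates, `floor()` and `round()` give 1 and 3, while `ceiling()` gives 2 and 4.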
Answer:
I used the time_length and difftime functions to calculate the years.
However, for ID 2 the span comes out as 3 years, not 4.
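A worked version of the arithmetic, assuming lubridate converts the difftime to years using its 365.25-day duration year: the two dates are about 411 days apart for ID 1 and about 1,230 days apart for ID 2, and as.integer() drops the fractional part, whereas rounding up would turn 3.37 into 4.

```latex
\begin{aligned}
\text{ID 1: } & \frac{411}{365.25} \approx 1.13 \;\Rightarrow\; \text{truncated: } 1, \quad \lceil 1.13 \rceil = 2 \\
\text{ID 2: } & \frac{1230}{365.25} \approx 3.37 \;\Rightarrow\; \text{truncated: } 3, \quad \lceil 3.37 \rceil = 4
\end{aligned}
```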
library(lubridate)
# ID 1: difference between the two dates, converted to years and truncated to an integer
as.integer(
  time_length(
    difftime(strptime("2022-04-26", "%Y-%m-%d"), strptime("2021-03-11", "%Y-%m-%d")),
    "years"
  )
)
# [1] 1
# ID 2: same calculation
as.integer(
  time_length(
    difftime(strptime("2022-01-24", "%Y-%m-%d"), strptime("2018-09-12", "%Y-%m-%d")),
    "years"
  )
)
# [1] 3
Is this result correct for you?
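One reading of the expected output that reproduces both values (1 and 4) is the difference between calendar years: 2022 - 2021 = 1 and 2022 - 2018 = 4. That interpretation is an assumption, not something the question states. A base R sketch under that assumption:

```r
df <- data.frame(ID = c(1, 1, 2, 2),
                 date = as.Date(c("2022-04-26", "2021-03-11", "2022-01-24", "2018-09-12")))

# Extract the calendar year of each date as an integer
df$year <- as.integer(format(df$date, "%Y"))

# For each ID, subtract the earliest calendar year from the latest one
res <- aggregate(year ~ ID, data = df, FUN = function(y) max(y) - min(y))
names(res)[2] <- "yrsbetweendates"
print(res)
```

Note that this counts calendar-year boundaries rather than elapsed time, so 2021-12-31 and 2022-01-01 would also come out as 1 year apart.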