I would like to find out the difference in days between the two date field based on the ID number. And the last date found based on ID would be 0 (Date - itself) Example a tibble dataframe:
ID | Date | Difference |
---|---|---|
1 | 1/1/2022 | 2 |
1 | 3/1/2022 | 0 |
2 | 2/1/2022 | 3 |
2 | 5/1/2022 | 23 |
2 | 28/1/2022 | 0 |
Is there any way to achieve this?
CodePudding user response:
You could use diff()
to calculate lagged differences.
library(dplyr)
df %>%
group_by(ID) %>%
mutate(Date = as.Date(Date, '%d/%m/%Y'),
Difference = c(diff(Date), 0)) %>%
ungroup()
# # A tibble: 5 × 3
# ID Date Difference
# <int> <date> <drtn>
# 1 1 2022-01-01 2 days
# 2 1 2022-01-03 0 days
# 3 2 2022-01-02 3 days
# 4 2 2022-01-05 23 days
# 5 2 2022-01-28 0 days
CodePudding user response:
Here is a lubridate approach using durations:
library(tidyverse)
library(lubridate)
df %>%
mutate(Date = dmy(Date)) %>%
group_by(ID) %>%
mutate(Diff = (Date %--% lead(Date)) / ddays(1),
Diff = replace_na(Diff, 0)) %>%
ungroup()
# # A tibble: 5 × 4
# ID Date Difference Diff
# <chr> <date> <chr> <dbl>
# 1 1 2022-01-01 2 2
# 2 1 2022-01-03 0 0
# 3 2 2022-01-02 3 3
# 4 2 2022-01-05 23 23
# 5 2 2022-01-28 0 0