Home > Back-end >  Find the difference in Data based on ID in R
Find the difference in Data based on ID in R

Time:08-05

I would like to find out the difference in days between the two date field based on the ID number. And the last date found based on ID would be 0 (Date - itself) Example a tibble dataframe:

ID Date Difference
1 1/1/2022 2
1 3/1/2022 0
2 2/1/2022 3
2 5/1/2022 23
2 28/1/2022 0

Is there any way to achieve this?

CodePudding user response:

You could use diff() to calculate lagged differences.

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(Date = as.Date(Date, '%d/%m/%Y'),
         Difference = c(diff(Date), 0)) %>%
  ungroup()

# # A tibble: 5 × 3
#      ID Date       Difference
#   <int> <date>     <drtn>
# 1     1 2022-01-01  2 days
# 2     1 2022-01-03  0 days
# 3     2 2022-01-02  3 days
# 4     2 2022-01-05 23 days
# 5     2 2022-01-28  0 days

CodePudding user response:

Here is a lubridate approach using durations:

library(tidyverse)
library(lubridate)

df %>%
  mutate(Date = dmy(Date)) %>%
  group_by(ID) %>%
  mutate(Diff = (Date %--% lead(Date)) / ddays(1),
         Diff = replace_na(Diff, 0)) %>%
  ungroup()

# # A tibble: 5 × 4
#   ID    Date       Difference  Diff
#   <chr> <date>     <chr>      <dbl>
# 1 1     2022-01-01 2              2
# 2 1     2022-01-03 0              0
# 3 2     2022-01-02 3              3
# 4 2     2022-01-05 23            23
# 5 2     2022-01-28 0              0
  • Related