Home > Mobile >  Take differences between dates within id over a sequence (different rows) in R
Take differences between dates within id over a sequence (different rows) in R

Time:02-12

I would like - for each id (cpf) - to take the differences, in months, between hire_date and sep_date on the previous row. For example, I would like to take the difference between hire_date associated with Order 1 and sep_date associated with order 2 (and the same for ids with more than two Order values).

Not all observations have just two order values. Some may have significantly more. How can I write a code that accounts for that? In this case there are more than two rows for a id. So I would also need to make more than one difference.

I would always like to take a difference between a given hire_date (e.g. order 2) and the previous sep_Date (order 1) and so forth. For more than two rows: hire_date (order 3) - sep_date (order 2); hire_date (order 2) - sep_date (order 1)...

structure(list(cpf = c(234L, 234L, 245L, 245L, 245L, 555L, 555L
), hire_date = c("10-11-29", "13-7-29", "11-10-19", "13-3-20", 
"13-5-20", "10-02-18", "13-11-21"), sep_date = c("13-4-18", "13-8-29", 
"13-2-15", "13-4-20", NA, "13-10-20", NA), Order = c(1L, 2L, 
1L, 2L, 3L, 1L, 2L)), class = "data.frame", row.names = c(NA, 
-7L))

  cpf hire_date sep_date Order
1 234  10-11-29  13-4-18     1
2 234   13-7-29  13-8-29     2
3 245  11-10-19  13-2-15     1
4 245   13-3-20  13-4-20     2
5 245   13-5-20     <NA>     3
6 555  10-02-18 13-10-20     1
7 555  13-11-21     <NA>     2

Any help would be appreciated!

CodePudding user response:

We can convert the date columns to Date class and do a group by difftime

library(dplyr)
library(lubridate)
df1 %>%
  mutate(across(hire_date:sep_date, dmy)) %>% 
  group_by(cpf) %>% 
  mutate(Month = as.numeric(difftime(hire_date, 
     lag(sep_date), unit = "weeks"))/4) %>%
  ungroup
  • Related