I would like - for each id (cpf) - to take the differences, in months, between hire_date and sep_date on the previous row. For example, I would like to take the difference between hire_date associated with Order 1 and sep_date associated with order 2 (and the same for ids with more than two Order values).
Not all observations have just two order values. Some may have significantly more. How can I write a code that accounts for that? In this case there are more than two rows for a id. So I would also need to make more than one difference.
I would always like to take a difference between a given hire_date (e.g. order 2) and the previous sep_Date (order 1) and so forth. For more than two rows: hire_date (order 3) - sep_date (order 2); hire_date (order 2) - sep_date (order 1)...
structure(list(cpf = c(234L, 234L, 245L, 245L, 245L, 555L, 555L
), hire_date = c("10-11-29", "13-7-29", "11-10-19", "13-3-20",
"13-5-20", "10-02-18", "13-11-21"), sep_date = c("13-4-18", "13-8-29",
"13-2-15", "13-4-20", NA, "13-10-20", NA), Order = c(1L, 2L,
1L, 2L, 3L, 1L, 2L)), class = "data.frame", row.names = c(NA,
-7L))
cpf hire_date sep_date Order
1 234 10-11-29 13-4-18 1
2 234 13-7-29 13-8-29 2
3 245 11-10-19 13-2-15 1
4 245 13-3-20 13-4-20 2
5 245 13-5-20 <NA> 3
6 555 10-02-18 13-10-20 1
7 555 13-11-21 <NA> 2
Any help would be appreciated!
CodePudding user response:
We can convert the date columns to Date
class and do a group by difftime
library(dplyr)
library(lubridate)
df1 %>%
mutate(across(hire_date:sep_date, dmy)) %>%
group_by(cpf) %>%
mutate(Month = as.numeric(difftime(hire_date,
lag(sep_date), unit = "weeks"))/4) %>%
ungroup