I have a overall start date and end date with break dates, and I am hoping to create multiple entries of times series data showing actual dates worked, which means I'll use the start and finish dates at the beginning and end of the series grouped by ID, and use the break dates in the middle... is there a simpler way of doing this other than using a loop?
Data I have:
ID Start Finish Break_start Break_Finish Break_Number
a 01-01-20 03-05-20 29-04-20 01-05-20 1
b 20-09-19 01-04-22 12-11-19 05-12-19 1
b 20-09-19 01-04-22 05-08-20 25-08-20 2
Data wanted
ID Start_new Finish_new
a 01-01-20 28-04-20
a 01-05-20 03-05-20
b 20-09-19 11-11-19
b 05-12-19 04-08-20
b 25-08-20 01-04-22
Thank you!
CodePudding user response:
With dplyr
, you could summarise the data by ID
to get the starting and finish dates of each duration.
library(dplyr)
df %>%
mutate(across(2:5, as.Date, "%d-%m-%y")) %>%
group_by(ID) %>%
summarise(Start_new = c(first(Start), Break_Finish),
Finish_new = c(Break_start - 1, first(Finish))) %>%
ungroup()
# # A tibble: 5 × 3
# ID Start_new Finish_new
# <chr> <date> <date>
# 1 a 2020-01-01 2020-04-28
# 2 a 2020-05-01 2020-05-03
# 3 b 2019-09-20 2019-11-11
# 4 b 2019-12-05 2020-08-04
# 5 b 2020-08-25 2022-04-01