Home > OS >  How to split time series data with breaks in R?
How to split time series data with breaks in R?

Time:08-17

I have a overall start date and end date with break dates, and I am hoping to create multiple entries of times series data showing actual dates worked, which means I'll use the start and finish dates at the beginning and end of the series grouped by ID, and use the break dates in the middle... is there a simpler way of doing this other than using a loop?

Data I have:

ID     Start       Finish      Break_start     Break_Finish     Break_Number
a      01-01-20    03-05-20    29-04-20        01-05-20         1
b      20-09-19    01-04-22    12-11-19        05-12-19         1
b      20-09-19    01-04-22    05-08-20        25-08-20         2

Data wanted

ID    Start_new       Finish_new
a     01-01-20        28-04-20
a     01-05-20        03-05-20
b     20-09-19        11-11-19
b     05-12-19        04-08-20
b     25-08-20        01-04-22

Thank you!

CodePudding user response:

With dplyr, you could summarise the data by ID to get the starting and finish dates of each duration.

library(dplyr)

df %>%
  mutate(across(2:5, as.Date, "%d-%m-%y")) %>%
  group_by(ID) %>%
  summarise(Start_new = c(first(Start), Break_Finish),
            Finish_new = c(Break_start - 1, first(Finish))) %>%
  ungroup()

# # A tibble: 5 × 3
#   ID    Start_new  Finish_new
#   <chr> <date>     <date>    
# 1 a     2020-01-01 2020-04-28
# 2 a     2020-05-01 2020-05-03
# 3 b     2019-09-20 2019-11-11
# 4 b     2019-12-05 2020-08-04
# 5 b     2020-08-25 2022-04-01
  • Related