Home > database >  Grouping non-breaking date ranges in R
Grouping non-breaking date ranges in R

Time:08-19

I have that kind of data here is dput

df <- structure(list(start_date = structure(c(18312, 18313, 18314, 
18316, 18317, 18318, 18319, 18320, 18321), class = "Date")), row.names = c(NA, 
-9L), class = "data.frame")

looking so;

  start_date
  <date>    
1 2020-02-20
2 2020-02-21
3 2020-02-22
4 2020-02-24
5 2020-02-25
6 2020-02-26
7 2020-02-27
8 2020-02-28
9 2020-02-29

I want to create a summary table which covers a start and end date for each non-breaking date range. The desired output is looking so ;

  start      end       
  <date>     <date>    
1 2020-02-20 2020-02-22
2 2020-02-24 2020-02-29

data.table solutions are also welcome.

Thanks in advance.

CodePudding user response:

You could use cumsum(c(1, diff(start_date) != 1)) to find consecutive dates.

library(dplyr)

df %>%
  group_by(id = cumsum(c(1, diff(start_date) != 1))) %>%
  summarise(start = first(start_date), end = last(start_date))

# # A tibble: 2 × 3
#      id start      end       
#   <dbl> <date>     <date>    
# 1     1 2020-02-20 2020-02-22
# 2     2 2020-02-24 2020-02-29

CodePudding user response:

Group the dates by seqid(start_date) using the collapse package.

library(collapse)
library(dplyr)

df %>%
  group_by(group = seqid(start_date)) %>%
  summarize(start = min(start_date), end = max(start_date), .groups = "drop")
## # A tibble: 2 × 3
##   group start      end       
##   <qG>  <date>     <date>    
## 1 1     2020-02-20 2020-02-22
## 2 2     2020-02-24 2020-02-29
  • Related