I have that kind of data here is dput
df <- structure(list(start_date = structure(c(18312, 18313, 18314,
18316, 18317, 18318, 18319, 18320, 18321), class = "Date")), row.names = c(NA,
-9L), class = "data.frame")
looking so;
start_date
<date>
1 2020-02-20
2 2020-02-21
3 2020-02-22
4 2020-02-24
5 2020-02-25
6 2020-02-26
7 2020-02-27
8 2020-02-28
9 2020-02-29
I want to create a summary table which covers a start and end date for each non-breaking date range. The desired output is looking so ;
start end
<date> <date>
1 2020-02-20 2020-02-22
2 2020-02-24 2020-02-29
data.table
solutions are also welcome.
Thanks in advance.
CodePudding user response:
You could use cumsum(c(1, diff(start_date) != 1))
to find consecutive dates.
library(dplyr)
df %>%
group_by(id = cumsum(c(1, diff(start_date) != 1))) %>%
summarise(start = first(start_date), end = last(start_date))
# # A tibble: 2 × 3
# id start end
# <dbl> <date> <date>
# 1 1 2020-02-20 2020-02-22
# 2 2 2020-02-24 2020-02-29
CodePudding user response:
Group the dates by seqid(start_date)
using the collapse package.
library(collapse)
library(dplyr)
df %>%
group_by(group = seqid(start_date)) %>%
summarize(start = min(start_date), end = max(start_date), .groups = "drop")
## # A tibble: 2 × 3
## group start end
## <qG> <date> <date>
## 1 1 2020-02-20 2020-02-22
## 2 2 2020-02-24 2020-02-29