I have a data.frame with start and end indices (sorted), for example:
df <- data.frame(start.idx = c(1, 2, 5),
                 end.idx = c(2, 3, 6))
I'm looking for a function that will merge rows i and i-1 if start.idx[i] == end.idx[i-1], such that the new row's start.idx will be start.idx[i-1] and its end.idx will be end.idx[i].
For the example above the resulting new (merged) data.frame will be:
res.df <- data.frame(start.idx = c(1, 5),
                     end.idx = c(3, 6))
CodePudding user response:
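You can put consecutive rows in the same group whenever a row's start.idx does not exceed the previous row's end.idx, and then take the first start.idx and the last end.idx within each group. Note that this merges rows that overlap as well as rows where start.idx[i] == end.idx[i-1].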
library(dplyr)

df %>%
  arrange(start.idx) %>%
  # start a new group whenever a row begins after the previous row's end
  group_by(group = cumsum(start.idx > lag(end.idx, default = 0))) %>%
  summarise(start.idx = first(start.idx),
            end.idx = last(end.idx)) %>%
  select(-group)
#  start.idx end.idx
#      <dbl>   <dbl>
#1         1       3
#2         5       6
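
If you want to merge only when start.idx[i] == end.idx[i-1], exactly as stated in the question, the same grouping idea can be sketched in base R. This is a minimal sketch, not a tested general solution: merge_adjacent is a hypothetical helper name, and it assumes the data.frame is already sorted by start.idx.

```r
# Hypothetical helper (not part of the dplyr answer above): merge row i into
# row i-1 only when start.idx[i] == end.idx[i-1].
# Assumes df is already sorted by start.idx.
merge_adjacent <- function(df) {
  n <- nrow(df)
  if (n < 2) return(df)  # nothing to merge

  # TRUE where a row starts a new group, i.e. it does not continue the previous row
  new_group <- c(TRUE, df$start.idx[-1] != df$end.idx[-n])

  # the last row of each group is the row just before the next group's first row,
  # plus the final row of the data.frame
  is_last <- c(new_group[-1], TRUE)

  data.frame(start.idx = df$start.idx[new_group],
             end.idx = df$end.idx[is_last])
}

df <- data.frame(start.idx = c(1, 2, 5),
                 end.idx = c(2, 3, 6))
res.df <- merge_adjacent(df)
```

For the example df, res.df has the rows (1, 3) and (5, 6). Unlike the dplyr version, rows that merely overlap (start.idx[i] < end.idx[i-1]) are left unmerged.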