I have a data.frame of linear intervals, where each interval also has a numeric index:
df <- data.frame(id = c("id1", "id20", "id7", "id12", "id15"),
                 start = c(36, 41, 216, 234, 300),
                 end = c(21, 112, 263, 269, 340),
                 index = c(11, 12, 28, 29, 33))
Here df is sorted by index in ascending order.
I want to merge each set of rows whose indices are consecutive into a single row, such that their id values are concatenated with a ;, start is the minimum start of the set, end is the maximum end of the set, and index is the maximum index of the set.
So for the example above, the resulting merged data.frame would be:
merged.df <- data.frame(id = c("id1;id20", "id7;id12", "id15"),
                        start = c(36, 216, 300),
                        end = c(112, 269, 340),
                        index = c(12, 29, 33))
Any idea?
CodePudding user response:
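You could use cumsum(c(TRUE, diff(index) != 1)) to build a grouping variable: it starts a new group wherever the gap to the previous index is not 1, so each run of consecutive indices shares one group number. Then summarise within each group.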
library(dplyr)

df %>%
  group_by(grp = cumsum(c(TRUE, diff(index) != 1))) %>%
  summarise(id = paste(id, collapse = ";"),
            start = min(start), end = max(end),
            index = last(index)) %>%
  select(-grp)

# # A tibble: 3 × 4
#   id       start   end index
#   <chr>    <dbl> <dbl> <dbl>
# 1 id1;id20    36   112    12
# 2 id7;id12   216   269    29
# 3 id15       300   340    33
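
To see what the grouping expression does on its own, here is a minimal base R sketch using the data from the question. It assumes df is already sorted by index. The name grp and the split/lapply approach are illustrative choices rather than part of the answer above; the approach is a rough equivalent of the dplyr pipeline for when dplyr is not available.

```r
# Example data from the question.
df <- data.frame(id = c("id1", "id20", "id7", "id12", "id15"),
                 start = c(36, 41, 216, 234, 300),
                 end = c(21, 112, 263, 269, 340),
                 index = c(11, 12, 28, 29, 33))

# diff() gives the gap to the previous index; any gap other than 1
# starts a new run. The leading TRUE opens the first run.
grp <- cumsum(c(TRUE, diff(df$index) != 1))
# grp is 1 1 2 2 3

# Base R alternative (an assumption, not the answer's method):
# split rows by run, summarise each piece, and bind the pieces together.
merged.df <- do.call(rbind, lapply(split(df, grp), function(d) {
  data.frame(id = paste(d$id, collapse = ";"),
             start = min(d$start),
             end = max(d$end),
             index = max(d$index))
}))
rownames(merged.df) <- NULL
```

Because the rows are sorted by index, max(d$index) here and last(index) in the dplyr version pick the same value.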