Merge data.frame rows by consecutive index


I have a data.frame of linear intervals, where each interval also has a numeric index:

df <- data.frame(id = c("id1","id20","id7","id12","id15"),
                 start = c(36, 41, 216, 234, 300),
                 end = c(21, 112, 263, 269, 340),
                 index = c(11, 12, 28, 29, 33))

df is sorted by index in ascending order.

I want to merge each set of rows whose indices are consecutive into a single row, such that the ids are concatenated with a ;, start is the minimum start of the set, end is the maximum end of the set, and index is the maximum index of the set.

So for the example above, the resulting merged data.frame would be:

merged.df <- data.frame(id = c("id1;id20","id7;id12","id15"),
                        start = c(36, 216, 300),
                        end = c(112, 269, 340),
                        index = c(12, 29, 33))

Any idea?

CodePudding user response:

You could use cumsum(c(TRUE, diff(index) != 1)) as a grouping key: it increments whenever the gap to the previous index is not 1, so each run of consecutive indices gets its own group number.
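To see how that key is built, here is a small step-by-step sketch on the index values from the question. The intermediate names (gaps, new_run, grp) are only for illustration.

```r
index <- c(11, 12, 28, 29, 33)

gaps <- diff(index)            # 1 16 1 4: difference to the previous index
new_run <- c(TRUE, gaps != 1)  # TRUE FALSE TRUE FALSE TRUE: first row always starts a run
grp <- cumsum(new_run)         # 1 1 2 2 3: running count of runs started so far
```

Rows that share a grp value belong to the same run of consecutive indices.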


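library(dplyr)

df %>%
  # start a new group whenever the index does not follow the previous one
  group_by(grp = cumsum(c(TRUE, diff(index) != 1))) %>%
  summarise(id = paste(id, collapse = ";"),
            start = min(start), end = max(end),
            index = last(index)) %>%  # df is sorted, so the last index is the max
  select(-grp)  # drop the helper grouping column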

# # A tibble: 3 × 4
#   id       start   end index
#   <chr>    <dbl> <dbl> <dbl>
# 1 id1;id20    36   112    12
# 2 id7;id12   216   269    29
# 3 id15       300   340    33
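
If you would rather not depend on dplyr, the same grouping key can be used in base R. This is only a sketch under the same assumption that df is sorted by index; the names grp and merged are illustrative.

```r
df <- data.frame(id = c("id1","id20","id7","id12","id15"),
                 start = c(36, 41, 216, 234, 300),
                 end = c(21, 112, 263, 269, 340),
                 index = c(11, 12, 28, 29, 33))

# one group number per run of consecutive indices
grp <- cumsum(c(TRUE, diff(df$index) != 1))

# split the rows by group, collapse each piece to one row, and stack the rows
merged <- do.call(rbind, lapply(split(df, grp), function(g) {
  data.frame(id = paste(g$id, collapse = ";"),
             start = min(g$start),
             end = max(g$end),
             index = max(g$index))
}))
rownames(merged) <- NULL  # drop the group numbers used as row names
```

Here merged should hold the same three rows as the tibble above, as a plain data.frame.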
