Home > Software engineering >  R data.table group by continuous values
R data.table group by continuous values

Time:11-16

I need some help with grouping data by continuous values.

If I have this data.table

dt <- data.table::data.table( a = c(1,1,1,2,2,2,2,1,1,2), b = seq(1:10), c = seq(1:10) 1 )
 
    a  b  c
 1: 1  1  2
 2: 1  2  3
 3: 1  3  4
 4: 2  4  5
 5: 2  5  6
 6: 2  6  7
 7: 2  7  8
 8: 1  8  9
 9: 1  9 10
10: 2 10 11

I need a group for every following equal values in column a. Of this group i need the first (also min possible) value of column b and the last (also max possible) value of column c.

Like this:

   a  b  c
1: 1  1  4
2: 2  4  8
3: 1  8 10
4: 2 10 11

Thank you very much for your help. I do not get it solved alone.

CodePudding user response:

Probably we can try

> dt[, .(a = a[1], b = b[1], c = c[.N]), rleid(a)][, -1]
   a  b  c
1: 1  1  4
2: 2  4  8
3: 1  8 10
4: 2 10 11

CodePudding user response:

An option with dplyr

library(dplyr)
dt %>% 
  group_by(grp = cumsum(c(TRUE, diff(a) != 0))) %>%
    summarise(across(a:b, first), c = last(c)) %>%
  select(-grp)

-output

# A tibble: 4 × 3
      a     b     c
  <dbl> <int> <dbl>
1     1     1     4
2     2     4     8
3     1     8    10
4     2    10    11
  • Related