Home > Back-end >  Group variable by "n" consecutive integers in data.table
Group variable by "n" consecutive integers in data.table

Time:10-13

library(data.table)

DT <- data.table(var = 1:100)

I want to create a second variable, group that groups the values in var by n consecutive integers. So if n is equal to 1, it would return the same column as var. If n=2, it would return me:

  var   group
1:   1   1
2:   2   1
3:   3   2
4:   4   2
5:   5   3
6:   6   3

If n=3, it would return me:

  var   group
1:   1   1
2:   2   1
3:   3   1
4:   4   2
5:   5   2
6:   6   2

and so on. I would like to do this as flexibly as possibly.

Note that there could be repeated values:

  var   group
1:   1   1
2:   1   1
3:   2   1
4:   3   2
5:   3   2
6:   4   2

Here, group corresponds to n=2. Thank you!

CodePudding user response:

I think we can use findInterval for this:

DT <- data.table(var = c(1L, 1:10))

n <- 2
DT[, group := findInterval(var, seq(min(var), max(var)   n, by = n))]
#       var group
#     <int> <int>
#  1:     1     1
#  2:     1     1
#  3:     2     1
#  4:     3     2
#  5:     4     2
#  6:     5     3
#  7:     6     3
#  8:     7     4
#  9:     8     4
# 10:     9     5
# 11:    10     5

n <- 3
DT[, group := findInterval(var, seq(min(var), max(var)   n, by = n))]
#       var group
#     <int> <int>
#  1:     1     1
#  2:     1     1
#  3:     2     1
#  4:     3     1
#  5:     4     2
#  6:     5     2
#  7:     6     2
#  8:     7     3
#  9:     8     3
# 10:     9     3
# 11:    10     4

(The n in the call to seq is so that we always have a little more than we need; if we did just seq(min(.),max(.),by=n), it would be possible the highest values of var would be outside of the sequence. One could also do c(seq(min(.), max(.), by=n), Inf) for the same effect.)

  • Related