Create running count or cumsum of rows between values-CodePudding

Seems simple but I can't make it work -

I have:

dtIhave = data.table(col1 = c(1,0,0,0,0,0,1,0,0,0,0,0,0,1))

I want:

dtIwant = data.table(col1 = c(1,0,0,0,0,0,1,0,0,0,0,0,0,1),
                     col2 = c(1,2,3,4,5,6,1,2,3,4,5,6,7,1))

I am thinking create a column to sum then create groups then sum by these groups but I cannot make it work:

dtItry = data.table(col1 = c(1,0,0,0,0,0,1,0,0,0,0,0,0,1),
                    tosum = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1),
                    grp = c(1,1,1,1,1,1,2,2,2,2,2,2,2,3),
                    col2 = c(1,2,3,4,5,6,1,7,8,9,10,11,12,1))

so I create grp using col1, then group by grp and sum tosum. But I can't make the code work because the col2 in the end just keeps accumulating.

CodePudding user response：

You can do it one step without grouping the data:

library(data.table)

dtIhave[, col2 := with(rle(cumsum(col1)), sequence(lengths))]

dtIhave[, col2 := sequence(diff(which(c(col1, 1) == 1)))]

Which gives:

    col1 col2
 1:    1    1
 2:    0    2
 3:    0    3
 4:    0    4
 5:    0    5
 6:    0    6
 7:    1    1
 8:    0    2
 9:    0    3
10:    0    4
11:    0    5
12:    0    6
13:    0    7
14:    1    1

CodePudding user response：

Maybe this

id <- which(dtIhave == 1)
s <- unlist(sapply(1:(length(id)-1), function(x) seq(1, id[x 1] - id[x])))
dtIwant <- data.table(col1 = dtIhave$col1, col2 = c(s,1))

CodePudding user response：

I have figured it out quite simply:

dtIhave = data.table(col1 = c(1,0,0,0,0,0,1,0,0,0,0,0,0,1))

dtIhave[, grp := cumsum(col1)]

dtIhave[, tosum := 1]

dtIhave[, col2 := cumsum(tosum), grp]

> dtIhave
    col1 grp tosum col2
 1:    1   1     1    1
 2:    0   1     1    2
 3:    0   1     1    3
 4:    0   1     1    4
 5:    0   1     1    5
 6:    0   1     1    6
 7:    1   2     1    1
 8:    0   2     1    2
 9:    0   2     1    3
10:    0   2     1    4
11:    0   2     1    5
12:    0   2     1    6
13:    0   2     1    7
14:    1   3     1    1