Seems simple but I can't make it work -
I have:
dtIhave = data.table(col1 = c(1,0,0,0,0,0,1,0,0,0,0,0,0,1))
I want:
dtIwant = data.table(col1 = c(1,0,0,0,0,0,1,0,0,0,0,0,0,1),
col2 = c(1,2,3,4,5,6,1,2,3,4,5,6,7,1))
I am thinking create a column to sum then create groups then sum by these groups but I cannot make it work:
dtItry = data.table(col1 = c(1,0,0,0,0,0,1,0,0,0,0,0,0,1),
tosum = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1),
grp = c(1,1,1,1,1,1,2,2,2,2,2,2,2,3),
col2 = c(1,2,3,4,5,6,1,7,8,9,10,11,12,1))
so I create grp using col1, then group by grp and sum tosum. But I can't make the code work because the col2 in the end just keeps accumulating.
CodePudding user response:
You can do it one step without grouping the data:
library(data.table)
dtIhave[, col2 := with(rle(cumsum(col1)), sequence(lengths))]
or
dtIhave[, col2 := sequence(diff(which(c(col1, 1) == 1)))]
Which gives:
col1 col2
1: 1 1
2: 0 2
3: 0 3
4: 0 4
5: 0 5
6: 0 6
7: 1 1
8: 0 2
9: 0 3
10: 0 4
11: 0 5
12: 0 6
13: 0 7
14: 1 1
CodePudding user response:
Maybe this
id <- which(dtIhave == 1)
s <- unlist(sapply(1:(length(id)-1), function(x) seq(1, id[x 1] - id[x])))
dtIwant <- data.table(col1 = dtIhave$col1, col2 = c(s,1))
CodePudding user response:
I have figured it out quite simply:
dtIhave = data.table(col1 = c(1,0,0,0,0,0,1,0,0,0,0,0,0,1))
dtIhave[, grp := cumsum(col1)]
dtIhave[, tosum := 1]
dtIhave[, col2 := cumsum(tosum), grp]
> dtIhave
col1 grp tosum col2
1: 1 1 1 1
2: 0 1 1 2
3: 0 1 1 3
4: 0 1 1 4
5: 0 1 1 5
6: 0 1 1 6
7: 1 2 1 1
8: 0 2 1 2
9: 0 2 1 3
10: 0 2 1 4
11: 0 2 1 5
12: 0 2 1 6
13: 0 2 1 7
14: 1 3 1 1