I have some data (irregular group labels) like g, and I want to obtain k, i.e. within-group indices, via a resetting cumsum.
g = c(1,1,1, 2, 3,3, 4, 5, 6,6,6,6,6, 7, 8, 9,9,9,9, 10, 11, 12, 13,13)
k = c(1,2,3, 1, 1,2, 1, 1, 1,2,3,4,5, 1, 1, 1,2,3,4, 1, 1, 1, 1, 2)
I have a working solution:
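g.index = function(g){
  # TRUE where an element repeats the previous one (same run)
  rep.i = c(FALSE, diff(g) == 0)
  k = numeric(length(g))
  for (i in seq_along(g)){
    # continue counting within a run, otherwise restart at 1
    if (rep.i[i]){ cs = cs + 1 } else { cs = 1 }
    k[i] = cs
  }
  return(k)
}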
But I'm worried it will be slow because it uses a loop rather than vectorized operations. Is there a more efficient way?
CodePudding user response:
As commented by @akrun, use data.table::rowid:
g = c(1,1,1, 2, 3,3, 4, 5, 6,6,6,6,6, 7, 8, 9,9,9,9, 10, 11, 12, 13,13)
k = c(1,2,3, 1, 1,2, 1, 1, 1,2,3,4,5, 1, 1, 1,2,3,4, 1, 1, 1, 1, 2)
library(data.table)
all(rowid(g) == k)
#> [1] TRUE
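One caveat: rowid counts occurrences of each value across the whole vector, while the loop in the question restarts whenever the value changes. The two agree here because equal labels are always adjacent in g. If a label could reappear later, a base R sketch along the following lines should reproduce the loop's run-based behaviour without data.table. The name run_index is just illustrative, not something from the question or from any package.

```r
# Sketch only: run_index is a hypothetical helper name.
# rle() gives the length of each run of equal adjacent values;
# sequence() expands each length n into 1..n and concatenates them.
run_index <- function(g) sequence(rle(g)$lengths)

g <- c(1,1,1, 2, 3,3, 4, 5, 6,6,6,6,6, 7, 8, 9,9,9,9, 10, 11, 12, 13,13)
k <- c(1,2,3, 1, 1,2, 1, 1, 1,2,3,4,5, 1, 1, 1,2,3,4, 1, 1, 1, 1, 2)
stopifnot(all(run_index(g) == k))

# When a label reappears after a gap, run-based and value-based counts differ
g2 <- c(1, 1, 2, 1)
by_run   <- run_index(g2)                  # restarts at each change: 1 2 1 1
by_value <- ave(g2, g2, FUN = seq_along)   # counts per value:        1 2 1 3
stopifnot(all(by_run == c(1, 2, 1, 1)), all(by_value == c(1, 2, 1, 3)))
```

With data.table loaded, rowid(rleid(g)) should give the same run-based result, since rleid assigns a fresh id to every run.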