Apologies for the vagueness of the title.
Hopefully the following example will make more sense.
have = data.frame(id = c("A1","A1","A1","A1","A1","A1","A2","A2","A2","A2","A2","A2"),
id2 = c(1,1,2,2,3,3,1,1,2,2,3,3))
want = data.frame(id = c("A1","A1","A1","A1","A1","A1","A2","A2","A2","A2","A2","A2"),
id2 = c(1,1,2,2,3,3,4,4,5,5,6,6))
How can I go from have to want by using dplyr package? I have over a million # of obs in my df in this structure.
Any help would be much appreciated! Thanks heaps!
CodePudding user response:
You can use rleid
from data.table
:
library(dplyr)
library(data.table)
have %>%
mutate(id2 = rleid(id2))
Output
id id2
1 A1 1
2 A1 1
3 A1 2
4 A1 2
5 A1 3
6 A1 3
7 A2 4
8 A2 4
9 A2 5
10 A2 5
11 A2 6
12 A2 6
Or just with data.table
library(data.table)
setDT(have)[, id2 := rleid(id2)]
Or with rle
from base R:
have$id2 <- with(rle(have$id2), rep(seq_along(values), lengths))
CodePudding user response:
You can use cumsum
with a logical expression (lead(id2) == id2
) to populate the sequence. This will add 1 to the id2
column when id2
column changes, so that the result would be a consecutive sequence.
library(dplyr)
want2 <- have %>%
mutate(id2 = cumsum(lead(id2) == id2)) %>%
fill(id2, .direction = "down")
want2
id id2
1 A1 1
2 A1 1
3 A1 2
4 A1 2
5 A1 3
6 A1 3
7 A2 4
8 A2 4
9 A2 5
10 A2 5
11 A2 6
12 A2 6