Home > OS >  Re-populate a sequence of continuous ID variable by another ID variable in R
Re-populate a sequence of continuous ID variable by another ID variable in R

Time:06-22

Apologies for the vagueness of the title.

Hopefully the following example will make more sense.

have = data.frame(id = c("A1","A1","A1","A1","A1","A1","A2","A2","A2","A2","A2","A2"),
                  id2 = c(1,1,2,2,3,3,1,1,2,2,3,3))
want = data.frame(id = c("A1","A1","A1","A1","A1","A1","A2","A2","A2","A2","A2","A2"),
                  id2 = c(1,1,2,2,3,3,4,4,5,5,6,6))

How can I go from have to want by using dplyr package? I have over a million # of obs in my df in this structure.

Any help would be much appreciated! Thanks heaps!

CodePudding user response:

You can use rleid from data.table:

library(dplyr)
library(data.table)

have %>% 
  mutate(id2 = rleid(id2))

Output

   id id2
1  A1   1
2  A1   1
3  A1   2
4  A1   2
5  A1   3
6  A1   3
7  A2   4
8  A2   4
9  A2   5
10 A2   5
11 A2   6
12 A2   6

Or just with data.table

library(data.table)

setDT(have)[, id2 := rleid(id2)]

Or with rle from base R:

have$id2 <- with(rle(have$id2), rep(seq_along(values), lengths))

CodePudding user response:

You can use cumsum with a logical expression (lead(id2) == id2) to populate the sequence. This will add 1 to the id2 column when id2 column changes, so that the result would be a consecutive sequence.

library(dplyr)

want2 <- have %>% 
  mutate(id2 = cumsum(lead(id2) == id2)) %>% 
  fill(id2, .direction = "down")

want2
   id id2
1  A1   1
2  A1   1
3  A1   2
4  A1   2
5  A1   3
6  A1   3
7  A2   4
8  A2   4
9  A2   5
10 A2   5
11 A2   6
12 A2   6
  • Related