I would like to create a sequence of numbers within a group but starting from a specific string.
In this example, If the string matches UNP then sequence(seq coloumn) should start from the next row.
ColA | Colb | Seq |
---|---|---|
A | HM | 0 |
A | RES | 0 |
A | UNP | 0 |
A | RES | 1 |
A | RES | 2 |
A | HM | 3 |
B | HM | 0 |
B | RES | 0 |
B | UNP | 0 |
B | RES | 1 |
B | UNP | 2 |
C | UNP | 0 |
Only 1st instance of UNP should be considered not every instance on UNP for each group
CodePudding user response:
You can first create a column specifying the first occurrence of "UNP", then use cumsum()
and lag()
to calculate the Seq
column.
library(dplyr)
df <- read.table(header = T, text = "
ColA Colb Seq
A HM 0
A RES 0
A UNP 0
A RES 1
A RES 2
A HM 3
B HM 0
B RES 0
B UNP 0
B RES 1
B UNP 2
C UNP 0") %>%
select(-Seq)
df %>%
group_by(ColA, Colb) %>%
mutate(seq_count = ifelse(first(Colb) == "UNP" & !duplicated(Colb), 1, 0)) %>%
group_by(ColA) %>%
mutate(Seq = lag(cumsum(cumsum(seq_count)), default = 0), .keep = "unused")
#> # A tibble: 12 × 3
#> # Groups: ColA [3]
#> ColA Colb Seq
#> <chr> <chr> <dbl>
#> 1 A HM 0
#> 2 A RES 0
#> 3 A UNP 0
#> 4 A RES 1
#> 5 A RES 2
#> 6 A HM 3
#> 7 B HM 0
#> 8 B RES 0
#> 9 B UNP 0
#> 10 B RES 1
#> 11 B UNP 2
#> 12 C UNP 0
Created on 2022-03-31 by the reprex package (v2.0.1)