How to create a sequency by group from a specific string in R?-CodePudding

I would like to create a sequence of numbers within a group but starting from a specific string.

In this example, If the string matches UNP then sequence(seq coloumn) should start from the next row.

ColA	Colb	Seq
A	HM	0
A	RES	0
A	UNP	0
A	RES	1
A	RES	2
A	HM	3
B	HM	0
B	RES	0
B	UNP	0
B	RES	1
B	UNP	2
C	UNP	0

Only 1st instance of UNP should be considered not every instance on UNP for each group

CodePudding user response：

You can first create a column specifying the first occurrence of "UNP", then use cumsum() and lag() to calculate the Seq column.

library(dplyr)

df <- read.table(header = T, text = "
ColA    Colb    Seq
A   HM  0
A   RES 0
A   UNP 0
A   RES 1
A   RES 2
A   HM  3
B   HM  0
B   RES 0
B   UNP 0
B   RES 1
B   UNP 2
C   UNP 0") %>% 
  select(-Seq)

df %>% 
  group_by(ColA, Colb) %>% 
  mutate(seq_count = ifelse(first(Colb) == "UNP" & !duplicated(Colb), 1, 0)) %>% 
  group_by(ColA) %>% 
  mutate(Seq = lag(cumsum(cumsum(seq_count)), default = 0), .keep = "unused")
#> # A tibble: 12 × 3
#> # Groups:   ColA [3]
#>    ColA  Colb    Seq
#>    <chr> <chr> <dbl>
#>  1 A     HM        0
#>  2 A     RES       0
#>  3 A     UNP       0
#>  4 A     RES       1
#>  5 A     RES       2
#>  6 A     HM        3
#>  7 B     HM        0
#>  8 B     RES       0
#>  9 B     UNP       0
#> 10 B     RES       1
#> 11 B     UNP       2
#> 12 C     UNP       0

^{Created on 2022-03-31 by the reprex package (v2.0.1)}