Generate id within group

I have the following dataset:

varA <- c(rep("A",2), rep("B",4))
varB <- c(rep("aaaa",2), rep("bbbb", 3), rep("cccc",1) )

dat <- data.frame(varA, varB)
dat

  varA varB
1    A aaaa
2    A aaaa
3    B bbbb
4    B bbbb
5    B bbbb
6    B cccc

I would like to generate ids for each varB subgroup within varA, such that the first subgroup is 1, the second is 2, and so on. The ids can repeat across the dataset, just not between different subgroups of the same varA group.

This is the needed result:

  varA varB res
1    A aaaa   1
2    A aaaa   1
3    B bbbb   1
4    B bbbb   1
5    B bbbb   1
6    B cccc   2

How can I do this in R?

I tried cur_group_id() in dplyr but it is not working for me...

thanks!!

CodePudding user response：

You can use data.table::rleid(), i.e.

library(dplyr)

dat %>%
  group_by(varA) %>%
  mutate(id = data.table::rleid(varB))

# A tibble: 6 x 3
# Groups:   varA [2]
#  varA  varB     id
#  <chr> <chr> <int>
#1 A     aaaa      1
#2 A     aaaa      1
#3 B     bbbb      1
#4 B     bbbb      1
#5 B     bbbb      1
#6 B     cccc      2
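
Note that rleid() numbers runs of consecutive equal values, so it gives the requested ids only when equal varB values sit next to each other within each varA group. If that is not guaranteed, a minimal sketch (assuming the ids should follow order of first appearance within the group) is to use base R's match() against unique():

```r
library(dplyr)

varA <- c(rep("A", 2), rep("B", 4))
varB <- c(rep("aaaa", 2), rep("bbbb", 3), rep("cccc", 1))
dat <- data.frame(varA, varB)

# Within each varA group, number every distinct varB value
# by the position of its first appearance in that group.
res <- dat %>%
  group_by(varA) %>%
  mutate(res = match(varB, unique(varB))) %>%
  ungroup()

res$res
```

On the data from the question this returns the same ids as rleid(). The two differ only when a value reappears later in the group: for a hypothetical group ordered bbbb, cccc, bbbb, match() gives 1, 2, 1 while rleid() gives 1, 2, 3.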

CodePudding user response：

Another potential solution:

library(tidyverse)
varA <- c(rep("A",2), rep("B",4))
varB <- c(rep("aaaa",2), rep("bbbb", 3), rep("cccc",1) )

dat <- data.frame(varA, varB)

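dat %>%
  group_by(varA) %>%
  # Flag rows where varB differs from the previous row in the group;
  # "NA" is a sentinel string, so the first row of each group is always flagged.
  mutate(count = ifelse(varB != lag(varB, default = "NA"),
                        1, 0)) %>%
  # The running total of the flags is the run id within the group.
  mutate(rleid = cumsum(count))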
#> # A tibble: 6 × 4
#> # Groups:   varA [2]
#>   varA  varB  count rleid
#>   <chr> <chr> <dbl> <dbl>
#> 1 A     aaaa      1     1
#> 2 A     aaaa      0     1
#> 3 B     bbbb      1     1
#> 4 B     bbbb      0     1
#> 5 B     bbbb      0     1
#> 6 B     cccc      1     2

Created on 2021-12-16 by the reprex package (v2.0.1)
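
The lag() plus cumsum() steps above can probably be collapsed into one call if a recent dplyr is available. A minimal sketch, assuming dplyr 1.1.0 or later is installed (that is the release in which consecutive_id() was added):

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for consecutive_id()

varA <- c(rep("A", 2), rep("B", 4))
varB <- c(rep("aaaa", 2), rep("bbbb", 3), rep("cccc", 1))
dat <- data.frame(varA, varB)

# consecutive_id() starts a new id every time varB changes,
# and group_by() restarts the numbering for each varA group.
out <- dat %>%
  group_by(varA) %>%
  mutate(res = consecutive_id(varB)) %>%
  ungroup()

out$res
```

Like data.table::rleid(), this numbers runs of adjacent equal values, so it assumes equal varB values are contiguous within each varA group.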