How to reset a numerical sequence after a new prefix in an R vector

Time:12-25

I have created a data frame with a group column and an individual identifier that combines the group name with a number zero-padded to a standardised three-digit code:

library(stringr)
group = rep(c("A", "B", "C"), each = 3)
df <- data.frame(group, indiv = paste0(group, str_pad(1:9, width = 3, side = "left", pad = "0")))

This works, but the numbering runs from 001 to 009 across all groups. How would I restart the numeric part of the identifier each time there is a new prefix, to get this ideal result:

df2 <- data.frame(group, indiv = c("A001", "A002", "A003", 
                                   "B001", "B002", "B003", 
                                   "C001", "C002", "C003"))

CodePudding user response:

We can group by 'group', use substr to extract the first character from 'indiv', and use sprintf to format that prefix together with the within-group sequence (row_number()):

library(dplyr)
df %>% 
  group_by(group) %>% 
  mutate(indiv = sprintf('%s%03d', substr(indiv, 1, 1), row_number())) %>%
  ungroup()

-output

# A tibble: 9 × 2
  group indiv
  <chr> <chr>
1 A     A001 
2 A     A002 
3 A     A003 
4 B     B001 
5 B     B002 
6 B     B003 
7 C     C001 
8 C     C002 
9 C     C003 

Or compactly with data.table, where rowid(group) gives the running count within each group:

library(data.table)
setDT(df)[, indiv := sprintf('%s%03d', group, rowid(group))]

Or using base R, where ave() applies seq_along within each group:

df$indiv <- with(df, sprintf('%s%03d', group,
       ave(seq_along(group), group, FUN = seq_along)))

CodePudding user response:

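Another solution, which keeps str_pad from stringr and builds the repeating sequence directly with rep(). Note that it assumes every group has exactly three rows:

df <- data.frame(group,
            indiv = paste0(group, str_pad(rep(1:3, 3),
                    width = 3, side = "left", pad = "0")))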
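The rep(1:3, 3) call hard-codes three rows per group. As a minimal base R sketch for groups of different sizes, the per-group counter can be derived from the run lengths of the group vector instead. The example data here is made up for illustration, and the approach assumes rows of the same group are contiguous (sorted by group):

```r
# Hypothetical example data: groups of unequal size, already sorted by group
group <- c("A", "A", "B", "C", "C", "C")

# rle() gives the length of each consecutive run of identical values
run_lengths <- rle(group)$lengths

# sequence() expands each length n into 1:n, so the counter restarts per run
counter <- sequence(run_lengths)

# Prefix plus the counter zero-padded to three digits
indiv <- sprintf("%s%03d", group, counter)
indiv
```

If the rows are not sorted by group, the ave() version in the earlier answer is the safer choice, since it does not depend on row order.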

CodePudding user response:

Here is an alternative approach using akrun's sprintf, applied only to the number and joined to the group with paste0:

library(dplyr)

df %>% 
  group_by(group) %>% 
  mutate(indiv = paste0(group, sprintf("%03d", row_number())))

output:

  group indiv
  <chr> <chr>
1 A     A001 
2 A     A002 
3 A     A003 
4 B     B001 
5 B     B002 
6 B     B003 
7 C     C001 
8 C     C002 
9 C     C003

CodePudding user response:

You can use sprintf() alone inside mutate:

library(dplyr)

df |> 
  group_by(group) |> 
  mutate(indiv = sprintf("%s%03d", group, 1:n()))

%s: a character string, in this case group.

%03d: an integer (%d) zero-padded to a width of 3, in this case the row number within the group.
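To see what sprintf conversion specifiers such as %s and %03d do on their own, they can be tried outside of any data frame. This is only a small illustration with made-up values; it relies on sprintf being vectorised over its arguments:

```r
# Zero-pad a single integer to width 3
padded <- sprintf("%03d", 7L)

# A length-one prefix is recycled across the integer vector
recycled <- sprintf("%s%03d", "A", 1:3)

# Element-wise pairing of prefixes and counters
ids <- sprintf("%s%03d", c("A", "A", "B"), c(1L, 2L, 1L))

padded
recycled
ids
```

The 0 flag only pads up to the requested width, so numbers with more than three digits are printed in full rather than truncated.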
