I have created a data frame with a group column and an individual identifier that combines the group name with a number padded to a standardised three-digit code:
library(stringr)
group = rep(c("A", "B", "C"), each = 3)
df <- data.frame(group, indiv = paste(group, str_pad(1:9, pad = 0, width = 3 , "left"), sep = ""))
All well and good, but how would I reset the individual identifier's counter each time there is a new prefix, to get this desired result:
df2 <- data.frame(group, indiv = c("A001", "A002", "A003",
"B001", "B002", "B003",
"C001", "C002", "C003"))
CodePudding user response:
We may group by 'group', use substr to extract the first character from 'indiv', and use sprintf to format the sequence (row_number())
library(dplyr)
df %>%
group_by(group) %>%
mutate(indiv = sprintf('%s%03d', substr(indiv, 1, 1), row_number())) %>%
ungroup
-output
# A tibble: 9 × 2
group indiv
<chr> <chr>
1 A A001
2 A A002
3 A A003
4 B B001
5 B B002
6 B B003
7 C C001
8 C C002
9 C C003
Or compactly with data.table
library(data.table)
setDT(df)[, indiv := sprintf('%s%03d', group, rowid(group))]
Or using base R
df$indiv <- with(df, sprintf('%s%03d', group,
ave(seq_along(group), group, FUN = seq_along)))
CodePudding user response:
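Another solution, building the data frame directly (it still uses str_pad from stringr, and it assumes every group has exactly three rows):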
df <- data.frame(group,
indiv = paste(group, str_pad(rep(1:3, 3),
pad = 0, width = 3 , "left"), sep = ""))
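If the groups are not all the same size, rep(1:3, 3) no longer fits. A minimal base R sketch, assuming the rows of each group are contiguous, builds the counter from the run lengths instead. The group vector here is made-up illustration data, not the asker's.

```r
# Hypothetical groups of unequal size (rows of each group are contiguous)
group <- c("A", "A", "B", "C", "C", "C")

# rle() gives the length of each run; sequence() restarts a 1..n counter per run
counter <- sequence(rle(group)$lengths)

# Prefix each counter with its group and zero-pad the number to three digits
indiv <- sprintf("%s%03d", group, counter)
indiv
#> [1] "A001" "A002" "B001" "C001" "C002" "C003"
```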
CodePudding user response:
Here is an alternative approach using akrun's sprintf
library(dplyr)
df %>%
group_by(group) %>%
mutate(indiv = paste0(group, sprintf("%03d", row_number())))
output:
group indiv
<chr> <chr>
1 A A001
2 A A002
3 A A003
4 B B001
5 B B002
6 B B003
7 C C001
8 C C002
9 C C003
CodePudding user response:
You can use sprintf() alone inside mutate:
library(dplyr)
df |>
group_by(group) |>
mutate(indiv = sprintf("%s%03d", group, 1:n()))
%s: a character string, in this case group.
%03d: an integer (%d) padded with leading zeros to a width of three digits, in this case the row number within the group.
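To see the two format specifiers in isolation, here is a small base R sketch. The input vectors are invented for illustration only.

```r
# %03d pads an integer with leading zeros to a minimum width of 3
padded <- sprintf("%03d", c(1L, 12L, 123L))
padded
#> [1] "001" "012" "123"

# %s and %03d combined; sprintf() is vectorised over both arguments
ids <- sprintf("%s%03d", c("A", "A", "B"), c(1L, 2L, 1L))
ids
#> [1] "A001" "A002" "B001"
```

Note that the width is a minimum: numbers with more than three digits are printed in full and are not truncated.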