I need to modify the values of an id variable. Here is what the sample data looks like:
df <- data.frame(id = c(11,21,22,"33_AS_A","33_AS_B","33_AS_X", "35_Part1","35_Part2","35_Part4","35_Part7"),
Grade= c(3,3,3, 4,4,4,5,5,5,5))
> df
         id Grade
1        11     3
2        21     3
3        22     3
4   33_AS_A     4
5   33_AS_B     4
6   33_AS_X     4
7  35_Part1     5
8  35_Part2     5
9  35_Part4     5
10 35_Part7     5
I need to recode id as a numeric variable, replacing the text suffixes with sequential numbers in order of appearance within each numeric prefix.
Here is my desired output:
> df2
    id Grade
1   11     3
2   21     3
3   22     3
4  331     4
5  332     4
6  333     4
7  351     5
8  352     5
9  353     5
10 354     5
Any ideas?
CodePudding user response:
library(dplyr)
library(stringr)
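df %>%
  mutate(
    # leading run of digits, e.g. "33" from "33_AS_A"
    group = str_extract(id, "[0-9]+")
  ) %>%
  group_by(group) %>%
  # append the within-group row number only when the prefix is shared
  mutate(id = as.numeric(paste0(group, if (n() > 1) row_number() else ""))) %>%
  ungroup() %>%
  select(-group)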
# # A tibble: 10 × 2
#       id Grade
#    <dbl> <dbl>
#  1    11     3
#  2    21     3
#  3    22     3
#  4   331     4
#  5   332     4
#  6   333     4
#  7   351     5
#  8   352     5
#  9   353     5
# 10   354     5
CodePudding user response:
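Using base R: take the part of each id before the first underscore, split into groups by that prefix, and append the within-group row number when a group has more than one row:
x <- sapply(strsplit(df$id, "_"), `[`, 1)
# as.numeric() so that the new column is numeric rather than character
df$ID <- as.numeric(unlist(sapply(split(x, x), function(i)
  if (length(i) == 1) i else paste0(i, seq_along(i))), use.names = FALSE))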
df
#          id Grade  ID
# 1        11     3  11
# 2        21     3  21
# 3        22     3  22
# 4   33_AS_A     4 331
# 5   33_AS_B     4 332
# 6   33_AS_X     4 333
# 7  35_Part1     5 351
# 8  35_Part2     5 352
# 9  35_Part4     5 353
# 10 35_Part7     5 354
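One caveat with the split()/unlist() approach: split() returns the groups in sorted order of the prefix, so the result only lines up with the rows if the data is already sorted by prefix, as it is in the sample. A possible order-independent sketch uses ave(), which writes each group's result back to the original row positions. It assumes every id starts with a run of digits; the names prefix and new_id are just illustrative.

```r
df <- data.frame(id = c(11, 21, 22, "33_AS_A", "33_AS_B", "33_AS_X",
                        "35_Part1", "35_Part2", "35_Part4", "35_Part7"),
                 Grade = c(3, 3, 3, 4, 4, 4, 5, 5, 5, 5))

# Leading digits of each id (assumption: every id starts with digits)
prefix <- sub("^([0-9]+).*$", "\\1", df$id)

# Within each prefix group, append 1, 2, 3, ... only when the group
# has more than one row; ave() keeps the original row order
new_id <- ave(prefix, prefix, FUN = function(p)
  if (length(p) == 1) p else paste0(p, seq_along(p)))

# Overwrite id with the numeric version
df$id <- as.numeric(new_id)
df$id
```

This overwrites id directly instead of adding a separate ID column, which matches the desired df2 in the question.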