Recode character IDs into numeric IDs

Time:11-18

I need to modify the values of an id variable. Here is what the sample data looks like:

df <- data.frame(id = c(11,21,22,"33_AS_A","33_AS_B","33_AS_X", "35_Part1","35_Part2","35_Part4","35_Part7"),
                 Grade= c(3,3,3, 4,4,4,5,5,5,5))

> df
         id Grade
1        11     3
2        21     3
3        22     3
4   33_AS_A     4
5   33_AS_B     4
6   33_AS_X     4
7  35_Part1     5
8  35_Part2     5
9  35_Part4     5
10 35_Part7     5

I need to recode id as a numeric variable, replacing each text suffix with a sequential number that follows the order of the rows within each numeric prefix.

Here is what my desired output looks like:

> df2
    id Grade
1   11     3
2   21     3
3   22     3
4  331     4
5  332     4
6  333     4
7  351     5
8  352     5
9  353     5
10 354     5

Any ideas?

CodePudding user response:

library(dplyr)
library(stringr)
df %>%
  mutate(
    group = str_extract(id, "[0-9]+")  # leading digits of the id
  ) %>%
  group_by(group) %>%
  mutate(id = as.numeric(paste0(group, if(n() > 1) row_number() else ""))) %>%
  ungroup() %>%
  select(-group)
# # A tibble: 10 × 2
#      id Grade
#   <dbl> <dbl>
# 1    11     3
# 2    21     3
# 3    22     3
# 4   331     4
# 5   332     4
# 6   333     4
# 7   351     5
# 8   352     5
# 9   353     5
#10   354     5

CodePudding user response:

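Using base R, split the ids into groups by their numeric prefix; if a group has more than one element, append the row number within the group:

# Keep only the part of each id before the first underscore
x <- sapply(strsplit(df$id, "_"), `[`, 1)

# split() orders groups by sorted prefix, so this assumes df is already sorted by id
df$ID <- unlist(sapply(split(x, x), function(i) 
  if(length(i) == 1) i else paste0(i, seq_along(i))))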

df
#           id Grade  ID
#  1        11     3  11
#  2        21     3  21
#  3        22     3  22
#  4   33_AS_A     4 331
#  5   33_AS_B     4 332
#  6   33_AS_X     4 333
#  7  35_Part1     5 351
#  8  35_Part2     5 352
#  9  35_Part4     5 353
# 10  35_Part7     5 354
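
The base approach above leaves ID as character and depends on the rows already being sorted by prefix. As a hedged variation (not from either answer), the same rule can be sketched with ave(), which returns results in the original row order, followed by as.numeric(). The helper name recode_ids is hypothetical, and the sketch assumes the prefix is everything before the first underscore:

```r
# Sample data from the question; mixing numbers and strings in c() makes id character.
df <- data.frame(
  id = c(11, 21, 22, "33_AS_A", "33_AS_B", "33_AS_X",
         "35_Part1", "35_Part2", "35_Part4", "35_Part7"),
  Grade = c(3, 3, 3, 4, 4, 4, 5, 5, 5, 5),
  stringsAsFactors = FALSE
)

# Numeric prefix: drop everything from the first underscore onward.
prefix <- sub("_.*$", "", df$id)

# Hypothetical helper: within each prefix group, append a running index
# only when the group has more than one row, then convert to numeric.
recode_ids <- function(prefix) {
  new_id <- ave(prefix, prefix, FUN = function(p) {
    if (length(p) == 1) p else paste0(p, seq_along(p))
  })
  as.numeric(new_id)
}

df$ID <- recode_ids(prefix)
```

Because ave() writes each group's result back to the positions it came from, the recoded values stay aligned with their rows even if the data frame is not sorted. Note that a group with ten or more rows would produce two-digit suffixes, which may or may not be what is wanted.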