R: create a unique numeric value for every ID in a column

Time:02-10

I have a dataset with a long column of alphanumeric IDs, some of which repeat, like this:

 ID      
 H001  
 H00A  
 H00M  
 B00A  
 BB0B  
 AB0A  
 AA0B  
 AA0B  
 BB0B   
 H001  
 H00A  
 H001  
 H00M  
 H00Z  
 CC01  
 CD01  
 CC02  
 XT01  
 XT0A  
 XT0A  

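I would like to create a new column that assigns a number to each ID, so that repeated IDs get the same number and new IDs are numbered in the order they first appear. The final dataset should look like this: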

 ID      NumId
 H001    1
 H00A    2 
 H00M    3
 B00A    4
 BB0B    5
 AB0A    6
 AA0B    7
 AA0B    7
 BB0B    5
 H001    1
 H00A    2 
 H001    1 
 H00M    3
 H00Z    8
 CC01    9
 CD01    10
 CC02    11
 XT01    12
 XT0A    13
 XT0A    13

Any suggestions on how to create this numeric column would be much appreciated. Thanks!

Answer:

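Because factors are stored internally as integer codes, this is quite easy. Note that as.factor() orders the levels alphabetically by default, so the numbers follow sorted order rather than order of first appearance: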

a <- c('a', 'b', 'c', 'a', 'b', 'e')
as.numeric(as.factor(a))
#> [1] 1 2 3 1 2 4

Created on 2022-02-10 by the reprex package (v2.0.1)
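Applied to the question's data, the same one-liner numbers the IDs by the sorted factor levels, not by first appearance. A minimal sketch, assuming the IDs are stored in a data frame named df with a character column ID (the question does not show how the data are stored):

```r
# Assumption: the question's IDs stored in a data frame called df
df <- data.frame(ID = c("H001", "H00A", "H00M", "B00A", "BB0B", "AB0A", "AA0B",
                        "AA0B", "BB0B", "H001", "H00A", "H001", "H00M", "H00Z",
                        "CC01", "CD01", "CC02", "XT01", "XT0A", "XT0A"))

# as.factor() sorts the levels, so AA0B becomes 1, AB0A becomes 2, and so on
df$NumId <- as.numeric(as.factor(df$ID))
head(df$NumId)
#> [1]  8  9 10  3  4  2
```

So H001 gets 8 instead of the 1 the question asks for; the answers below fix this by keeping the order of first appearance.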

Answer:

If you want the numbers to follow the order in which the IDs first appear, match each ID against the vector of distinct IDs:

tmp <- df$ID[!duplicated(df$ID)]  # distinct IDs in order of first appearance
match(df$ID, tmp)                 # position of each ID within tmp

 [1]  1  2  3  4  5  6  7  7  5  1  2  1  3  8  9 10 11 12 13 13
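The lookup vector can also be built with unique(), which likewise keeps values in the order they first occur. A minimal sketch, again assuming a data frame df with a character column ID, that stores the result as a new column:

```r
# Assumption: the question's IDs stored in a data frame called df
df <- data.frame(ID = c("H001", "H00A", "H00M", "B00A", "BB0B", "AB0A", "AA0B",
                        "AA0B", "BB0B", "H001", "H00A", "H001", "H00M", "H00Z",
                        "CC01", "CD01", "CC02", "XT01", "XT0A", "XT0A"))

# unique() returns the distinct IDs in order of first appearance;
# match() returns the position of each ID in that vector
df$NumId <- match(df$ID, unique(df$ID))

# The duplicated()-based lookup vector gives exactly the same numbering
stopifnot(identical(df$NumId, match(df$ID, df$ID[!duplicated(df$ID)])))
```

match() returns an integer vector, so NumId is stored as integers.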

Answer:

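We can do this with the tidyverse, but note that in this first solution NumID does not follow the order in which the IDs first appear: cur_group_id() numbers the groups in sorted (alphabetical) order of ID.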

library(tidyverse)

df %>% 
  group_by(ID) %>% 
  mutate(NumID = cur_group_id())

Output

# A tibble: 20 x 2
# Groups:   ID [13]
   ID    NumID
   <chr> <int>
 1 H001      8
 2 H00A      9
 3 H00M     10
 4 B00A      3
 5 BB0B      4
 6 AB0A      2
 7 AA0B      1
 8 AA0B      1
 9 BB0B      4
10 H001      8
11 H00A      9
12 H001      8
13 H00M     10
14 H00Z     11
15 CC01      5
16 CD01      7
17 CC02      6
18 XT01     12
19 XT0A     13
20 XT0A     13

UPDATE: this version keeps the order of first appearance, because the factor levels are set to unique(ID):

df %>% mutate(NumID = as.numeric(factor(ID, levels = unique(ID))))

     ID NumID
1  H001     1
2  H00A     2
3  H00M     3
4  B00A     4
5  BB0B     5
6  AB0A     6
7  AA0B     7
8  AA0B     7
9  BB0B     5
10 H001     1
11 H00A     2
12 H001     1
13 H00M     3
14 H00Z     8
15 CC01     9
16 CD01    10
17 CC02    11
18 XT01    12
19 XT0A    13
20 XT0A    13
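Without dplyr, the updated answer translates directly to base R. A minimal sketch, assuming the same data frame df with a character column ID; as.integer() is used here instead of as.numeric() only as a stylistic choice, since the factor codes are whole numbers:

```r
# Assumption: the question's IDs stored in a data frame called df
df <- data.frame(ID = c("H001", "H00A", "H00M", "B00A", "BB0B", "AB0A", "AA0B",
                        "AA0B", "BB0B", "H001", "H00A", "H001", "H00M", "H00Z",
                        "CC01", "CD01", "CC02", "XT01", "XT0A", "XT0A"))

# Setting the level order to unique(ID) makes the factor codes
# follow first appearance instead of alphabetical order
df$NumID <- as.integer(factor(df$ID, levels = unique(df$ID)))
print(df)
```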