I have a dataset with a long list of random IDs, like this:
ID
H001
H00A
H00M
B00A
BB0B
AB0A
AA0B
AA0B
BB0B
H001
H00A
H001
H00M
H00Z
CC01
CD01
CC02
XT01
XT0A
XT0A
I would like to create a new column that assigns a numeric value to each ID, with repeated IDs getting the same number. The final dataset would look like this:
ID NumId
H001 1
H00A 2
H00M 3
B00A 4
BB0B 5
AB0A 6
AA0B 7
AA0B 7
BB0B 5
H001 1
H00A 2
H001 1
H00M 3
H00Z 8
CC01 9
CD01 10
CC02 11
XT01 12
XT0A 13
XT0A 13
Any suggestions on how to create such a numeric column would be much appreciated. Thanks.
CodePudding user response:
Factors are stored internally as integer codes, so this is quite easy:
a <- c('a', 'b', 'c', 'a', 'b', 'e')
as.numeric(as.factor(a))
#> [1] 1 2 3 1 2 4
Created on 2022-02-10 by the reprex package (v2.0.1)
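One caveat worth checking before applying this to the question's data: `as.factor()` sorts its levels, so the codes follow the sorted order of the values rather than the order in which they first appear. The toy vector above happens to be sorted already, which hides this. A minimal sketch with an unsorted vector (the name `b` is only for illustration):

```r
# An unsorted input: "b" appears first but sorts after "a"
b <- c("b", "a", "c", "b")

# Levels are sorted ("a", "b", "c"), so "b" gets code 2, not 1
codes <- as.numeric(as.factor(b))
codes
#> [1] 2 1 3 2
```

If the numbering has to follow first appearance, as in the expected output in the question, the levels need to be set explicitly or a different approach used.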
CodePudding user response:
If you want to keep your original ordering:
tmp <- df$ID[!duplicated(df$ID)]
match(df$ID, tmp)
[1] 1 2 3 4 5 6 7 7 5 1 2 1 3 8 9 10 11 12 13 13
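For a self-contained version, the same idea can be sketched with the data from the question rebuilt as a data frame. This assumes the column is a character vector named `ID` in a data frame called `df`, and uses `unique()` in place of the `!duplicated()` subset, which should give the same vector of first appearances:

```r
# Rebuild the example data from the question
df <- data.frame(
  ID = c("H001", "H00A", "H00M", "B00A", "BB0B", "AB0A", "AA0B", "AA0B",
         "BB0B", "H001", "H00A", "H001", "H00M", "H00Z", "CC01", "CD01",
         "CC02", "XT01", "XT0A", "XT0A")
)

# unique() keeps values in order of first appearance;
# match() returns each ID's position within that vector
df$NumID <- match(df$ID, unique(df$ID))

head(df)
```

Because `match()` returns positions in the de-duplicated vector, the first distinct ID gets 1, the second gets 2, and so on, with no dependence on sort order.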
CodePudding user response:
We can use the tidyverse to do it, but with this solution the NumID values follow the sorted order of ID rather than the order in which each ID first appears.
library(tidyverse)
df %>%
  group_by(ID) %>%
  mutate(NumID = cur_group_id())
Output
# A tibble: 20 x 2
# Groups: ID [13]
ID NumID
<chr> <int>
1 H001 8
2 H00A 9
3 H00M 10
4 B00A 3
5 BB0B 4
6 AB0A 2
7 AA0B 1
8 AA0B 1
9 BB0B 4
10 H001 8
11 H00A 9
12 H001 8
13 H00M 10
14 H00Z 11
15 CC01 5
16 CD01 7
17 CC02 6
18 XT01 12
19 XT0A 13
20 XT0A 13
UPDATE: this version now gives the correct order:
df %>% mutate(NumID = as.numeric(factor(ID, levels = unique(ID))))
ID NumID
1 H001 1
2 H00A 2
3 H00M 3
4 B00A 4
5 BB0B 5
6 AB0A 6
7 AA0B 7
8 AA0B 7
9 BB0B 5
10 H001 1
11 H00A 2
12 H001 1
13 H00M 3
14 H00Z 8
15 CC01 9
16 CD01 10
17 CC02 11
18 XT01 12
19 XT0A 13
20 XT0A 13