I want to generate a new ID column in my df based on another column. My df looks something like this:
> cdr3 <- c("CAAETSGSRLTF;CASSQEGTGVYEQYF", "CGSRLTF;CASSQEGTGVYEQYF", "CAAETSGSRLTF;CASSQEGT", "CAAETSGSRLTF;CASSQEGTGVYEQYF")
> df <- as.data.frame(cdr3)
> df
cdr3
1 CAAETSGSRLTF;CASSQEGTGVYEQYF
2 CGSRLTF;CASSQEGTGVYEQYF
3 CAAETSGSRLTF;CASSQEGT
4 CAAETSGSRLTF;CASSQEGTGVYEQYF
I want to add a new column, df$ID, that looks at df$cdr3 and assigns a new ID to each distinct value. If a value is repeated, it should reuse the ID that was assigned to it the first time. So it becomes something like this:
> df
cdr3 ID
1 CAAETSGSRLTF;CASSQEGTGVYEQYF X1
2 CGSRLTF;CASSQEGTGVYEQYF X2
3 CAAETSGSRLTF;CASSQEGT X3
4 CAAETSGSRLTF;CASSQEGTGVYEQYF X1
Thanks a lot guys
Answer:
We can use match in base R to match the values in 'cdr3' against their unique values, get the index, and paste it with "X":
df$ID <- paste0("X", match(df$cdr3, unique(df$cdr3)))
-output
> df
cdr3 ID
1 CAAETSGSRLTF;CASSQEGTGVYEQYF X1
2 CGSRLTF;CASSQEGTGVYEQYF X2
3 CAAETSGSRLTF;CASSQEGT X3
4 CAAETSGSRLTF;CASSQEGTGVYEQYF X1
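The same first-appearance numbering can be wrapped in a small reusable helper. This is a minimal sketch: the function name make_ids and its prefix argument are hypothetical, not part of base R. It relies only on match() and unique(), and also shows factor(x, levels = unique(x)) as an equivalent base R formulation.

```r
# Hypothetical helper (not a base R function): number each distinct value of x
# in order of first appearance and prefix the number, e.g. "X1", "X2", ...
make_ids <- function(x, prefix = "X") {
  # unique() keeps values in order of first appearance, so match() returns,
  # for each element, the position of its value among the distinct values
  paste0(prefix, match(x, unique(x)))
}

cdr3 <- c("CAAETSGSRLTF;CASSQEGTGVYEQYF", "CGSRLTF;CASSQEGTGVYEQYF",
          "CAAETSGSRLTF;CASSQEGT", "CAAETSGSRLTF;CASSQEGTGVYEQYF")
df <- as.data.frame(cdr3)
df$ID <- make_ids(df$cdr3)
df$ID
# [1] "X1" "X2" "X3" "X1"

# Equivalent formulation: a factor whose levels are in first-appearance order
# has integer codes equal to the match() indices above
ids_factor <- paste0("X", as.integer(factor(df$cdr3, levels = unique(df$cdr3))))
identical(ids_factor, df$ID)
# [1] TRUE
```

Because unique() preserves first-appearance order, the first distinct value always gets X1, regardless of alphabetical order.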
Answer:
Here is a tidyverse solution using fct_inorder() from the forcats package. fct_inorder() sets the factor levels in the order in which the values first appear, so the integer codes of the resulting factor can be used directly as the IDs:
library(tidyverse)
tibble(cdr3) %>%
  mutate(ID = paste0("X", as.integer(fct_inorder(cdr3))))
cdr3 ID
<chr> <chr>
1 CAAETSGSRLTF;CASSQEGTGVYEQYF X1
2 CGSRLTF;CASSQEGTGVYEQYF X2
3 CAAETSGSRLTF;CASSQEGT X3
4 CAAETSGSRLTF;CASSQEGTGVYEQYF X1
Note that the second argument of fct_inorder() is ordered, a single TRUE/FALSE/NA flag. Calling fct_inorder(cdr3, row_number()) passes a whole vector there, which is what produced the "the condition has length > 1 and only the first element will be used" warnings; in R 4.2.0 and later that call fails with an error instead.