New ID column depending on another column in R

Time:06-23

I want to generate a new ID column in my df based on another column. My df looks something like this:

> cdr3 <- c("CAAETSGSRLTF;CASSQEGTGVYEQYF","CGSRLTF;CASSQEGTGVYEQYF","CAAETSGSRLTF;CASSQEGT", "CAAETSGSRLTF;CASSQEGTGVYEQYF")
> df <- data.frame(cdr3)
> df
                          cdr3
1 CAAETSGSRLTF;CASSQEGTGVYEQYF
2      CGSRLTF;CASSQEGTGVYEQYF
3        CAAETSGSRLTF;CASSQEGT
4 CAAETSGSRLTF;CASSQEGTGVYEQYF

I want to add a new column df$ID that looks at df$cdr3 and assigns a new label to each distinct value; if a value is repeated, it should reuse the label that was assigned the first time. So it becomes something like this:

> df
                          cdr3 ID
1 CAAETSGSRLTF;CASSQEGTGVYEQYF X1
2      CGSRLTF;CASSQEGTGVYEQYF X2
3        CAAETSGSRLTF;CASSQEGT X3
4 CAAETSGSRLTF;CASSQEGTGVYEQYF X1

Thanks a lot guys

CodePudding user response:

We can use match() in base R: match each value of 'cdr3' against its unique values to get an index, then paste "X" in front of that index.

df$ID <- paste0("X", match(df$cdr3, unique(df$cdr3)))

-output

> df
                          cdr3 ID
1 CAAETSGSRLTF;CASSQEGTGVYEQYF X1
2      CGSRLTF;CASSQEGTGVYEQYF X2
3        CAAETSGSRLTF;CASSQEGT X3
4 CAAETSGSRLTF;CASSQEGTGVYEQYF X1
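
To see why this works, it may help to look at the intermediate values. This is only a sketch on the sample vector from the question; the names `lookup`, `idx` and `id` are made up here for illustration.

```r
# Same sample values as in the question
cdr3 <- c("CAAETSGSRLTF;CASSQEGTGVYEQYF",
          "CGSRLTF;CASSQEGTGVYEQYF",
          "CAAETSGSRLTF;CASSQEGT",
          "CAAETSGSRLTF;CASSQEGTGVYEQYF")

# unique() keeps the distinct values in order of first appearance
lookup <- unique(cdr3)       # 3 distinct strings

# match() gives, for each element, its position in the lookup vector
idx <- match(cdr3, lookup)   # 1 2 3 1

# Prefix the position with "X" to build the label
id <- paste0("X", idx)       # "X1" "X2" "X3" "X1"
```

Because unique() preserves first-appearance order, a repeated value always maps back to the index it got the first time.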

CodePudding user response:

Here is a tidyverse solution using fct_inorder() from the forcats package. fct_inorder() sets the factor levels in order of first appearance, so the integer codes follow the row order.

library(tidyverse)

tibble(cdr3) %>% 
  # fct_inorder() only needs the vector; levels follow first appearance
  mutate(cdr3 = fct_inorder(cdr3)) %>% 
  # the integer level codes become the ID suffix
  mutate(ID = paste0("X", as.integer(cdr3)))
  cdr3                         ID   
  <fct>                        <chr>
1 CAAETSGSRLTF;CASSQEGTGVYEQYF X1   
2 CGSRLTF;CASSQEGTGVYEQYF      X2   
3 CAAETSGSRLTF;CASSQEGT        X3   
4 CAAETSGSRLTF;CASSQEGTGVYEQYF X1   
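
For comparison, roughly the same effect can be had in base R by passing the levels explicitly. This is a sketch on the sample vector from the question, not part of the answer above; `f_sorted`, `f_inorder` and `id` are names chosen here for illustration.

```r
# Same sample values as in the question
cdr3 <- c("CAAETSGSRLTF;CASSQEGTGVYEQYF",
          "CGSRLTF;CASSQEGTGVYEQYF",
          "CAAETSGSRLTF;CASSQEGT",
          "CAAETSGSRLTF;CASSQEGTGVYEQYF")

# factor() sorts its levels by default, so the integer codes
# need not follow the order in which values first appear
f_sorted <- factor(cdr3)

# Giving the levels as unique(cdr3) keeps first-appearance order,
# which is the ordering fct_inorder() is used for above
f_inorder <- factor(cdr3, levels = unique(cdr3))

id <- paste0("X", as.integer(f_inorder))   # "X1" "X2" "X3" "X1"
```

The point of the explicit levels is that a plain factor(cdr3) would number the values by sort order rather than by row order.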