Recoding existing column in R

Time:03-12

I have a data frame containing the following two columns:

      Tumor_Barcode    Sex
     MEL-JWCI-WGS-1   Male
    MEL-JWCI-WGS-11   Male
    MEL-JWCI-WGS-12 Female
    MEL-JWCI-WGS-13   Male
    

I want to recode the column Tumor_Barcode into a third column, Sample_ID. The output should look like this:

      Tumor_Barcode    Sex Sample_ID
     MEL-JWCI-WGS-1   Male     ME001
    MEL-JWCI-WGS-11   Male     ME011
    MEL-JWCI-WGS-12 Female     ME012
    MEL-JWCI-WGS-13   Male     ME013

Is there any way I can do this in R?

Data:

Tumor_Barcode<-c(" MEL-JWCI-WGS-1","MEL-JWCI-WGS-11","MEL-JWCI-WGS-12","MEL-JWCI-WGS-13")
Sex<-c("Male", "Male", "Female", "Male")
DF1<-data.frame(Tumor_Barcode,Sex)

CodePudding user response:

Here is a base R way.

Tumor_Barcode <- c(" MEL-JWCI-WGS-1","MEL-JWCI-WGS-11","MEL-JWCI-WGS-12","MEL-JWCI-WGS-13")
Sex <- c("Male", "Male", "Female", "Male")
DF1 <- data.frame(Tumor_Barcode,Sex)

# Strip the leading run of non-digits, then zero-pad the number to three places
num <- as.integer(sub("[^[:digit:]]+", "", DF1$Tumor_Barcode))
DF1$Sample_ID <- sprintf("ME%03d", num)
rm(num)    # tidy up
DF1
#>     Tumor_Barcode    Sex Sample_ID
#> 1  MEL-JWCI-WGS-1   Male     ME001
#> 2 MEL-JWCI-WGS-11   Male     ME011
#> 3 MEL-JWCI-WGS-12 Female     ME012
#> 4 MEL-JWCI-WGS-13   Male     ME013

Created on 2022-03-11 by the reprex package (v2.0.1)

The two code lines that create the new column can become a one-liner:

DF1$Sample_ID <- sprintf("ME%03d", as.integer(sub("[^[:digit:]]+", "", DF1$Tumor_Barcode)))
DF1
#>     Tumor_Barcode    Sex Sample_ID
#> 1  MEL-JWCI-WGS-1   Male     ME001
#> 2 MEL-JWCI-WGS-11   Male     ME011
#> 3 MEL-JWCI-WGS-12 Female     ME012
#> 4 MEL-JWCI-WGS-13   Male     ME013
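
To see why this works, it can help to look at the intermediate values. The snippet below is only an illustrative breakdown of the two steps on a shortened input; the names `barcodes`, `digits`, and `ids` are made up for this sketch and are not part of the answer.

```r
# Illustrative breakdown; `barcodes`, `digits`, `ids` are hypothetical names.
barcodes <- c(" MEL-JWCI-WGS-1", "MEL-JWCI-WGS-11")

# Step 1: sub() removes the first run of non-digit characters,
# which here is everything in front of the trailing number.
digits <- sub("[^[:digit:]]+", "", barcodes)
digits
#> [1] "1"  "11"

# Step 2: "%03d" formats an integer zero-padded to a width of three.
ids <- sprintf("ME%03d", as.integer(digits))
ids
#> [1] "ME001" "ME011"
```

Note that `sub()` only replaces the first match, so this relies on the digits sitting at the end of the barcode, as they do in the example data.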


CodePudding user response:

We may use base R

DF1$Sample_ID <- with(DF1, sprintf('%s%03d', 
   substr(trimws(Tumor_Barcode), 1, 2), 
      as.integer(trimws(Tumor_Barcode, whitespace = "\\D+"))))

-output

> DF1
    Tumor_Barcode    Sex Sample_ID
1  MEL-JWCI-WGS-1   Male     ME001
2 MEL-JWCI-WGS-11   Male     ME011
3 MEL-JWCI-WGS-12 Female     ME012
4 MEL-JWCI-WGS-13   Male     ME013
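
The `whitespace` argument of `trimws()` is treated as a regular expression (as far as I recall it was added in R 3.6.0), so passing `"\\D+"` trims non-digit characters instead of blanks. A small sketch of the two `trimws()` calls used above, with hypothetical variable names `x`, `cleaned`, and `trimmed`:

```r
# Hypothetical names; the input mimics the leading space in the question's data.
x <- c(" MEL-JWCI-WGS-1", "MEL-JWCI-WGS-12")

# Default behaviour: strip spaces, tabs and newlines from both ends.
cleaned <- trimws(x)
cleaned
#> [1] "MEL-JWCI-WGS-1"  "MEL-JWCI-WGS-12"

# With whitespace = "\\D+": strip non-digits from both ends instead,
# which leaves only the trailing number.
trimmed <- trimws(x, whitespace = "\\D+")
trimmed
#> [1] "1"  "12"

# The first two characters of the cleaned barcode give the prefix.
substr(cleaned, 1, 2)
#> [1] "ME" "ME"
```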

CodePudding user response:

A possible solution:

library(tidyverse)

DF1 %>% 
  mutate(Sample_ID = str_c("ME", str_extract(Tumor_Barcode, "\\d+$") %>% 
         str_pad(3, pad = "0")))

#>     Tumor_Barcode    Sex Sample_ID
#> 1  MEL-JWCI-WGS-1   Male     ME001
#> 2 MEL-JWCI-WGS-11   Male     ME011
#> 3 MEL-JWCI-WGS-12 Female     ME012
#> 4 MEL-JWCI-WGS-13   Male     ME013
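
If the recoding is needed in more than one place, the same idea can be wrapped in a small base R function. This is only a sketch: `make_sample_id`, its `prefix` and `width` arguments, and the choice to return `NA` for barcodes without a trailing number are all assumptions of mine, not something taken from the answers above.

```r
# Hypothetical helper, assuming the sample number is the trailing run of digits.
make_sample_id <- function(barcode, prefix = "ME", width = 3) {
  # Which barcodes actually end in a number?
  has_num <- grepl("[0-9]+$", barcode)

  # Drop everything up to the last non-digit character, keeping the number.
  num <- rep(NA_integer_, length(barcode))
  num[has_num] <- as.integer(sub(".*[^0-9]", "", barcode[has_num]))

  # Build a format such as "%03d" and zero-pad; barcodes without a number get NA.
  padded <- sprintf(paste0("%0", width, "d"), num)
  ifelse(has_num, paste0(prefix, padded), NA_character_)
}

make_sample_id(c(" MEL-JWCI-WGS-1", "MEL-JWCI-WGS-11", "no-number"))
#> [1] "ME001" "ME011" NA
```

With the question's data this would be used as `DF1$Sample_ID <- make_sample_id(DF1$Tumor_Barcode)`.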