How can I separate 3 different information in a column?

Time:02-12

For example, the column contains an entry such as 'Ser25Phe'. I want to split the values in the HGVS.Consequence column into their three parts, e.g. 'Ser 25 Phe'...

HGVS.Consequence
           Met1?
           Met1?
           Met1?
         Ala2Glu
         Ala2Ala
         Asn3Asp
         Asn3Asn
         Gly4Trp
         Gly4Arg
         Ala6Glu
           AsAsp
         Arg9Arg
        Lys10Arg
        Lys10Lys
        LeullLeu
        Phe12Ser
        Phe12Cys
        lle13Leu
        lle13Val
        lle13Phe
        Thr15Pro

CodePudding user response:

Another solution:

x <- c("Ala2Ala", "Asn3Asp", "Ser25Phe")
# wrap the first run of digits in ";" and split on it, giving a 3-column character matrix
stringr::str_split(sub("(\\d+)", ";\\1;", x), ";", simplify = TRUE)

A variant that keeps each value as a single string, with spaces around the number:

sub("(\\d+)", " \\1 ", x)
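To illustrate how the sub() variant behaves on the trickier entries from the question, here is a minimal sketch. It assumes only base R; the three-element vector is a small illustrative subset of the data, not the full column. Note that sub() only touches the first run of digits, so an entry without any digits is returned unchanged.

```r
# Small illustrative subset of the question's data
x <- c("Ser25Phe", "Met1?", "AsAsp")

# Put a space on each side of the first run of digits
spaced <- sub("(\\d+)", " \\1 ", x)

# Split each string on the single spaces to get the separate pieces
pieces <- strsplit(spaced, " ", fixed = TRUE)

print(spaced)
print(pieces)
```

Entries such as "AsAsp" come back as one piece here, which may be what you want if they should be inspected by hand rather than split.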

CodePudding user response:

Using gsub, assuming that e.g. "AsAsp" should also be split into "As Asp".

trimws(gsub('([A-Z]?[a-z]+)(\\d+)?([A-Z?]*)', '\\1 \\2 \\3', x)) |> 
  gsub(pattern='  ', replacement=' ')  ## optional, to remove inner double whitespace
# [1] "Met 1 ?"    "Met 1 ?"    "Met 1 ?"    "Ala 2 Glu" 
# [5] "Ala 2 Ala"  "Asn 3 Asp"  "Asn 3 Asn"  "Gly 4 Trp" 
# [9] "Gly 4 Arg"  "Ala 6 Glu"  "As Asp"     "Arg 9 Arg" 
# [13] "Lys 10 Arg" "Lys 10 Lys" "Leull Leu"  "Phe 12 Ser"
# [17] "Phe 12 Cys" "lle 13 Leu" "lle 13 Val" "lle 13 Phe"
# [21] "Thr 15 Pro"
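Since the output above still contains entries like "As Asp" and "Leull Leu", it may help to flag values that do not follow the letters-digits-letters (or "?") layout before splitting. The following is only a sketch under that assumption about the expected layout; the pattern and the variable names `ok` and `suspect` are mine, not part of the answer above.

```r
# Small illustrative subset of the question's data
x <- c("Met1?", "Ala2Glu", "AsAsp", "LeullLeu", "lle13Leu")

# TRUE where the value looks like: letters, then digits, then letters or a single "?"
ok <- grepl("^[A-Za-z]+\\d+([A-Za-z]+|\\?)$", x)

# Values that do not fit the assumed layout and probably need manual checking
suspect <- x[!ok]
print(suspect)
```

This only checks the overall shape, so a value like "lle13Leu" (lowercase "l" instead of "I") still passes.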


Edit

If your column is in a data frame like this

df <- data.frame(x1=rnorm(21), x2=runif(21), x3=x)

just wrap it in a transform:

df |>
  transform(x3=trimws(gsub('([A-Z]?[a-z]+)(\\d+)?([A-Z?]*)', '\\1 \\2 \\3', x3)) |> 
              gsub(pattern='  ', replacement=' '))
#             x1         x2         x3
# 1   1.33312448 0.83441710    Met 1 ?
# 2  -0.55792615 0.48805921    Met 1 ?
# 3   1.38184166 0.73862824    Met 1 ?
# 4  -0.87990439 0.42793122  Ala 2 Glu
# 5   0.59143575 0.23370509  Ala 2 Ala
# 6  -0.15065801 0.92168932  Asn 3 Asp
# 7  -1.59350802 0.58727950  Asn 3 Asn
# 8  -0.21971055 0.69603185  Gly 4 Trp
# 9  -0.14004599 0.36722717  Gly 4 Arg
# 10  0.31747188 0.54845522  Ala 6 Glu
# 11 -0.07593689 0.41273905     As Asp
# 12 -0.54154181 0.12890089  Arg 9 Arg
# 13  1.09159765 0.19433579 Lys 10 Arg
# 14 -0.71238122 0.28212593 Lys 10 Lys
# 15 -0.68086189 0.89415476  Leull Leu
# 16 -0.05169070 0.48129061 Phe 12 Ser
# 17 -0.21871795 0.06282263 Phe 12 Cys
# 18 -1.42723032 0.62185980 lle 13 Leu
# 19  0.93924955 0.39333277 lle 13 Val
# 20  0.71006152 0.22982191 lle 13 Phe
# 21 -0.66542079 0.66382062 Thr 15 Pro
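
If you would rather have the three pieces in separate columns than in one space-separated string, one possible approach in base R is utils::strcapture(). This is a sketch, not part of the answer above: the data frame, the `id` column and the new column names `ref`, `pos` and `alt` are hypothetical, and the pattern assumes every valid entry is letters, then digits, then the rest.

```r
# Hypothetical small data frame; x3 holds the values to split
df <- data.frame(id = 1:4,
                 x3 = c("Met1?", "Ala2Glu", "Lys10Arg", "AsAsp"))

# Capture three groups: leading letters, digits, and whatever follows.
# proto defines the names and types of the resulting columns.
parts <- strcapture(
  "^([A-Za-z]+)(\\d+)(.*)$", df$x3,
  proto = data.frame(ref = character(), pos = integer(), alt = character())
)

# Attach the new columns; entries that do not match (e.g. "AsAsp") become NA
df_split <- cbind(df, parts)
print(df_split)
```

Here `pos` comes back as an integer column, which is convenient if you later want to sort or filter by position.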

Data:

x <- c("Met1?", "Met1?", "Met1?", "Ala2Glu", "Ala2Ala", "Asn3Asp", 
"Asn3Asn", "Gly4Trp", "Gly4Arg", "Ala6Glu", "AsAsp", "Arg9Arg", 
"Lys10Arg", "Lys10Lys", "LeullLeu", "Phe12Ser", "Phe12Cys", "lle13Leu", 
"lle13Val", "lle13Phe", "Thr15Pro")