For example, the column I have contains the value 'Ser25Phe', and I want to split the column HGVS.Consequence into its parts, e.g. as 'Ser 25 Phe'.
...
HGVS.Consequence
Met1?
Met1?
Met1?
Ala2Glu
Ala2Ala
Asn3Asp
Asn3Asn
Gly4Trp
Gly4Arg
Ala6Glu
AsAsp
Arg9Arg
Lys10Arg
Lys10Lys
LeullLeu
Phe12Ser
Phe12Cys
lle13Leu
lle13Val
lle13Phe
Thr15Pro
CodePudding user response:
Another solution: wrap the number in separators, then split on them.
x <- c("Ala2Ala", "Asn3Asp", "Ser25Phe")
stringr::str_split(sub("(\\d+)", ";\\1;", x), ";", simplify = TRUE)
A variant of the same idea that inserts the spaces directly:
sub("(\\d+)", " \\1 ", x)
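If the goal is separate columns rather than one spaced string, the same capture-group idea can be sketched with base R's strcapture. This is only an illustration under assumptions: the column names ref, pos and alt are made up here, and entries without a position number (such as "AsAsp") simply come back as NA.

```r
# Sketch: split each value into three typed columns with utils::strcapture.
# The names ref/pos/alt are hypothetical, chosen only for this example.
x <- c("Ala2Ala", "Asn3Asp", "Ser25Phe", "Met1?", "AsAsp")

parts <- strcapture(
  "^([A-Za-z]+)(\\d+)(.*)$",   # letters, then digits, then whatever remains
  x,
  proto = data.frame(ref = character(), pos = integer(), alt = character())
)

# Values that do not match the pattern (here "AsAsp") yield a row of NA.
parts
```

Because pos is converted to integer, the result can be sorted or filtered by position without further conversion.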
CodePudding user response:
Using gsub, assuming that e.g. "AsAsp" should also be split into "As Asp".
trimws(gsub('([A-Z]?[a-z]+)(\\d+)?([A-Z?]*)', '\\1 \\2 \\3', x)) |>
  gsub(pattern='  ', replacement=' ')  ## optional, to remove inner double whitespace
# [1] "Met 1 ?" "Met 1 ?" "Met 1 ?" "Ala 2 Glu"
# [5] "Ala 2 Ala" "Asn 3 Asp" "Asn 3 Asn" "Gly 4 Trp"
# [9] "Gly 4 Arg" "Ala 6 Glu" "As Asp" "Arg 9 Arg"
# [13] "Lys 10 Arg" "Lys 10 Lys" "Leull Leu" "Phe 12 Ser"
# [17] "Phe 12 Cys" "lle 13 Leu" "lle 13 Val" "lle 13 Phe"
# [21] "Thr 15 Pro"
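Since some values ("AsAsp", "LeullLeu", "lle13Leu") do not look like the usual three-letter code, position, three-letter code, it may be worth flagging them before splitting. The following is a minimal sketch, assuming a well-formed value is one capital plus two lowercase letters, a number, and then either another such code or "?"; that rule is an assumption made for illustration, not something stated in the question.

```r
# Sketch: flag values that deviate from the assumed "Xxx<number>Xxx" / "Xxx<number>?" shape.
x <- c("Met1?", "Ala2Glu", "AsAsp", "Lys10Arg", "LeullLeu", "lle13Leu")

# Assumed rule for a well-formed entry (hypothetical, adjust to your data).
ok <- grepl("^[A-Z][a-z]{2}\\d+([A-Z][a-z]{2}|\\?)$", x)

# Entries that should be checked by hand (possible typos or OCR errors).
suspect <- x[!ok]
suspect
```

On this small vector the flagged entries are "AsAsp", "LeullLeu" and "lle13Leu", which hints that "Leull" and "lle" may be misread forms of "Leu11" and "Ile".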
Edit
If your column is in a data frame like this
df <- data.frame(x1=rnorm(21), x2=runif(21), x3=x)
just wrap it in a transform:
df |>
  transform(x3=trimws(gsub('([A-Z]?[a-z]+)(\\d+)?([A-Z?]*)', '\\1 \\2 \\3', x3)) |>
              gsub(pattern='  ', replacement=' '))  ## x3 is the column inside df
# x1 x2 x3
# 1 1.33312448 0.83441710 Met 1 ?
# 2 -0.55792615 0.48805921 Met 1 ?
# 3 1.38184166 0.73862824 Met 1 ?
# 4 -0.87990439 0.42793122 Ala 2 Glu
# 5 0.59143575 0.23370509 Ala 2 Ala
# 6 -0.15065801 0.92168932 Asn 3 Asp
# 7 -1.59350802 0.58727950 Asn 3 Asn
# 8 -0.21971055 0.69603185 Gly 4 Trp
# 9 -0.14004599 0.36722717 Gly 4 Arg
# 10 0.31747188 0.54845522 Ala 6 Glu
# 11 -0.07593689 0.41273905 As Asp
# 12 -0.54154181 0.12890089 Arg 9 Arg
# 13 1.09159765 0.19433579 Lys 10 Arg
# 14 -0.71238122 0.28212593 Lys 10 Lys
# 15 -0.68086189 0.89415476 Leull Leu
# 16 -0.05169070 0.48129061 Phe 12 Ser
# 17 -0.21871795 0.06282263 Phe 12 Cys
# 18 -1.42723032 0.62185980 lle 13 Leu
# 19 0.93924955 0.39333277 lle 13 Val
# 20 0.71006152 0.22982191 lle 13 Phe
# 21 -0.66542079 0.66382062 Thr 15 Pro
Data:
x <- c("Met1?", "Met1?", "Met1?", "Ala2Glu", "Ala2Ala", "Asn3Asp",
"Asn3Asn", "Gly4Trp", "Gly4Arg", "Ala6Glu", "AsAsp", "Arg9Arg",
"Lys10Arg", "Lys10Lys", "LeullLeu", "Phe12Ser", "Phe12Cys", "lle13Leu",
"lle13Val", "lle13Phe", "Thr15Pro")