I am attempting to extract the portion of the string that lies between the opening ( and the ):<number>, that comes right before the next (.
str <- "((Gs4:1,(Hs4:2,(Hs3:3,Hs2:4:5):6):7,(Gs1:1,(Gs2:2,(Hs1:3,Gs3:4):5):6):7);"
I want to split the string in two at the ":7,"; however, the 7 stands for an unknown number.
So the first part would look like this:
((Gs4:1,(Hs4:2,(Hs3:3,Hs2:4:5):6):7,
Suggestions? I have tried the examples using grep, but that only works for a string vector.
Answer:
Your str (avoid that name, because it's also a function name in R) is actually a string/character vector, so you could use regex with the grep-family functions. However, I prefer to use the stringr library for tasks like this.
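For comparison, here is a minimal base-R sketch of the grep-family route, assuming the split point is always a closing parenthesis followed by a colon, an integer, and a comma. The variable names tree_txt and first_part are made up for this illustration.

```r
# Hypothetical variable name, chosen to avoid masking the base function str().
tree_txt <- "((Gs4:1,(Hs4:2,(Hs3:3,Hs2:4:5):6):7,(Gs1:1,(Gs2:2,(Hs1:3,Gs3:4):5):6):7);"

# regexpr() locates the match and regmatches() extracts it.
# ".*" is greedy, so the match runs from the start of the string
# up to the last "):<digits>," it can find.
first_part <- regmatches(tree_txt, regexpr(".*\\):[0-9]+,", tree_txt))
first_part
```

This needs no extra packages, at the cost of the slightly awkward regmatches()/regexpr() pairing.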
New answer to the updated question:
library(stringr)
text2 <- "((Gs4:1,(Hs4:2,(Hs3:3,Hs2:4:5):6):7,(Gs1:1,(Gs2:2,(Hs1:3,Gs3:4):5):6):7);"
str_extract(text2, ".*(\\):\\d+,)")
Output:
[1] "((Gs4:1,(Hs4:2,(Hs3:3,Hs2:4:5):6):7,"
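Since the question asks to separate the string into two pieces, one possible extension is to capture both halves at once with str_match(). This is only a sketch under the same assumption as above (an integer before the comma); the names parts, first_half and second_half are made up for this example.

```r
library(stringr)

text2 <- "((Gs4:1,(Hs4:2,(Hs3:3,Hs2:4:5):6):7,(Gs1:1,(Gs2:2,(Hs1:3,Gs3:4):5):6):7);"

# Group 1: everything up to and including "):<digits>,"
# Group 2: whatever follows the split point
parts <- str_match(text2, "^(.*\\):\\d+,)(.*)$")

# str_match() returns a character matrix: column 1 is the full match,
# columns 2 and 3 hold the two capture groups.
first_half  <- parts[, 2]
second_half <- parts[, 3]
first_half
second_half
```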
And if your numbers are decimals, you would do (using the original data, text, which is defined under the original answer below):
str_extract(text, ".*(\\):\\d+\\.\\d+,)")
Output:
"((Gs4:1.73291661357393,(Hs4:0.993223918833335,(Hs3:0.462662063446464,Hs2:0.462662063446464):0.530561855386871):0.739692694740595):0.77385263642607,"
Original answer:
You could do:
text <- "((Gs4:1.73291661357393,(Hs4:0.993223918833335,(Hs3:0.462662063446464,Hs2:0.462662063446464):0.530561855386871):0.739692694740595):0.77385263642607,(Gs1:1.04501955683528,(Gs2:0.614952860455402,(Hs1:0.543437073198918,Gs3:0.543437073198918):0.071515787256484):0.430066696379874):1.46174969316472);"
library(stringr)
str_extract_all(text, "[:upper:][:lower:]\\d:")
Output:
[[1]]
[1] "Gs4:" "Hs4:" "Hs3:" "Hs2:" "Gs1:" "Gs2:" "Hs1:" "Gs3:"
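Coming back to the split itself: if the numbers may be either integers or decimals, a single pattern with an optional fractional part could cover both cases from the new answer. This is a sketch, not a tested general solution; the name pattern is made up here, and numbers written in scientific notation are assumed not to occur.

```r
library(stringr)

text  <- "((Gs4:1.73291661357393,(Hs4:0.993223918833335,(Hs3:0.462662063446464,Hs2:0.462662063446464):0.530561855386871):0.739692694740595):0.77385263642607,(Gs1:1.04501955683528,(Gs2:0.614952860455402,(Hs1:0.543437073198918,Gs3:0.543437073198918):0.071515787256484):0.430066696379874):1.46174969316472);"
text2 <- "((Gs4:1,(Hs4:2,(Hs3:3,Hs2:4:5):6):7,(Gs1:1,(Gs2:2,(Hs1:3,Gs3:4):5):6):7);"

# "(?:\\.\\d+)?" is a non-capturing group that makes the fractional
# part optional, so both "):7," and "):0.77385263642607," match.
pattern <- ".*\\):\\d+(?:\\.\\d+)?,"

int_part <- str_extract(text2, pattern)  # integer case
dec_part <- str_extract(text, pattern)   # decimal case
int_part
dec_part
```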