Isolating a particular section of one string in R that is between other characters?

Time:07-13

I am trying to extract the part of the string that runs from the opening ( up to the ":<number>," that comes just before the next (.

str <- "((Gs4:1,(Hs4:2,(Hs3:3,Hs2:4:5):6):7,(Gs1:1,(Gs2:2,(Hs1:3,Gs3:4):5):6):7);"

I want to split the string into its two parts at the ":7,"; however, the 7 can be any number.

So the first part would look like this:

((Gs4:1,(Hs4:2,(Hs3:3,Hs2:4:5):6):7,

Any suggestions? I have tried the examples using grep, but that only seems to work for a string vector.

Answer:

Your str (avoid that name, because str() is also a base R function) is a character vector of length one, so you could use regular expressions with the grep family of functions. However, I prefer the stringr package for tasks like this.
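If you would rather stay in base R: grep() only reports which elements of a vector match, not the matching substring, which is why it seems unhelpful here. A minimal sketch of the base R alternatives, regexpr() plus regmatches(), or sub() with a capture group, applied to the simplified string from the question (stored in x, a hypothetical name chosen to avoid masking str()):

```r
# Simplified string from the question, stored as x (hypothetical name)
x <- "((Gs4:1,(Hs4:2,(Hs3:3,Hs2:4:5):6):7,(Gs1:1,(Gs2:2,(Hs1:3,Gs3:4):5):6):7);"

# regexpr() locates the match; regmatches() returns the matched substring.
# .* is greedy, so the match extends to the last ")" followed by ":<digits>,"
first_a <- regmatches(x, regexpr(".*\\):[0-9]+,", x))

# sub() replaces the whole string with only the captured prefix
first_b <- sub("^(.*\\):[0-9]+,).*$", "\\1", x)

first_a
first_b
```

Both return "((Gs4:1,(Hs4:2,(Hs3:3,Hs2:4:5):6):7,", the same result as the stringr version.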

New answer to the updated question:

library(stringr)

text2 <- "((Gs4:1,(Hs4:2,(Hs3:3,Hs2:4:5):6):7,(Gs1:1,(Gs2:2,(Hs1:3,Gs3:4):5):6):7);"

# .* is greedy, so the match runs up to the last ")" that is followed by ":<digits>,"
str_extract(text2, ".*(\\):\\d+,)")

Output:

[1] "((Gs4:1,(Hs4:2,(Hs3:3,Hs2:4:5):6):7,"
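Since the question asks to separate the string into two parts, here is a small sketch (an addition, not part of the answer above) that also recovers the remainder. It assumes stringr 1.3.0 or later for str_remove(), which deletes the first match of a pattern; because the pattern matches the whole prefix, what is left is the second part.

```r
library(stringr)

text2 <- "((Gs4:1,(Hs4:2,(Hs3:3,Hs2:4:5):6):7,(Gs1:1,(Gs2:2,(Hs1:3,Gs3:4):5):6):7);"

# Same greedy prefix pattern as in the answer
boundary <- ".*\\):\\d+,"

first_part  <- str_extract(text2, boundary)  # everything up to and including ":7,"
second_part <- str_remove(text2, boundary)   # everything after it

first_part   # "((Gs4:1,(Hs4:2,(Hs3:3,Hs2:4:5):6):7,"
second_part  # "(Gs1:1,(Gs2:2,(Hs1:3,Gs3:4):5):6):7);"
```

Pasting the two pieces back together with paste0() gives the original string, so nothing is lost at the split.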

And if your numbers are decimals, as in the original data (text, defined under the original answer below), you would use:

str_extract(text, ".*(\\):\\d+\\.\\d+,)")

Output:

[1] "((Gs4:1.73291661357393,(Hs4:0.993223918833335,(Hs3:0.462662063446464,Hs2:0.462662063446464):0.530561855386871):0.739692694740595):0.77385263642607,"
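Because the number after the colon is unknown, it may be convenient to use one pattern that accepts either an integer or a decimal. This is a sketch beyond the answer: it makes the fractional part optional with (\\.\\d+)? and, since str_extract() is vectorised, applies the pattern to both example strings at once.

```r
library(stringr)

text2 <- "((Gs4:1,(Hs4:2,(Hs3:3,Hs2:4:5):6):7,(Gs1:1,(Gs2:2,(Hs1:3,Gs3:4):5):6):7);"
text  <- "((Gs4:1.73291661357393,(Hs4:0.993223918833335,(Hs3:0.462662063446464,Hs2:0.462662063446464):0.530561855386871):0.739692694740595):0.77385263642607,(Gs1:1.04501955683528,(Gs2:0.614952860455402,(Hs1:0.543437073198918,Gs3:0.543437073198918):0.071515787256484):0.430066696379874):1.46174969316472);"

# ")" then ":" then an integer, optionally followed by ".digits", then ","
pattern <- ".*\\):\\d+(\\.\\d+)?,"

res <- str_extract(c(text2, text), pattern)
res
```

One caveat: .* is greedy, so if a string contains several ")" + ":<number>," boundaries, the match runs to the last one. A lazy .*? would stop at the first one instead. In both example strings there is only one such boundary, so the two give the same result.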

Original answer:

You could do:

text <- "((Gs4:1.73291661357393,(Hs4:0.993223918833335,(Hs3:0.462662063446464,Hs2:0.462662063446464):0.530561855386871):0.739692694740595):0.77385263642607,(Gs1:1.04501955683528,(Gs2:0.614952860455402,(Hs1:0.543437073198918,Gs3:0.543437073198918):0.071515787256484):0.430066696379874):1.46174969316472);"

library(stringr)

str_extract_all(text, "[:upper:][:lower:]\\d:")

Output:

[[1]]
[1] "Gs4:" "Hs4:" "Hs3:" "Hs2:" "Gs1:" "Gs2:" "Hs1:" "Gs3:"
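If you want the labels without the trailing colon, one option is a lookahead, (?=:), which requires a colon to follow but leaves it out of the match. The \\d+ here is an assumption beyond the original pattern, included so that multi-digit labels (a hypothetical Gs12, say) would also be captured whole.

```r
library(stringr)

text <- "((Gs4:1.73291661357393,(Hs4:0.993223918833335,(Hs3:0.462662063446464,Hs2:0.462662063446464):0.530561855386871):0.739692694740595):0.77385263642607,(Gs1:1.04501955683528,(Gs2:0.614952860455402,(Hs1:0.543437073198918,Gs3:0.543437073198918):0.071515787256484):0.430066696379874):1.46174969316472);"

# Upper-case letter, lower-case letter, one or more digits, then a colon
# that must be present but is not included in the result
labels <- str_extract_all(text, "[:upper:][:lower:]\\d+(?=:)")
labels
```

This should print the same eight labels as above, minus the colons.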