I have a string str from which multiple substrings are to be extracted.
str <- "Nucleotide transport and metabolism,Secondary metabolites biosynthesis, transport, and catabolism / Chromatin structure and dynamics,Coenzyme metabolism,"
The conditions for extraction are:
- Extract everything up to the first comma, but only if the character after it is a capital letter.
- If the character after a comma is not a capital letter, then proceed until:
  - the next comma that is followed by a capital letter, OR
  - the next slash (/), OR
  - the end of the string.
The output should look like this:
> output
[1] "Nucleotide transport and metabolism" "Secondary metabolites biosynthesis, transport, and catabolism"
[3] "Chromatin structure and dynamics" "Coenzyme metabolism"
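The three rules amount to splitting at one of three boundaries: a comma followed by a capital letter, a slash, or the end of the string. A minimal sketch of that mapping in base R follows; it uses strsplit() with perl = TRUE so that a lookahead is available, which is an alternative to the stringr approach in the answer, not the asker's own code. It assumes the slash may be surrounded by spaces and that the only cleanup needed at the end of the string is dropping a trailing comma.

```r
str <- "Nucleotide transport and metabolism,Secondary metabolites biosynthesis, transport, and catabolism / Chromatin structure and dynamics,Coenzyme metabolism,"

# Rule 1: a comma immediately followed by a capital letter.
#         The lookahead (?=[A-Z]) checks the letter without consuming it.
# Rule 2: a slash; assumption: any spaces around it are removed as well.
pattern <- ",(?=[A-Z])|\\s*/\\s*"

pieces <- strsplit(str, pattern, perl = TRUE)[[1]]

# Rule 3: the end of the string. Assumption: the last piece may keep a
# trailing comma, which is stripped here.
pieces <- sub(",$", "", pieces)
pieces
```

The commas inside "biosynthesis, transport, and catabolism" are followed by a space rather than a capital letter, so they are not split points and that category stays in one piece.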
Answer:
You can use str_split from the stringr package.
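library(stringr)
# Split at a comma followed by an uppercase letter, or at " / ",
# then drop the trailing comma left on the last piece
str_split(str, ",(?=[:upper:])|\\s\\/\\s") %>% unlist() %>% gsub(",$", "", .)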
[1] "Nucleotide transport and metabolism"
[2] "Secondary metabolites biosynthesis, transport, and catabolism"
[3] "Chromatin structure and dynamics"
[4] "Coenzyme metabolism"
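The key part of the pattern is the lookahead after the comma. The comparison below shows why a plain comma split is not enough. It is a sketch using base R's strsplit(perl = TRUE) on a shortened, made-up input x (both choices are assumptions made to keep the example small and dependency-free); the same idea applies to the str_split() call above.

```r
x <- "Secondary metabolites biosynthesis, transport, and catabolism,Coenzyme metabolism"

# Splitting on every comma also breaks up the category that contains ", transport, and"
naive <- strsplit(x, ",")[[1]]

# The lookahead only requires that an uppercase letter follows the comma;
# the letter itself is not consumed, so "Coenzyme" keeps its "C"
with_lookahead <- strsplit(x, ",(?=[A-Z])", perl = TRUE)[[1]]

naive
with_lookahead
```

Here naive has four pieces, with " transport" and " and catabolism" split off, while with_lookahead has only the two intended categories.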