I want to extract all substrings that begin with M and are terminated by a *.
Take the string below as an example:
"SHVANSGYMGMTPRLGLESLLE*A*MIRVASQ"
It would ideally return:
MGMTPRLGLESLLE
MTPRLGLESLLE
I have tried the code below:
regmatches(vec, gregexpr('(?<=M).*?(?=\\*)', vec, perl=T))[[1]]
but this drops the first M and only returns the first string rather than all substrings within.
"GMTPRLGLESLLE"
CodePudding user response:
This could be done instead with a for loop over a char array converted from your string.
When you encounter an M, start appending chars to a new string until you encounter a *. At that point, push the new string to an array of strings and start over from the first step, until you reach the end of the loop.
It's not quite as interesting as using REGEX to do it, but it's failsafe.
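A minimal sketch of the loop idea in R follows. Note that it is a variant, not the exact procedure described above: a single running string would only collect the first substring, so this version handles every M separately and pairs it with the next * to its right, which is what the overlapping output in the question needs. The function name `extract_m_to_star` is made up for illustration.

```r
# Hypothetical helper: for every "M", return the text from that "M"
# up to (but not including) the next "*" that follows it.
extract_m_to_star <- function(x) {
  chars <- strsplit(x, "")[[1]]      # split the string into single characters
  m_pos <- which(chars == "M")       # positions of every "M"
  star_pos <- which(chars == "*")    # positions of every "*"
  out <- character(0)
  for (start in m_pos) {
    later <- star_pos[star_pos > start]   # terminators to the right of this "M"
    if (length(later) > 0) {
      out <- c(out, substr(x, start, later[1] - 1))
    }
  }
  out
}

res <- extract_m_to_star("SHVANSGYMGMTPRLGLESLLE*A*MIRVASQ")
print(res)
```

The trailing MIRVASQ is skipped because no * follows it.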
CodePudding user response:
A plain regular-expression match cannot do this, because each match consumes the characters it covers and the search resumes after them, so later matches can never overlap or nest inside earlier ones.
stringr::str_extract_all("abaca", "a[^a]*a")
only gives you aba, but not the overlapping aca or the surrounding abaca.
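That said, a lookahead is zero-width and consumes no characters, so a commonly used workaround for the limitation described above is to wrap the whole pattern in a lookahead and read the text from a capture group inside it. The sketch below assumes base R's `gregexpr()` with `perl = TRUE`, which exposes the group positions through the `capture.start` and `capture.length` attributes. The function name `extract_overlapping` is made up for illustration.

```r
# Hypothetical helper: overlapping "M...*" matches via a capture group
# placed inside a zero-width lookahead.
extract_overlapping <- function(x) {
  # (?=...) matches at a position without consuming anything, so the
  # engine retries at every following character, including inside a
  # previously captured substring.
  m <- gregexpr("(?=(M[^*]*)\\*)", x, perl = TRUE)[[1]]
  if (m[1] == -1) return(character(0))          # no match at all
  starts <- attr(m, "capture.start")[, 1]       # where group 1 begins
  lens <- attr(m, "capture.length")[, 1]        # how long group 1 is
  substring(x, starts, starts + lens - 1)
}

overlapping <- extract_overlapping("SHVANSGYMGMTPRLGLESLLE*A*MIRVASQ")
print(overlapping)
```

The `[^*]*` part stops each capture at the first * after its M, so the terminator itself is never included.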
The first M was dropped because (?<=M) is a positive lookbehind, which by definition is not part of the match, but sits just behind it.
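To illustrate the lookbehind point, the short comparison below runs the pattern from the question next to a variant that matches the M itself instead of only asserting it. The variable names are chosen for illustration only.

```r
vec <- "SHVANSGYMGMTPRLGLESLLE*A*MIRVASQ"

# Pattern from the question: the M is only asserted, so it is not returned.
with_lookbehind <- regmatches(vec, gregexpr("(?<=M).*?(?=\\*)", vec, perl = TRUE))[[1]]

# Variant with a literal M: the M is consumed and is part of the match.
with_literal_m <- regmatches(vec, gregexpr("M.*?(?=\\*)", vec, perl = TRUE))[[1]]

print(with_lookbehind)   # "GMTPRLGLESLLE"
print(with_literal_m)    # "MGMTPRLGLESLLE"
```

This keeps the leading M but still returns only one substring, since ordinary matches do not overlap.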