I have the following text:
https://stackoverflow.com | https://google.com | first text to match |
https://randomsite.com | https://randomurl2.com | text | https://randomsite.com |
https://randomsite.com | https://randomsite.com |
I'm trying to match the first sequence in the string that is not a URL, up to and including the | that follows it. In this example I would like the regex to match:
https://stackoverflow.com | https://google.com | first text to match |
Currently I have this:
/^(.*)[|]\s(\b\w*\b)?\s[|]/gm
However, this only works if the first sequence that is not a URL is a single word without spaces. If "first text to match" were just "first", it would match.
The desired result is to match both cases: text without spaces and text with spaces.
EDIT:
Sometimes I would also need a greedy match, where the regex matches everything up to and including text |.
CodePudding user response:
If at least one leading URL has to be present:
\A[\s\S]*?\b\K(?:https?://\S*\h*\|\h*)+[^\s|][^|\r\n]*\|
Explanation
\A - Start of string
[\s\S]*? - Match any character, as few as possible
\b\K - A word boundary, then forget what is matched so far
(?:https?://\S*\h*\|\h*)+ - Match one or more URLs, each followed by | between optional horizontal spaces
[^\s|] - Match a non-whitespace character other than a pipe
[^|\r\n]* - Optionally match any characters except a pipe or a newline
\| - Match the closing pipe
If having no leading URL is also acceptable:
\A[\s\S]*?\b\K(?:https?://\S*\h*\|\h*)*[^\s|][^|\r\n]*\|
Example
$re = '~\A[\s\S]*?\b\K(?:https?://\S*\h*\|\h*)+[^\s|][^|\r\n]*\|~';
$str = ' https://stackoverflow.com | https://google.com | first text to match |
https://randomsite.com | https://randomurl2.com | text | https://randomsite.com |
https://randomsite.com | https://randomsite.com |';
if (preg_match($re, $str, $matches)) {
    echo $matches[0];
}
Output
https://stackoverflow.com | https://google.com | first text to match |
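The difference between the two patterns above shows up when the text is not preceded by any URL. The following is a minimal sketch; the input string and the variable names are hypothetical, and the + after the URL group in the first pattern is taken from the explanation ("one or more urls").

```php
<?php
// Hypothetical input: the text comes first, with no URL in front of it.
$str = 'plain text | https://example.com |';

// Variant 1: at least one leading URL is required.
$requireUrl = '~\A[\s\S]*?\b\K(?:https?://\S*\h*\|\h*)+[^\s|][^|\r\n]*\|~';
// Variant 2: leading URLs are optional.
$optionalUrl = '~\A[\s\S]*?\b\K(?:https?://\S*\h*\|\h*)*[^\s|][^|\r\n]*\|~';

$strictCount = preg_match($requireUrl, $str, $strictMatch);
$looseCount = preg_match($optionalUrl, $str, $looseMatch);

var_dump($strictCount);    // int(0): no text follows a URL here
echo $looseMatch[0], "\n"; // plain text |
```

Both patterns are anchored with \A, so each finds at most one match per input string.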
CodePudding user response:
You need to allow whitespace inside the text group, not only word characters:
/^(.*)[|]\s(\b(\w|\s)*\b)?\s[|]/gm
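A minimal sketch comparing the pattern from the question with this one, assuming PHP's preg functions (PHP has no g flag, so preg_match returns only the first match). The variable names are illustrative.

```php
<?php
$str = "https://stackoverflow.com | https://google.com | first text to match |\n"
     . "https://randomsite.com | https://randomurl2.com | text | https://randomsite.com |\n"
     . "https://randomsite.com | https://randomsite.com |";

// Pattern from the question: the text group accepts a single word only.
$wordOnly = '/^(.*)[|]\s(\b\w*\b)?\s[|]/m';
// Adjusted pattern: the text group accepts word characters and whitespace.
$withSpaces = '/^(.*)[|]\s(\b(\w|\s)*\b)?\s[|]/m';

preg_match($wordOnly, $str, $first);
preg_match($withSpaces, $str, $second);

echo $first[2], "\n";  // text (line 1 is skipped, line 2 matches)
echo $second[2], "\n"; // first text to match
```

Group 2 holds the text itself; group 0 holds the whole line up to and including the closing pipe.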
If you want to allow all sorts of special characters in the text (including newlines), you can try this approach:
\|\s*((?!\s*\w+:\/\/)[^|]+?)\s\|
https://regex101.com/r/2OOKky/1
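A minimal sketch of this lookahead approach in PHP, assuming the quantifiers in the pattern are \w+ and [^|]+? (the + signs appear to have been lost from the pattern as posted). preg_match_all stands in for the g flag.

```php
<?php
$str = "https://stackoverflow.com | https://google.com | first text to match |\n"
     . "https://randomsite.com | https://randomurl2.com | text | https://randomsite.com |\n"
     . "https://randomsite.com | https://randomsite.com |";

// The negative lookahead rejects any field that starts with "scheme://".
$re = '/\|\s*((?!\s*\w+:\/\/)[^|]+?)\s\|/';

preg_match_all($re, $str, $m);

// Group 1 holds each non-URL field without the surrounding pipes.
print_r($m[1]); // "first text to match" and "text"
```

Unlike the patterns anchored at the start of a line, this one captures only the text field, not the URLs in front of it.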
If you want to allow all sorts of special characters in the text (but no newlines), you can try this approach:
(?:^|\|)(?:(?!$)\s)+((?!\s*\w+:\/\/)(?:(?!$)[^|])+?)(?:(?!$)\s)*\|
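The practical difference between the two lookahead patterns can be sketched as follows. This assumes the lost quantifiers are +, +? and \w+, and that the second pattern is meant to run with the m flag so that $ matches at every line end; the input string and variable names are hypothetical.

```php
<?php
// Hypothetical input whose text field spans two lines.
$str = "https://example.com | multi\nline text |";

// Allows newlines inside the text.
$allowNewlines = '/\|\s*((?!\s*\w+:\/\/)[^|]+?)\s\|/';
// Refuses to cross a line end: (?!$) blocks every character at a line break.
$singleLine = '/(?:^|\|)(?:(?!$)\s)+((?!\s*\w+:\/\/)(?:(?!$)[^|])+?)(?:(?!$)\s)*\|/m';

$multiCount = preg_match($allowNewlines, $str, $multi);
$singleCount = preg_match($singleLine, $str, $single);

var_dump($multi[1]);    // the two-line text "multi\nline text"
var_dump($singleCount); // int(0)
```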