I was getting quite far with regex101, but now I am stuck.
I want to extract a string between "markers" using a regex in Python 3.9.
For each of the following example lines I want to get foobar back. The "marker" is =, but that marker has some edge cases.
1. lore =foobar= ipsum    (there is a space before and after the = markers)
2. lore =foobar=.
3. =foobar= ipsum
4. lore =foobar=
This one should not match, because the trailing =x is not allowed:
lore =foobar=x
This is the regex I am using (Python 3.9):
 =(.*?)=[ .]
(note the space at the beginning!)
I can handle the characters that follow the second marker: a space or a period is allowed.
Numbers 1 and 2 work, but 3 and 4 are not matched.
What is missing is the case where no character (the end of the line) follows the second marker.
Also, at the beginning, I don't know how to check for either no character or a space before the first =.
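To make the failing cases concrete, here is a minimal sketch that runs the pattern from the question against the five example lines. The names `current`, `samples` and `results` are only illustrative and not part of the original post.

```python
import re

# The pattern from the question, including its leading space.
current = re.compile(r" =(.*?)=[ .]")

samples = [
    "lore =foobar= ipsum",  # 1: space before and after the markers
    "lore =foobar=.",       # 2: period after the second marker
    "=foobar= ipsum",       # 3: nothing before the first marker
    "lore =foobar=",        # 4: nothing after the second marker
    "lore =foobar=x",       # must not match
]

# True where the pattern finds a match somewhere in the line.
results = [bool(current.search(s)) for s in samples]
print(results)
```

Lines 3 and 4 fail because the pattern requires a literal space before the first = and a space or period after the second one, so the start and end of the line are not accepted.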
CodePudding user response:
You could write the pattern as:
(?:^| )=(.*?)=(?:[ .]|$)
(?:^| )      Non-capturing group with an alternation | that either asserts the start of the string or matches a space
=            Match a literal =
(.*?)        Capture group 1, match any character, as few as possible
=            Match a literal =
(?:[ .]|$)   Match either a space or a dot, or assert the end of the string
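As a rough sketch of how this pattern behaves in Python, the snippet below runs it over the example lines from the question joined into one string. It assumes the input is a single multi-line string, which is why `re.MULTILINE` is passed so that ^ and $ also match at line boundaries. The names `pattern`, `text` and `found` are illustrative.

```python
import re

# re.MULTILINE lets ^ and $ match at the start and end of every line,
# not only at the start and end of the whole string.
pattern = re.compile(r"(?:^| )=(.*?)=(?:[ .]|$)", re.MULTILINE)

text = "\n".join([
    "lore =foobar= ipsum",
    "lore =foobar=.",
    "=foobar= ipsum",
    "lore =foobar=",
    "lore =foobar=x",   # should be rejected
])

# findall returns the content of capture group 1 for every match.
found = pattern.findall(text)
print(found)
```

If each line is matched on its own (for example with `pattern.search(line)` in a loop), the flag is not needed, because ^ and $ then already refer to the start and end of that line.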
If there cannot be an equals sign between the two markers, you might also write the pattern as:
(?<!\S)=([^=\n]*)=(?:[ .]|$)
(?<!\S)      Assert a whitespace boundary to the left
=            Match a literal =
([^=\n]*)    Capture group 1, match any character except = or a newline
=            Match a literal =
(?:[ .]|$)   Match either a space or a dot, or assert the end of the string
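The difference between the two patterns shows up when an extra = appears between the markers. The sketch below assumes lines are processed one at a time; the helper `extract` and the names `lazy` and `strict` are hypothetical and only used for illustration.

```python
import re
from typing import Optional

# First pattern: lazy match, may still run across an inner "=".
lazy = re.compile(r"(?:^| )=(.*?)=(?:[ .]|$)")
# Second pattern: the captured text may not contain "=" or a newline.
strict = re.compile(r"(?<!\S)=([^=\n]*)=(?:[ .]|$)")

def extract(regex: "re.Pattern[str]", line: str) -> Optional[str]:
    """Return capture group 1 of the first match in line, or None."""
    m = regex.search(line)
    return m.group(1) if m else None

for line in ["lore =foobar= ipsum", "=foobar= ipsum",
             "lore =foobar=", "lore =foobar=x",
             "lore =foo=bar= ipsum"]:
    print(repr(line), extract(lazy, line), extract(strict, line))
```

For the last line the lazy pattern captures `foo=bar`, while the stricter pattern finds no match at all. Note also that `(?<!\S)` accepts any whitespace (or nothing) before the first marker, whereas `(?:^| )` accepts only a literal space or the start of the string.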