Regex: Match whitespace plus = or = is first char

Time:08-29

I was getting quite far with regex101, but now I am stuck.

I want to extract a string between "markers" using a regex in Python 3.9. For each of the following example lines I want to get foobar back. The "marker" is =, but it has some edge cases.

  1. lore =foobar= ipsum (there is a space before the first = and after the second =)
  2. lore =foobar=.
  3. =foobar= ipsum
  4. lore =foobar=

This line should not match, because =x is not allowed:

  1. lore =foobar=x

This is the regex I am using (Python 3.9):

" =(.*?)=[ .]" (note the space at the beginning!)

I can handle the character that follows the second marker: a space or a period is allowed.

Numbers 1 and 2 work, but 3 and 4 are not matched.

What is missing is the case where the second marker is followed by no character at all (the end of the line).

Also, at the beginning, I don't know how to check for either no character or a space before the first =.
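A minimal sketch that reproduces the behaviour described above. The helper name extract and the samples list are only illustrative; the pattern is the one from the question, with its leading space.

```python
import re

# The pattern from the question, including the leading space.
ORIGINAL = re.compile(r" =(.*?)=[ .]")

# The four lines that should match, followed by the one that should not.
samples = [
    "lore =foobar= ipsum",
    "lore =foobar=.",
    "=foobar= ipsum",
    "lore =foobar=",
    "lore =foobar=x",
]


def extract(pattern, line):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = pattern.search(line)
    return match.group(1) if match else None


for line in samples:
    print(repr(line), "->", extract(ORIGINAL, line))
```

Lines 3 and 4 come back as None: line 3 has no space before the first =, and line 4 has no character after the second =.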

CodePudding user response:

You could write the pattern as:

(?:^| )=(.*?)=(?:[ .]|$)
  • (?:^| ) Non-capturing group with an alternation | that either asserts the start of the string or matches a space
  • = Match literally
  • (.*?) Capture group 1, match any character, as few as possible
  • = Match literally
  • (?:[ .]|$) Match either a space or dot, or assert the end of the string
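As a rough check, the pattern above can be run against the lines from the question. This is only a sketch; the names PATTERN and extract_marked are made up for the example.

```python
import re

# Start of string or a space, then =...=, then a space, a period, or the end.
PATTERN = re.compile(r"(?:^| )=(.*?)=(?:[ .]|$)")


def extract_marked(line):
    """Return the text between the = markers, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.group(1) if match else None


for line in [
    "lore =foobar= ipsum",
    "lore =foobar=.",
    "=foobar= ipsum",
    "lore =foobar=",
    "lore =foobar=x",
]:
    print(repr(line), "->", extract_marked(line))
```

The first four lines give foobar and the last one gives None. Note that without flags, ^ and $ only anchor at the start and end of the whole string, so for a multi-line string you would likely want to pass re.MULTILINE to re.compile.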


If there cannot be an equals sign between the two markers, you might also write the pattern as:

(?<!\S)=([^=\n]*)=(?:[ .]|$)
  • (?<!\S) Assert a whitespace boundary to the left (the previous character, if any, is not a non-whitespace character)
  • = Match literally
  • ([^=\n]*) Capture group 1, match zero or more characters other than = or a newline
  • = Match literally
  • (?:[ .]|$) Match either a space or dot, or assert the end of the string
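A small sketch of how the two patterns can differ in practice. The names LAZY and STRICT and the two extra test strings are made up for this comparison; they are not from the question.

```python
import re

# First pattern: the leading space is consumed as part of the match.
LAZY = re.compile(r"(?:^| )=(.*?)=(?:[ .]|$)")
# Second pattern: a lookbehind checks the left side without consuming it,
# and the captured text may not contain = or a newline.
STRICT = re.compile(r"(?<!\S)=([^=\n]*)=(?:[ .]|$)")

# Two marked words separated by a single space.
adjacent = "=a= =b="
print(LAZY.findall(adjacent))    # ['a']
print(STRICT.findall(adjacent))  # ['a', 'b']

# An extra = between the markers.
nested = "lore =foo=bar= ipsum"
print(LAZY.findall(nested))      # ['foo=bar']
print(STRICT.findall(nested))    # []
```

In the first string, the single space is consumed by the match for a, so the first pattern has no space left to start the next match, while the lookbehind in the second pattern can still see it. In the second string, the lazy .*? is free to run across the inner =, whereas [^=\n]* is not.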

