How to make sure optional parts of a pattern occur at least once?

Time:11-27

How can I make sure that part of a pattern (a keyword, in this case) is present in the match when it can appear in different places? I want a match only when the keyword occurs at least once.

Regex:

 \b(([0-9])(xyz)?([-]([0-9])(xyz)?)?)\b

I only want the value if the keyword xyz is present.

Examples:

1. 1xyz-2xyz - it's OK
2. 1-2xyz - it's OK
3. 1xyz - it's OK
4. 1-2 - there should be no match, because xyz does not occur at all

I tried positive lookahead and lookbehind, but they did not work in this case.
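
The problem can be reproduced with a short check. This is a minimal sketch that assumes Python's re module (the question does not name a regex flavor); the samples list simply repeats the four examples above.

```python
import re

# The pattern from the question: both (xyz) groups are optional,
# so nothing forces the keyword to appear.
pattern = re.compile(r"\b(([0-9])(xyz)?([-]([0-9])(xyz)?)?)\b")

samples = ["1xyz-2xyz", "1-2xyz", "1xyz", "1-2"]
results = {s: bool(pattern.fullmatch(s)) for s in samples}

for s, matched in results.items():
    print(f"{s!r}: {'match' if matched else 'no match'}")
```

All four samples match, including 1-2, which is the unwanted case.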

CodePudding user response:

You can make use of a conditional construct:

\b([0-9])(xyz)?(?:-([0-9])(xyz)?)?\b(?(2)|(?(4)|(?!)))

Details:

  • \b - word boundary
  • ([0-9]) - Group 1: a digit
  • (xyz)? - Group 2: an optional xyz string
  • (?:-([0-9])(xyz)?)? - an optional non-capturing sequence of a -, a digit (Group 3), and an optional xyz string (Group 4)
  • \b - word boundary
  • (?(2)|(?(4)|(?!))) - a conditional: if Group 2 (the first (xyz)?) matched, accept the match; otherwise, if Group 4 (the second (xyz)?) matched, accept the match; otherwise, fail via (?!), an empty negative lookahead that never matches.

See the Python demo:

import re
text = "1. 1xyz-2xyz - it's OK\n2. 1-2xyz - it's OK\n3. 1xyz - it's OK\n4. 1-2 - there should be no match"
pattern = r"\b([0-9])(xyz)?(?:-([0-9])(xyz)?)?\b(?(2)|(?(4)|(?!)))"
print( [x.group() for x in re.finditer(pattern, text)] )

Output:

['1xyz-2xyz', '1-2xyz', '1xyz']
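
The same pattern can also be laid out with re.VERBOSE so that each piece lines up with the details listed above. This is only a sketch: the names KEYWORD_PATTERN and find_values are hypothetical and chosen for illustration.

```python
import re

# Same regex as above, split across lines; re.VERBOSE ignores the
# whitespace and the trailing comments inside the pattern.
KEYWORD_PATTERN = re.compile(
    r"""
    \b
    ([0-9])               # Group 1: first digit
    (xyz)?                # Group 2: optional keyword
    (?:-([0-9])(xyz)?)?   # optional "-", digit (Group 3), keyword (Group 4)
    \b
    (?(2)|(?(4)|(?!)))    # succeed only if Group 2 or Group 4 took part
    """,
    re.VERBOSE,
)


def find_values(text):
    """Return every value in text that carries the keyword at least once."""
    return [m.group() for m in KEYWORD_PATTERN.finditer(text)]


print(find_values("1xyz-2xyz 1-2xyz 1xyz 1-2"))
```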

CodePudding user response:

Try this: \b(([0-9])?(xyz)+([-]([0-9])+(xyz)+)?)\b Replace ? with +. Basically, ? means zero or one, and in your case you want to match one or more, which is +.

CodePudding user response:

Indeed you could use a lookahead the following way:

\b\d(?:xyz|(?=-\dxyz))(?:-\d(?:xyz)?)?\b

The first part matches either an xyz or, if there is none, a lookahead that ensures xyz occurs in the second part. The second part stays optional in the pattern, but it becomes required whenever the first xyz is absent.
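
A quick way to check the lookahead version against the examples from the question is sketched below. It assumes Python's re module; the variable names are illustrative only.

```python
import re

# Either xyz follows the first digit directly, or the lookahead
# requires "-<digit>xyz" to come next.
lookahead_pattern = re.compile(r"\b\d(?:xyz|(?=-\dxyz))(?:-\d(?:xyz)?)?\b")

samples = ["1xyz-2xyz", "1-2xyz", "1xyz", "1-2"]
found = [m.group() for s in samples for m in lookahead_pattern.finditer(s)]

print(found)
```

Only the first three samples produce a match; 1-2 yields nothing because neither alternative after the first digit can succeed.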

CodePudding user response:

If all you want is to ensure that "xyz" appears in the string, and you don't need to validate anything else about the rest of the string, then you can use

^.*xyz.*$

It matches a string that contains xyz, preceded and followed by any number of arbitrary characters (.*), between the start (^) and the end ($) of the string.
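
As a minimal sketch, assuming Python's re module and single-line input, the check could be wrapped like this (has_keyword is a hypothetical helper name):

```python
import re

contains_xyz = re.compile(r"^.*xyz.*$")


def has_keyword(s):
    """True if the single-line string s contains xyz anywhere."""
    return contains_xyz.search(s) is not None


for s in ["1-2xyz", "xyz", "1-2"]:
    print(s, has_keyword(s))
```

Note that this accepts any string containing xyz, such as a bare xyz, so it says nothing about the digits around the keyword.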

If you need to validate that xyz occurs after a digit and an optional -, as in your pattern, then you could enforce it using

^.*(?:-?\d+xyz)+.*$

It matches a string that contains an optional "-", followed by a digit, followed by xyz, with anything before or after that.
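
The stricter check described above can be sketched in Python as follows. The pattern used here, ^.*-?\dxyz.*$, is a simplified form written for illustration (one digit, one occurrence), and digit_then_keyword is a hypothetical helper name.

```python
import re

# Simplified illustration: an optional "-", one digit, then xyz,
# with anything allowed before and after.
digit_keyword = re.compile(r"^.*-?\dxyz.*$")


def digit_then_keyword(s):
    """True if s contains a digit immediately followed by xyz."""
    return digit_keyword.search(s) is not None


for s in ["1-2xyz", "a-1xyz", "xyz", "1-2"]:
    print(s, digit_then_keyword(s))
```

Unlike the plain xyz check, a bare xyz is rejected here because no digit precedes it.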