Regex (python) to match same group several times only when preceded or followed by specific pattern

Time:01-04

Suppose I have the following text:

Products to be destroyed: «Prabo», «Palox 2000», «Remadon strong» (Rule). The customers «Dilora» and «Apple» has to be notified.

I need to match every string inside the «» quotes, but ONLY in the sentence that starts with the "Products to be destroyed:" pattern or ends with the (Rule) pattern.

In other words, in this example I do NOT want to match Dilora or Apple.

The regex to get the quoted contents in the capturing group is:

«(.+?)»

Is it possible to "anchor" it to either a following pattern (such as (Rule)) or a preceding pattern (such as "Products to be destroyed:")?

This is my saved attempt on regex101

Thank you very much.

CodePudding user response:

You can first match a sentence that contains at least one part between the guillemets, and when there is a match, extract all the parts from it, for example with re.findall.

The example data seems to end at a dot. In that case you can use a negated character class, [^.]*, to match any character except a dot, so the match stays within that one sentence while still requiring at least one «...» part.

Regex demo for at least a single match, and another demo to match the separate parts afterwards

import re

regex = r"\bProducts to be destroyed:[^.]*«[^«»]*»[^.]*\."
s = 'Products to be destroyed: «Prabo», «Palox 2000», «Remadon strong» (Rule). The customers «Dilora» and «Apple» has to be notified.'
result = re.search(regex, s)

if result:
    print(re.findall(r"«([^«»]*)»", result.group()))

Output

['Prabo', 'Palox 2000', 'Remadon strong']
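
The question also asks about anchoring to a following pattern such as (Rule). A minimal sketch of that variant, assuming (as above) that a sentence ends at the first dot and that no quoted name contains a dot: a lookahead keeps a quoted part only if (Rule) appears before the next dot, and an alternation lets the two-step approach accept either anchor. The variable names are illustrative and not part of the original answer.

```python
import re

text = (
    "Products to be destroyed: «Prabo», «Palox 2000», «Remadon strong» (Rule). "
    "The customers «Dilora» and «Apple» has to be notified."
)

# Single pass: keep a quoted part only if "(Rule)" follows before the next dot.
followed_by_rule = re.compile(r"«([^«»]*)»(?=[^.]*\(Rule\))")
names_by_suffix = followed_by_rule.findall(text)

# Two steps, accepting either anchor: a sentence that starts with the leading
# phrase, or one that contains "(Rule)" before its closing dot.
sentence = re.compile(
    r"\bProducts to be destroyed:[^.]*\."  # anchored by the leading phrase
    r"|[^.]*\(Rule\)[^.]*\."               # or anchored by the trailing marker
)
names_by_either = [
    name
    for m in sentence.finditer(text)
    for name in re.findall(r"«([^«»]*)»", m.group())
]

print(names_by_suffix)
print(names_by_either)
```

With the sample text, both lists should contain the same three product names and leave out Dilora and Apple. If a name could contain a dot, the [^.]* boundary would cut the sentence short, so a different sentence delimiter would be needed.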