regexp help. I want to match "cc dd" that doesn't start with "aa"

Time:09-01

I want to match cc dd only on lines that don't start with aa, and replace just cc dd itself, not the text in front of it.

import re

s = 'bb cc dd ee\naa : bb cc dd ee\n11 cc dd ee'
pp = re.compile(r'(?P<n1>ee)|(?P<n2>^(?!aa\b).*\bcc dd\b)', re.MULTILINE)

def _rep(x):
    print(x.groupdict())
    return [f'<{k}>' for k, v in x.groupdict().items() if v is not None][0]

rr = pp.sub(_rep, s)
print(rr)

Current result:

# print(x.groupdict())
    {'n1': None, 'n2': 'bb cc dd'}
    {'n1': 'ee', 'n2': None}
    {'n1': 'ee', 'n2': None}
    {'n1': None, 'n2': '11 cc dd'}
    {'n1': 'ee', 'n2': None}

# print(rr)
    <n2> <n1>
    aa : bb cc dd <n1>
    <n2> <n1>

Desired result:

# print(x.groupdict())
    {'n1': None, 'n2': 'cc dd'}
    {'n1': 'ee', 'n2': None}
    {'n1': 'ee', 'n2': None}
    {'n1': None, 'n2': 'cc dd'}
    {'n1': 'ee', 'n2': None}

# print(rr)
    bb <n2> <n1>
    aa : bb cc dd <n1>
    11 <n2> <n1>

Please help me.

Answer:

With re, a single pattern cannot do what you need: you expect several matches per string, each replaced separately, and the line-start check requires a variable-width lookbehind, which re does not support (its lookbehinds must be fixed-width).
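A quick way to confirm the lookbehind limitation is to try compiling the variable-width lookbehind with the built-in re module. This minimal sketch just checks that re rejects it at compile time (the exact error message may vary between Python versions):

```python
import re

# ".*" inside the lookbehind can match any number of characters,
# so the lookbehind is variable-width; re only accepts fixed-width ones.
try:
    re.compile(r'(?<!^aa\b.*)\bcc dd\b', re.MULTILINE)
    compiled = True
except re.error as exc:
    compiled = False
    print('re.error:', exc)
```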

You need to install the PyPI regex module (run pip install regex in your terminal/console) and then use:

import regex

s = 'bb cc dd ee\naa : bb cc dd ee\n11 cc dd ee'
# n1: any "ee"; n2: the whole word "cc dd", unless its line starts with the word "aa"
pp = regex.compile(r'(?P<n1>ee)|(?<!^aa\b.*)\b(?P<n2>cc dd)\b', regex.MULTILINE)

def _rep(x):
    # print(x.groupdict())
    # Replace the match with "<name>", where name is the group that matched
    return [f'<{k}>' for k, v in x.groupdict().items() if v is not None][0]

rr = pp.sub(_rep, s)
print(rr)

This prints the desired result:

    bb <n2> <n1>
    aa : bb cc dd <n1>
    11 <n2> <n1>

Here, (?<!^aa\b.*)\b(?P<n2>cc dd)\b matches the whole word cc dd, capturing it into group n2, only when it is not preceded by the whole word aa at the beginning of the current line. With regex.MULTILINE, ^ matches at the start of every line, and .* makes the check work even when cc dd does not immediately follow aa. Because . does not match a newline, the lookbehind never looks past the start of the current line.
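If installing a third-party module is not an option, one possible workaround with the built-in re module is to move the line-start check out of the pattern and into the replacement callback. This is a different technique from the regex lookbehind, shown only as a sketch: the pattern matches every cc dd, and the callback (using Match.string, Match.lastgroup and Pattern.match with a start position) leaves the match untouched when its line begins with the word aa. The helper name starts_with_aa is hypothetical.

```python
import re

s = 'bb cc dd ee\naa : bb cc dd ee\n11 cc dd ee'
# Same alternation as in the question, but without the line-start check.
pp = re.compile(r'(?P<n1>ee)|\b(?P<n2>cc dd)\b')
# Hypothetical helper: the whole word "aa" at a given position.
starts_with_aa = re.compile(r'aa\b')

def _rep(m):
    if m.lastgroup == 'n2':
        # Index where the current line begins (0 for the first line).
        line_start = m.string.rfind('\n', 0, m.start()) + 1
        if starts_with_aa.match(m.string, line_start):
            # The line starts with "aa": keep this "cc dd" unchanged.
            return m.group(0)
    # Otherwise replace with the name of the group that matched.
    return f'<{m.lastgroup}>'

rr = pp.sub(_rep, s)
print(rr)
```

Because the check runs in Python code rather than in a lookbehind, it has no fixed-width restriction and still handles several matches per line.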
