Regex removes certain words from my string - Python

Time:10-13

The code below looks up words in a dictionary and replaces each matching word in the string with the value stored under that key.

import re

d = {"lh": "left hand"}
sentence = "lh l.h. lh. -lh- l.h .lh plh phli lhp 1lh lh1"
pattern_replace = r'(?<!(\w|\d))(\.)?({})(\.)?(?!(\w|\d))'.format('|'.join(sorted(re.escape(k) for k in d)))
sentence = re.sub(pattern_replace, lambda m: d.get(m.group(0)), sentence, flags=re.IGNORECASE)
sentence

Can someone help me understand why my code omits certain words?

It removes lh when it is preceded or followed by a dot, i.e. lh. and .lh, instead of replacing it. How can I fix this?

The output I get is: left hand l.h. -left hand- l.h plh phli lhp 1lh lh1

CodePudding user response:

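Because the dictionary lookup needs capture group 3 (the key itself), not the whole match from m.group(0). When the match includes a dot, as in lh. or .lh, m.group(0) is not a key in the dict, so d.get returns None and the matched text is replaced with nothing.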
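A small sketch of what the callback sees, using the dict from the question and the pattern with the key already filled in. The variable names are only for illustration. The last line relies on CPython's re.sub treating a None return value from the callback as an empty string, which is the behaviour the question's output shows:

```python
import re

d = {"lh": "left hand"}
pattern = r'(?<!(\w|\d))(\.)?(lh)(\.)?(?!(\w|\d))'

m = re.search(pattern, "lh.")
whole_match = m.group(0)   # 'lh.' : includes the trailing dot
key_only = m.group(3)      # 'lh'  : just the dict key

lookup_whole = d.get(whole_match)  # None, because 'lh.' is not a key
lookup_key = d.get(key_only)       # 'left hand'

# With group(0) the callback returns None, so the match disappears.
removed = re.sub(pattern, lambda m: d.get(m.group(0)), "a lh. b")  # 'a  b'
```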

Note that \w already matches digits, so (\w|\d) can be shortened to \w.
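A quick check of that claim, as a sketch: the sample string and variable names here are made up for illustration, and the dots are left out so that only the lookarounds are compared.

```python
import re

# \w covers letters, digits and underscore, so a digit matches it directly.
digit_is_word_char = re.fullmatch(r'\w', '7') is not None

long_form = r'(?<!(\w|\d))lh(?!(\w|\d))'
short_form = r'(?<!\w)lh(?!\w)'
sample = "lh 1lh lh1 -lh-"

# Both lookaround spellings should find matches at exactly the same positions.
same_matches = (
    [m.span() for m in re.finditer(long_form, sample)]
    == [m.span() for m in re.finditer(short_form, sample)]
)
```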

Now your pattern looks like:

(?<!(\w|\d))(\.)?(lh)(\.)?(?!(\w|\d))

You can simplify the pattern so that the dict key is the only capture group, and then look it up with m.group(1):

(?<!\w)\.?(lh)\.?(?!\w)
           ^^ 
           dict key
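Put together with the code from the question, the simplified pattern could be used roughly like this. This is a sketch, not the only way to do it: replace_key is a hypothetical helper name, and the lower-casing and the fallback to the original text are added assumptions, so that re.IGNORECASE matches such as "LH" still find the lowercase key and nothing is silently deleted.

```python
import re

d = {"lh": "left hand"}
sentence = "lh l.h. lh. -lh- l.h .lh plh phli lhp 1lh lh1"

# Build the alternation of escaped keys; the key is the only capture group.
keys = '|'.join(sorted(re.escape(k) for k in d))
pattern_replace = r'(?<!\w)\.?({})\.?(?!\w)'.format(keys)

def replace_key(m):
    # m.group(1) is the key without any surrounding dots.
    # Assumption: keys are stored in lowercase, so normalise before the lookup
    # and fall back to the untouched match if the key is somehow missing.
    return d.get(m.group(1).lower(), m.group(0))

result = re.sub(pattern_replace, replace_key, sentence, flags=re.IGNORECASE)
print(result)
# left hand l.h. left hand -left hand- l.h left hand plh phli lhp 1lh lh1
```

The optional dots are part of the match, so lh. and .lh both become left hand with the dot dropped, while l.h. and l.h are left alone because the pattern does not allow a dot between the letters.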

