python regex to match a specific pattern

Time:11-23

I need a regex to match patterns like:

'R es po ns ib il it ie s, s ki ll s, r eq ui re d, s ap'

My understanding is that these anomalies follow the format 'a aa aa a, a aa a', and if a word has only three letters it appears as 'a aa'. The ones above are just examples; many more words have this odd spacing issue.

Can someone help me with this? The goal is to match these patterns, remove the spaces, and turn them back into normal words. Thank you in advance.
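The shape described in the question (one letter, then two-letter pieces, then possibly one trailing letter) can be written down as a pattern and used only to find the broken fragments before changing anything. This is a sketch under that reading of the question: the name ANOMALY is hypothetical, and the \b anchors are an assumption meant to keep a match from starting or ending inside a normal word.

```python
import re

# Hypothetical pattern for the shape described in the question:
# one letter, one or more " xx" pairs, then an optional " x" tail.
# The \b anchors (an assumption) keep matches on word boundaries.
ANOMALY = re.compile(r'\b\w(?: \w\w)+(?: \w)?\b')

text = 'R es po ns ib il it ie s, s ki ll s, r eq ui re d, s ap'

# findall only locates the broken fragments; the text is not modified.
# The groups are non-capturing, so findall returns the whole matches.
fragments = ANOMALY.findall(text)
print(fragments)
```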

Answer:

We can try using re.sub here along with a callback function:

import re

inp = 'R es po ns ib il it ie s, s ki ll s, r eq ui re d, s ap'
output = re.sub(r'\w(?: \w{2})*(?: \w{1,2})?', lambda m: m.group().replace(' ', ''), inp)
print(output)  # Responsibilities, skills, required, sap

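The strategy here is to match every term of the form x xx xx or y yy y (a single character followed by space-separated two-character chunks, with an optional one- or two-character tail) and then strip the spaces in the callback.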
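To show what each piece of that pattern does, here is a sketch that spells it out with re.VERBOSE. The BOUNDED variant, with \b word boundaries added at both ends, is my own assumption and not part of the answer: without the anchors the pattern also fires on the last letter of a normal word followed by a space and two letters, so 'plain words' comes out as 'plainwords'. The helper names strip_spaces, fix_original and fix_bounded are hypothetical.

```python
import re

# The answer's pattern, written out with re.VERBOSE.
# In verbose mode unescaped spaces are ignored, so a literal space is "\ ".
ORIGINAL = re.compile(r"""
    \w              # one leading character, e.g. 'R'
    (?:\ \w{2})*    # zero or more ' xx' chunks: ' es', ' po', ...
    (?:\ \w{1,2})?  # optional ' x' or ' xx' tail, e.g. ' s'
""", re.VERBOSE)

# Assumed variant (not from the answer): \b anchors require the match to
# start and end on a word boundary, so ordinary words are left alone.
BOUNDED = re.compile(r"""
    \b \w (?:\ \w{2})* (?:\ \w{1,2})? \b
""", re.VERBOSE)

def strip_spaces(m: re.Match) -> str:
    # Callback: return the matched text with its spaces removed.
    return m.group().replace(' ', '')

def fix_original(s: str) -> str:
    return ORIGINAL.sub(strip_spaces, s)

def fix_bounded(s: str) -> str:
    return BOUNDED.sub(strip_spaces, s)

sample = 'R es po ns ib il it ie s, s ki ll s, r eq ui re d, s ap'
print(fix_original(sample))         # Responsibilities, skills, required, sap
print(fix_bounded(sample))          # Responsibilities, skills, required, sap
print(fix_original('plain words'))  # plainwords
print(fix_bounded('plain words'))   # plain words
```

Neither version can tell a split word from genuine short words: 'I am ok' becomes 'Iamok' with both, so real data may need some extra check.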

Answer:

I'm not sure I understand your problem correctly, but try this:

>>> import re
>>> text = 'R es po ns ib il it ie s, s ki ll s, r eq ui re d, s ap'
>>> re.sub(r'(\w)((?: \w\w)+)( \w?\w?)?,?',
...        lambda match: (match[1] + match[2] + (match[3] if match[3] else '')
...                       ).replace(' ', ''), text)
'Responsibilities skills required sap'

You can test the regex at: https://regex101.com/r/mCjcNQ/1
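The same pattern can be split up with re.VERBOSE to see why the output has no commas: the trailing ,? consumes each comma, so it is replaced along with the fragment. Group 3 is optional, and a group that does not take part in a match comes back as None, which is why the answer checks match[3] before using it. This is a sketch; the callback name rebuild is hypothetical.

```python
import re

# The answer's pattern, split up with re.VERBOSE ("\ " is a literal space).
PATTERN = re.compile(r"""
    (\w)            # group 1: the leading character, e.g. 'R'
    ((?:\ \w\w)+)   # group 2: one or more ' xx' chunks
    (\ \w?\w?)?     # group 3 (optional): a short tail such as ' s'
    ,?              # optional comma; it is consumed, so it disappears
""", re.VERBOSE)

def rebuild(m: re.Match) -> str:
    # Groups that did not take part in the match are None, hence "or ''".
    joined = ''.join(g or '' for g in m.groups())
    return joined.replace(' ', '')

text = 'R es po ns ib il it ie s, s ki ll s, r eq ui re d, s ap'
print(PATTERN.sub(rebuild, text))  # Responsibilities skills required sap
```

Dropping the trailing ,? from the pattern leaves the commas in the text, giving 'Responsibilities, skills, required, sap'.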
