Regex to append a word after the occurrence of any word from a list

Time:09-04

Recently I got this amazing answer on how to remove a word if it occurs after any word in a word list (list1). In that case I wanted to remove the word "eats". I was wondering whether it is possible to do the reverse: add a particular word after the occurrence of any word from a given list.

list1 = ['duck', 'crow', 'hen', 'sparrow']

Input:[['Duck fish'], ['Crow veggies'], ['She lives there']]

Output:[['Duck eats fish'],['Crow eats veggies'],['She lives there']]

import re

list1 = ['duck', 'crow', 'hen', 'sparrow']

look_behinds = '|'.join(f'(?<=[{w[0].swapcase()}{w[0]}]{w[1:]})'
                        for w in list1)

EATS_RE = re.compile(rf'(?:{look_behinds})\s+eats?\b')

sentences = [['The crow eats'],
             ['Hen eats blue seeds'],
             ['the duck is cute'],
             ['she eats veggies']]

repl_sentences = [[EATS_RE.sub('', s, 1) for s in x] for x in sentences]
print(repl_sentences)
Output:

[['The crow'], ['Hen blue seeds'], ['the duck is cute'], ['she eats veggies']]

Is there a way to change the code to perform the reverse task? What can be the regex for the problem?

CodePudding user response:

What you're asking for is simple: use a regex that matches only the words in your list, capture the matched word in a group, and replace it with that same word followed by "eats":

>>> r = re.compile(r"\b([cC]row|[dD]uck)\b")
>>> r.sub('\\1 eats', "crow here") # '\\1' is the placeholder for the 1st group in our match, i.e. the bird names surrounded with capturing parentheses in our case
'crow eats here'

However, you probably don't want to add the word if it is already there. To handle that, add a negative lookahead so that a word already followed by "eats" is not matched:

>>> r = re.compile(r"\b([cC]row|[dD]uck)\b(?!\s+eats\b)")
>>> r.sub('\\1 eats', "crow here")
'crow eats here'
>>> r.sub('\\1 eats', "crow eats here") # should remain unaffected
'crow eats here'
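
To make the effect of the lookahead concrete, here is a small side-by-side sketch. The names `plain` and `guarded` and the sample sentence are made up for illustration; the patterns are the two shown above, with `\s+` standing for the whitespace between the bird name and "eats".

```python
import re

# Pattern without the lookahead: appends "eats" after every listed word.
plain = re.compile(r"\b([cC]row|[dD]uck)\b")

# Pattern with the negative lookahead: skips words already followed by "eats".
guarded = re.compile(r"\b([cC]row|[dD]uck)\b(?!\s+eats\b)")

text = "crow eats here, duck there"  # illustrative input

with_plain = plain.sub(r"\1 eats", text)
with_guard = guarded.sub(r"\1 eats", text)

print(with_plain)  # crow eats eats here, duck eats there
print(with_guard)  # crow eats here, duck eats there
```

Without the lookahead, "eats" is duplicated after "crow"; with it, only "duck" is changed.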

To adapt your code to the reverse task, something like this should work:

import re

words = ['duck', 'crow', 'hen', 'sparrow']
captured_word = '|'.join(f'[{w[0].swapcase()}{w[0]}]{w[1:]}' for w in words)

EATS_RE = re.compile(rf'\b({captured_word})\b(?!\s+eats\b)')

sentences = ['The crow',
             'Hen blue seeds',
             'the duck eats too much',
             'she veggies']

print([EATS_RE.sub(r"\1 eats", s) for s in sentences])
# output: ['The crow eats', 'Hen eats blue seeds', 'the duck eats too much', 'she veggies']
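
The question's input is a list of lists, so the same idea could be wrapped up roughly as follows. This is only a sketch: the helper name `append_after` and its `suffix` parameter are hypothetical, and it swaps the first-letter character classes for `re.IGNORECASE`, which is looser (it also matches "DUCK" or "dUcK").

```python
import re

def append_after(sentences, words, suffix="eats"):
    """Append `suffix` after each listed word unless it already follows.

    `sentences` is a list of lists of strings, as in the question.
    Hypothetical helper, not part of the original answer.
    """
    # re.escape guards against regex metacharacters in the word list.
    alternation = "|".join(re.escape(w) for w in words)
    pattern = re.compile(
        rf"\b({alternation})\b(?!\s+{re.escape(suffix)}\b)",
        re.IGNORECASE,  # assumption: any casing of the word should match
    )
    # A function replacement keeps the matched casing and avoids
    # backslash handling in the replacement string.
    return [
        [pattern.sub(lambda m: f"{m.group(1)} {suffix}", s) for s in group]
        for group in sentences
    ]

nested = [['Duck fish'], ['Crow veggies'], ['She lives there']]
result = append_after(nested, ['duck', 'crow', 'hen', 'sparrow'])
print(result)
# [['Duck eats fish'], ['Crow eats veggies'], ['She lives there']]
```

If only the first letter should be allowed to vary in case, keep the `[dD]uck` style alternation from the answer above and drop the `re.IGNORECASE` flag.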