Remove a pattern if it does not contain specific words

Time:08-16

I need to remove everything after a specific pattern in a given text if the text doesn't include specific words. For example, I need to remove everything from a number onward if the text doesn't include "key1" and "key2".

txt1 = "this is a number 123456789 and there aren't any keys here. we might have a lot of words here as well but no key words"

Neither key1 nor key2 appears in this text, so the output for txt1 should be:

out1 = "this is a number"
txt2 = "this is a number 123456789 but we have their key1 here. key2 might be in the second or the third sentence. hence we can't remove everything after the given number"

Both key1 and key2 appear in the above text, so the output for txt2 should be:

out2 = "this is a number 123456789 but we have their key1 here. key2 might be in the second or the third sentence. hence we can't remove everything after the given number"

I tried to use a negative lookahead as below, but it didn't work.

re.sub(r'\d+.*(?!key1|key2).*', '', txt)

CodePudding user response:

(?=^(?:(?!key[12]).)*$)^.*?(?=\s\d+)

Short Explanation

  • (?=^(?:(?!key[12]).)*$) Asserts that the string contains neither key1 nor key2
  • ^.*?(?=\s\d+) Lazily matches from the start of the string up to the whitespace before the first run of digits
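
To see what each part contributes, the two pieces can be tried separately. This is only a small sketch: the names no_keys and prefix and the two shortened sample strings are made up for illustration.

```python
import re

# Part 1: lookahead that succeeds only when neither key1 nor key2 occurs.
no_keys = re.compile(r"(?=^(?:(?!key[12]).)*$)")
# Part 2: lazy match from the start up to the whitespace before the digits.
prefix = re.compile(r"^.*?(?=\s\d+)")

# Hypothetical sample strings, shortened from the question's examples.
without_keys = "this is a number 123456789 and no keys here"
with_keys = "this is a number 123456789 but key1 is here"

print(bool(no_keys.match(without_keys)))  # True
print(bool(no_keys.match(with_keys)))     # False
print(prefix.match(with_keys).group())    # this is a number
```

Note that part 2 on its own truncates any string containing a number; it is the lookahead in part 1 that stops the combined pattern from matching when a key is present.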


Python Example

import re

strings = [
    "this is a number 123456789 and there aren't any keys here. we might have a lot of words here as well but no key words",
    "this is a number 123456789 but we have their key1 here. key2 might be in the second or the third sentence. hence we can't remove everything after the given number",
]

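for string in strings:
    # The leading lookahead lets the match succeed only when neither key is present.
    match = re.search(r"(?=^(?:(?!key[12]).)*$)^.*?(?=\s\d+)", string)
    # On a match, keep the text before the number; otherwise keep the string unchanged.
    output = match.group() if match else string
    print(output)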
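
If the single regex feels hard to read, a possible alternative is to separate the key check from the truncation. The sketch below is not from the answer above: the function name strip_after_number and its keys parameter are hypothetical, and it assumes, like the regex, that the text is left alone when either key is present.

```python
import re

def strip_after_number(text, keys=("key1", "key2")):
    """Drop the first number and everything after it unless a key is present."""
    # Assumption: any one of the keys is enough to keep the text unchanged.
    if any(key in text for key in keys):
        return text
    # Split once at the first whitespace-plus-digits run and keep the left part.
    return re.split(r"\s\d+", text, maxsplit=1)[0]

txt1 = "this is a number 123456789 and there aren't any keys here."
txt2 = "this is a number 123456789 but we have their key1 here. key2 might follow."

print(strip_after_number(txt1))  # this is a number
print(strip_after_number(txt2))  # prints txt2 unchanged
```

A plain substring test also behaves predictably on multi-line text, where the regex version would need re.DOTALL for "." to cross line breaks.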