Exclude any digits from match but keep specific digits within brackets

Time:04-11

string: abc keyword1 ddd 111 ddd (ddd 99/ddd) 1 ddd (ddd) ddd 11 ddd keyword2 abc

regex: re.compile(r'(?:keyword1)(.*)(?:keyword2)', flags = re.DOTALL | re.MULTILINE)

goal: exclude all digits except the ones within brackets from match

desired output: 'ddd ddd (ddd 99/ddd) ddd (ddd) ddd ddd'

approach1: Any digits within brackets are always 99, but digits outside the brackets can also be 99. So I could exclude every digit except 99 from the match, and then use something other than regex to remove the remaining 99s outside the brackets?

approach2: Match ddd (basically everything, including the 99s) except all other digits, using some variant of the related help below. I played around with (\([^)]*\)|\S)* but failed, probably because it is Java :D

Question: Which approach makes sense? How can I modify my regex to reach my goal?

related help: "Exclude strings within parentheses from a regular expression?" suggests (\([^)]*\)|\S)*, where one balanced set of parentheses is treated as if it were a single character. The regex as a whole therefore matches a single word, and a word can contain these parenthesized groups.
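
The quoted pattern is not Java-specific: it uses only syntax that Python's re module also accepts. A minimal sketch of what it matches, assuming a made-up sample string, with the outer group made non-capturing and * changed to + (both tweaks are assumptions, made so that findall returns whole, non-empty words):

```python
import re

# Variant of the quoted pattern: non-capturing group and + instead of *
# (assumed tweaks so findall returns whole, non-empty "words").
word = re.compile(r'(?:\([^)]*\)|\S)+')

sample = "foo (bar baz) qux(a b)c"  # hypothetical input
words = word.findall(sample)
print(words)  # ['foo', '(bar baz)', 'qux(a b)c']
```

Each parenthesized group, including the spaces inside it, is consumed as one unit, so "(bar baz)" stays a single word.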

CodePudding user response:

Without any additional packages, you can use a two-step approach: first get the string between the keywords, then remove all digit chunks that are not inside parentheses:

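import re

s = "abc keyword1 ddd 111 ddd (ddd 99/ddd) 1 ddd (ddd) ddd 11 ddd keyword2 abc"
# Step 1: lazily capture everything between the two keywords
m = re.search(r'keyword1(.*?)keyword2', s, re.I | re.S)
if m:
    # Step 2: keep parenthesized chunks (Group 1) as-is, drop digit runs elsewhere
    print( re.sub(r'(\([^()]*\))|\s*\d+', r'\1', m.group(1)) )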

## => ddd ddd (ddd 99/ddd) ddd (ddd) ddd ddd

Notes:

  • keyword1(.*?)keyword2 extracts all contents between keyword1 and keyword2 into Group 1.
  • re.sub(r'(\([^()]*\))|\s*\d+', r'\1', m.group(1)) removes any digit chunks, together with optional preceding whitespace, from the Group 1 value while keeping all strings between ( and ) intact.
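
The two steps above can be wrapped in a small helper. This is only a sketch under a few assumptions: the function name and its start/end parameters are hypothetical, the result is stripped of surrounding whitespace so it matches the desired output exactly, and the replacement is a function rather than r'\1' so it does not depend on how unmatched groups are handled in the replacement string.

```python
import re

def strip_digits_outside_parens(text, start="keyword1", end="keyword2"):
    """Hypothetical helper: return the text between start and end with
    digit runs removed, except those inside (non-nested) parentheses."""
    # Step 1: lazily capture everything between the two keywords.
    m = re.search(re.escape(start) + r'(.*?)' + re.escape(end), text, re.I | re.S)
    if m is None:
        return None  # keywords not found
    # Step 2: alternative 1 captures a parenthesized chunk and is put back;
    # alternative 2 matches digits (plus preceding whitespace) and is dropped.
    cleaned = re.sub(r'(\([^()]*\))|\s*\d+',
                     lambda mo: mo.group(1) or '',
                     m.group(1))
    return cleaned.strip()

s = "abc keyword1 ddd 111 ddd (ddd 99/ddd) 1 ddd (ddd) ddd 11 ddd keyword2 abc"
print(strip_digits_outside_parens(s))
# ddd ddd (ddd 99/ddd) ddd (ddd) ddd ddd
```

Note that \([^()]*\) only handles one level of parentheses, and \d+ also removes digits glued to letters (for example "abc123" becomes "abc"), which may or may not be wanted.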