Python regex to substitute all digits except when they are part of a substring

Time:10-29

I want to remove all digits, except when the digits are part of one of several special substrings. In the example below, the special substrings that should be skipped by the digit removal are 1s, 2s, s4 and 3s. I think I need to use a negative lookahead:

import re

s = "a61s8sa92s3s3as4s4af3s"
pattern = r"(?!1s|2s|s4|3s)[0-9\.]"
print(re.sub(pattern, ' ', s))

To my understanding, the pattern above works as follows:

  • starting from the end, the character class ([0-9\.]) matches any digit or decimal point
  • it only does that if the pattern after ?! has not matched
  • that pattern is 1s, 2s, s4, OR 3s (| = OR)

It all makes sense until you try it. The sample s above returns a 1s sa 2s3s as s af3s, which suggests that the exclusions work except when the digit is at the end of the special substring (as in s4), in which case the digit is still matched and removed.

I believe this operation should return a 1s sa 2s3s as4s4af3s. How can I fix my pattern?

CodePudding user response:

You can use

import re
s = "a61s8sa92s3s3as4s4af3s"
pattern = r"(1s|2s|s4|3s)|[\d.]"
print(re.sub(pattern, lambda x: x.group(1) or ' ', s))
# => a 1s sa 2s3s as4s4af3s


Details:

  • (1s|2s|s4|3s) - Group 1: 1s, 2s, s4 or 3s
  • | - or
  • [\d.] - a digit or dot.

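If Group 1 matches, the Group 1 value is used as the replacement, so the special substring is put back unchanged. Otherwise only a digit or dot matched, and it is replaced with a space.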
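If the list of special substrings changes often, the same idea can be wrapped in a small helper. This is only a sketch: the function name remove_digits_except and its fill parameter are made up for illustration and are not part of the answer above. It escapes each substring with re.escape and tries longer substrings first, so a longer special substring is not shadowed by a shorter one that shares its prefix.

```python
import re

def remove_digits_except(text, keep, fill=" "):
    """Replace every digit or dot with `fill`, except inside substrings listed in `keep`.

    Hypothetical helper, not from the original answer.
    """
    if not keep:
        # No protected substrings: plain digit/dot removal.
        return re.sub(r"[\d.]", fill, text)
    # Longest first, so a longer protected substring wins over a shorter prefix.
    alternatives = "|".join(re.escape(k) for k in sorted(keep, key=len, reverse=True))
    pattern = re.compile(rf"({alternatives})|[\d.]")
    # Group 1 is None when only the digit/dot branch matched.
    return pattern.sub(lambda m: m.group(1) or fill, text)

result = remove_digits_except("a61s8sa92s3s3as4s4af3s", ["1s", "2s", "s4", "3s"])
print(result)
# => a 1s sa 2s3s as4s4af3s
```

Because the protected alternatives come first in the alternation, the regex engine consumes a whole special substring before it gets a chance to look at the digit inside it.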

CodePudding user response:

Try a negative lookahead combined with a negative lookbehind:

import re

s = "a61s8sa92s3s3as4s4af3s"

s = re.sub(r"(?!1s|2s|3s)(?<!s(?=4))[\d.]", " ", s)
print(s)

Prints:

a 1s sa 2s3s as4s4af3s
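
Why the lookahead alone is not enough: a lookahead is tested at the position of the digit itself, so the s4 alternative can never succeed there, because the text at that position starts with 4, not s. The lookbehind (?<!s(?=4)) covers that case by checking that the digit is not a 4 preceded by s. The sketch below lists which digits each pattern would remove; the variable names are only for illustration.

```python
import re

s = "a61s8sa92s3s3as4s4af3s"

lookahead_only = r"(?!1s|2s|s4|3s)[\d.]"           # pattern from the question
with_lookbehind = r"(?!1s|2s|3s)(?<!s(?=4))[\d.]"  # pattern from this answer

# Collect (index, character) for every digit each pattern would replace.
removed_by_question = [(m.start(), m.group()) for m in re.finditer(lookahead_only, s)]
removed_by_answer = [(m.start(), m.group()) for m in re.finditer(with_lookbehind, s)]

print(removed_by_question)
# => [(1, '6'), (4, '8'), (7, '9'), (12, '3'), (15, '4'), (17, '4')]
print(removed_by_answer)
# => [(1, '6'), (4, '8'), (7, '9'), (12, '3')]
```

The two extra matches at indexes 15 and 17 are the 4s that belong to s4, which is exactly the difference between the actual and the expected output in the question.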