Replace variable amounts of group

Time:10-25

I have the string "The dog has 12.345 bones". I want to match 12.345 and replace its . with XYZ so that the string becomes "The dog has 12XYZ345 bones". The number can be any valid number that uses dots as thousands separators, such as 1, 456, 1.000 or 34.234.233; 100.00, for example, is not valid. How would I do that?

For internet addresses I used

address_pattern = r"(www)\.([A-Za-z0-9]*)\.(de|com|org)"
re.sub(address_pattern, r"\1XYZ\2XYZ\3", text)

but the problem is that numbers can be arbitrarily long, so I don't have a fixed number of groups to use in the replacement.

CodePudding user response:

Use

import re
regex = r"(?<!\S)\d{1,3}(?:\.\d{3})*(?!\S)"
test_str = "The dog has 12.345 bones"
print(re.sub(regex, lambda m: m.group().replace('.','XYZ'), test_str))

Results: The dog has 12XYZ345 bones

The regex matches each whole number, and the callback lambda m: m.group().replace('.','XYZ') then replaces the periods inside that match, so no fixed number of capture groups is needed.
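For reuse, the same idea can be wrapped in a small helper. This is only a minimal sketch: the names THOUSANDS_RE and replace_thousand_dots and the sep parameter are hypothetical and not part of the answer above.

```python
import re

# Same pattern as in the answer above: a whitespace-delimited number made of
# 1-3 digits followed by any number of ".ddd" groups.
THOUSANDS_RE = re.compile(r"(?<!\S)\d{1,3}(?:\.\d{3})*(?!\S)")

def replace_thousand_dots(text, sep="XYZ"):
    """Replace the dots inside each matched number with `sep` (hypothetical helper)."""
    return THOUSANDS_RE.sub(lambda m: m.group().replace(".", sep), text)

# Sample values taken from the question.
for sample in ["The dog has 12.345 bones", "1", "456", "1.000", "34.234.233", "100.00"]:
    print(repr(sample), "->", repr(replace_thousand_dots(sample)))
```

Passing a function as the replacement means the number of dot groups never has to be known in advance, which is what the fixed \1, \2, \3 template could not handle.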

EXPRESSION EXPLANATION

--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    \S                       non-whitespace (all but \n, \r, \t, \f,
                             and " ")
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  \d{1,3}                  digits (0-9) (between 1 and 3 times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \.                       '.'
--------------------------------------------------------------------------------
    \d{3}                    digits (0-9) (3 times)
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \S                       non-whitespace (all but \n, \r, \t, \f,
                             and " ")
--------------------------------------------------------------------------------
  )                        end of look-ahead
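
One consequence of the (?<!\S) and (?!\S) lookarounds explained above is that a number must be delimited by whitespace or by the start or end of the string. A number directly followed by punctuation is therefore left untouched. A small sketch with made-up sample sentences (the helper name swap is hypothetical):

```python
import re

regex = r"(?<!\S)\d{1,3}(?:\.\d{3})*(?!\S)"

def swap(text):
    # Replace dots only inside numbers that the pattern matches as a whole.
    return re.sub(regex, lambda m: m.group().replace(".", "XYZ"), text)

print(swap("It costs 12.345 euros"))   # surrounded by spaces: replaced
print(swap("It costs 12.345, maybe"))  # followed by a comma: not matched
print(swap("Version 1.2345 is out"))   # not a valid thousands grouping: not matched
```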

CodePudding user response:

If you want to actually replace . only when used as a thousands separator, you can do:

(?:(?<=\D)|^)\d{1,3}(?:\.\d{3})+(?=[^\d.]|$)

Python demo:

import re

txt='''
1
456
1.000
34.234.233
100.00
'''

print(
    re.sub(r'(?:(?<=\D)|^)\d{1,3}(?:\.\d{3})+(?=[^\d.]|$)',
        lambda m: m.group(0).replace('.', 'XYZ'),
        txt, flags=re.M)
)

Prints:

1
456
1XYZ000
34XYZ234XYZ233
100.00
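
A pattern of this kind checks only the characters next to the number for digits and dots, so it can also match numbers that sit next to punctuation inside running text. The following is a sketch under assumptions: the pattern is written out in full so the snippet is self-contained, and the sample sentences and the helper name swap_dots are made up.

```python
import re

# Assumed form of the pattern: at least one ".ddd" group, preceded by a
# non-digit or the start of a line, and followed by a character that is
# neither a digit nor a dot, or by the end of a line.
pattern = re.compile(r"(?:(?<=\D)|^)\d{1,3}(?:\.\d{3})+(?=[^\d.]|$)", flags=re.M)

def swap_dots(text):
    return pattern.sub(lambda m: m.group(0).replace(".", "XYZ"), text)

print(swap_dots("Total: 1.234.567, not 100.00 (or 12.3456)"))
print(swap_dots("He has 12.345."))
```

Note that the second sentence comes back unchanged: the lookahead rejects a dot after the number, so a number directly followed by a full stop is skipped.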