I have the string "The dog has 12.345 bones". I want to match 12.345 and replace its . with XYZ so that the string becomes "The dog has 12XYZ345 bones". The number can be any valid number that uses dots as thousands separators, such as 1, 456, 1.000 or 34.234.233. A number like 100.00 is not valid. How would I do that?
For internet addresses I used
address_pattern = r"(www)\.([A-Za-z0-9]*)\.(de|com|org)"
re.sub(address_pattern, r"\1XYZ\2XYZ\3", text)
but numbers can be arbitrarily long, so I don't have a fixed number of groups to refer to in the replacement.
CodePudding user response:
Use
import re
regex = r"(?<!\S)\d{1,3}(?:\.\d{3})*(?!\S)"
test_str = "The dog has 12.345 bones"
print(re.sub(regex, lambda m: m.group().replace('.','XYZ'), test_str))
Results: The dog has 12XYZ345 bones
Periods inside each matched number are replaced by the callable lambda m: m.group().replace('.','XYZ'), so the number of dot-separated groups does not need to be known in advance.
EXPRESSION      EXPLANATION
--------------------------------------------------------------------------------
(?<!            look behind to see if there is not:
--------------------------------------------------------------------------------
  \S              non-whitespace (all but \n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
)               end of look-behind
--------------------------------------------------------------------------------
\d{1,3}         digits (0-9) (between 1 and 3 times (matching the most amount possible))
--------------------------------------------------------------------------------
(?:             group, but do not capture (0 or more times (matching the most amount possible)):
--------------------------------------------------------------------------------
  \.              '.'
--------------------------------------------------------------------------------
  \d{3}           digits (0-9) (3 times)
--------------------------------------------------------------------------------
)*              end of grouping
--------------------------------------------------------------------------------
(?!             look ahead to see if there is not:
--------------------------------------------------------------------------------
  \S              non-whitespace (all but \n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
)               end of look-ahead
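To see which inputs the core of this pattern accepts, here is a minimal sketch that checks the sample numbers from the question with re.fullmatch. The look-arounds are left out because each token is tested on its own; the names THOUSANDS_NUMBER and is_thousands_number are made up for illustration.

```python
import re

# Core of the pattern without the whitespace look-arounds:
# 1 to 3 digits, then any number of ".ddd" groups.
THOUSANDS_NUMBER = re.compile(r"\d{1,3}(?:\.\d{3})*")

def is_thousands_number(token):
    """Return True if the whole token is a number with thousands dots."""
    return THOUSANDS_NUMBER.fullmatch(token) is not None

# Sample values taken from the question.
for token in ["1", "456", "1.000", "34.234.233", "100.00"]:
    print(token, is_thousands_number(token))
```

Only 100.00 should come out as False here, since its last group has two digits instead of three.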
CodePudding user response:
If you want to replace . only when it is used as a thousands separator, you can do:
(?:(?<=\D)|^)\d{1,3}(?:\.\d{3})+(?=[^\d.]|$)
Python demo:
import re
txt='''
1
456
1.000
34.234.233
100.00
'''
# Swap the dots only inside numbers that use . as a thousands separator;
# re.M lets ^ and $ match at the start and end of every line.
print(
    re.sub(r'(?:(?<=\D)|^)\d{1,3}(?:\.\d{3})+(?=[^\d.]|$)',
           lambda m: m.group(0).replace('.', 'XYZ'),
           txt, flags=re.M)
)
Prints:
1
456
1XYZ000
34XYZ234XYZ233
100.00
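The two answers differ in what they accept around a number. The sketch below compares them on a sentence with punctuation, assuming the second pattern reads (?:(?<=\D)|^)\d{1,3}(?:\.\d{3})+(?=[^\d.]|$). The helper swap_dots and the sample sentence are made up for illustration.

```python
import re

# First answer: the number must be delimited by whitespace (or string edges).
WHITESPACE_BOUNDED = r"(?<!\S)\d{1,3}(?:\.\d{3})*(?!\S)"
# Second answer (assumed reading): the number must be delimited by non-digits,
# and must contain at least one ".ddd" group.
NON_DIGIT_BOUNDED = r"(?:(?<=\D)|^)\d{1,3}(?:\.\d{3})+(?=[^\d.]|$)"

def swap_dots(pattern, text):
    """Replace every '.' inside each match of pattern with 'XYZ'."""
    return re.sub(pattern, lambda m: m.group(0).replace(".", "XYZ"),
                  text, flags=re.M)

sample = "It weighs 12.345, not 100.00 kg (max 1.000.000)."

print(swap_dots(WHITESPACE_BOUNDED, sample))
# It weighs 12.345, not 100.00 kg (max 1.000.000).
print(swap_dots(NON_DIGIT_BOUNDED, sample))
# It weighs 12XYZ345, not 100.00 kg (max 1XYZ000XYZ000).
```

The whitespace-bounded pattern leaves 12.345, untouched because of the trailing comma, so which variant fits depends on whether numbers in your text can sit next to punctuation.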