I'm trying to standardise a set of text files that contain a list of values separated by pipes ("|") which currently have varying decimal places for values with trailing zeros.
This seems like something that a regex should be able to handle but I'm struggling with where to start. I've found examples where I can replace any values between two sets of substrings, but not an example where it only matches when the values between are all the same character.
The Python code below shows a minimal example of what I'm trying to achieve (where the assert should pass if the replacement is successful). Any help would be much appreciated.
import re
str_in = '4|5|0.00000000|'
expected_str_out = '4|5|0.0|'
str_out = re.sub('0.(.*?)\|', '0.0|', s)
assert str_out == expected_str_out
CodePudding user response:
You can use
import re
str_in = '4|5|0.00000000|'
expected_str_out = '4|5|0.0|'
str_out = re.sub(r'(?<![^|])0 \.0 (?![^|])', '0.0', str_in)
print( str_out == expected_str_out )
See the online Python demo and the regex demo.
The regex matches
(?<![^|])
- start of string or a|
0 \.0
- one or more0
chars,.
and one or more0
s(?![^|])
- an end of string, or|
.
In case you need to handle cases like 2.2222
, 333.333
, 5555555.55
you can use
(?<![^|])(\d)\1*\.\1 (?![^|])
Replace with \1.\1
, see the regex demo.