Regex to match values between substrings where the middle values are all the same character

Time:05-18

I'm trying to standardise a set of text files that contain lists of values separated by pipes ("|"). Values with trailing zeros currently have varying numbers of decimal places (e.g. 0.00000000 where I want 0.0).

This seems like something a regex should be able to handle, but I'm struggling with where to start. I've found examples that replace any value between two substrings, but none that match only when the characters in between are all the same.

The Python code below shows a minimal example of what I'm trying to achieve (where the assert should pass if the replacement is successful). Any help would be much appreciated.

import re

str_in = '4|5|0.00000000|'
expected_str_out = '4|5|0.0|'

str_out = re.sub(r'0.(.*?)\|', '0.0|', str_in)
assert str_out == expected_str_out
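The attempted pattern does give the expected string for this one input, but because `.` matches any character and `.*?` accepts anything up to the next `|`, it is not limited to all-zero decimals. A quick check on two extra, made-up fields (`0.25` and `10.5`, not from the original files) shows it rewrites those too:

```python
import re

# The attempted pattern: '0', any one character, then anything up to the next '|'
ATTEMPT = r'0.(.*?)\|'

print(re.sub(ATTEMPT, '0.0|', '4|5|0.00000000|'))  # 4|5|0.0|
print(re.sub(ATTEMPT, '0.0|', '4|0.25|'))          # 4|0.0|   (non-zero decimals lost)
print(re.sub(ATTEMPT, '0.0|', '4|10.5|'))          # 4|10.0|  (matched inside "10.5")
```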

Answer:

You can use

import re

str_in = '4|5|0.00000000|'
expected_str_out = '4|5|0.0|'

str_out = re.sub(r'(?<![^|])0+\.0+(?![^|])', '0.0', str_in)
print(str_out == expected_str_out)  # True

See the online Python demo and the regex demo.

The regex matches:

  • (?<![^|]) - a position at the start of the string or immediately after a |
  • 0+\.0+ - one or more 0 chars, a ., and one or more 0s
  • (?![^|]) - a position at the end of the string or immediately before a |.
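A minimal sketch of the lookaround behaviour on made-up sample fields: only fields made entirely of zeros around the . are rewritten, while 10.00 and 0.001 are left alone. It also shows one caveat, which assumes the whole file is passed to re.sub as a single string: a \n at a line end counts as [^|], so zero fields at the start or end of a line are skipped. The ZERO_FIELD_ML variant, which adds \n to both character classes, is a hypothetical adjustment for that case; processing the file line by line with the newline stripped would work as well.

```python
import re

# Pattern from the answer: a whole pipe-delimited field of 0s, '.', 0s
ZERO_FIELD = re.compile(r'(?<![^|])0+\.0+(?![^|])')

# Fields that merely contain zeros are not touched
print(ZERO_FIELD.sub('0.0', '10.00|0.001|00.000|0.00'))
# 10.00|0.001|0.0|0.0

# On multi-line text '\n' is not '|', so fields at line edges are skipped
text = '0.000|5|0.00\n0.00|4|7\n'
print(repr(ZERO_FIELD.sub('0.0', text)))
# '0.0|5|0.00\n0.00|4|7\n'

# Assumed variant: also treat a newline as a field boundary
ZERO_FIELD_ML = re.compile(r'(?<![^|\n])0+\.0+(?![^|\n])')
print(repr(ZERO_FIELD_ML.sub('0.0', text)))
# '0.0|5|0.0\n0.0|4|7\n'
```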

If you also need to handle values like 2.2222, 333.333 or 5555555.55, where the same digit is repeated on both sides of the decimal point, you can use

(?<![^|])(\d)\1*\.\1+(?![^|])

Replace with \1.\1 (see the regex demo).
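A sketch applying the generalised pattern with the \1.\1 replacement to the values mentioned above, plus two made-up fields (1.25 and 22.3) that should stay unchanged because their digits differ:

```python
import re

# Generalised pattern: one digit, repeated before and after the '.'
REPEATED_DIGIT = re.compile(r'(?<![^|])(\d)\1*\.\1+(?![^|])')

line = '2.2222|333.333|5555555.55|0.00000000|1.25|22.3'
# \1 refers to the captured digit, so each matching field collapses to "d.d"
print(REPEATED_DIGIT.sub(r'\1.\1', line))
# 2.2|3.3|5.5|0.0|1.25|22.3
```

Unlike the all-zero case, this rewrite changes the numeric value (5555555.55 becomes 5.5), so it only fits if collapsing such values is really what you want.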
