How to add missing spaces after periods using regex, without changing decimals

Time:12-17

I have a large piece of text that is missing spaces after some of the periods. However, the text also contains decimal numbers, which must stay unchanged.

Here's what I have so far to fix the problem using regex (I'm using Python):

re.sub(r"(?!\d\.\d)(?!\. )\.", '. ', my_string)

But the first negative lookahead doesn't seem to work: the pattern still matches periods in decimal numbers.

Here is sample text to make sure any potential solution works:

this is a.match
this should also match.1234
and this should 123.match

this should NOT match. Has space after period
this also should NOT match 1.23

CodePudding user response:

You can use

re.sub(r'\.(?!(?<=\d\.)\d) ?', '. ', text)

See the regex demo. The trailing space is matched optionally, so if it is there, it will be removed and put back.

Details

  • \. - a dot
  • (?!(?<=\d\.)\d) - a negative lookahead that fails the match if the dot sits between two digits (a digit right before it and a digit right after it)
  •  ? (a space followed by ?) - an optional space.
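As for why the attempt in the question still touches decimals: both of its lookaheads are tested at the position of the dot itself, so (?!\d\.\d) checks text that begins with a dot and can never see a digit first. The sketch below contrasts the two patterns on a made-up string (the sample string and variable names are only for illustration):

```python
import re

# Hypothetical sample: one decimal and one sentence break without a space.
sample = "pi is 3.14.Next sentence"

# Attempt from the question. Both lookaheads run AT the dot, so
# (?!\d\.\d) inspects ".14" (or ".Ne"), which never starts with a digit.
# The check therefore never blocks a match.
original = re.sub(r"(?!\d\.\d)(?!\. )\.", ". ", sample)

# Fixed pattern: consume the dot first, then look behind for "digit + dot"
# and ahead for a digit from the position just after the dot.
fixed = re.sub(r"\.(?!(?<=\d\.)\d)(?! )", ". ", sample)

print(original)  # pi is 3. 14. Next sentence
print(fixed)     # pi is 3.14. Next sentence
```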

See a Python demo:

import re
text = "this is a.match\nthis should also match.1234\nand this should 123.match\n\nthis should NOT match. Has space after period\nthis also should NOT match 1.23"
print(re.sub(r'\.(?!(?<=\d\.)\d) ?', '. ', text))

Output:

this is a. match
this should also match. 1234
and this should 123. match

this should NOT match. Has space after period
this also should NOT match 1.23

Alternatively, use a (?! ) lookahead as in your attempt:

re.sub(r'\.(?!(?<=\d\.)\d)(?! )', '. ', text)

See the regex demo and the Python demo.
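A quick way to compare the two variants is re.subn, which also reports how many replacements were made. This is only a sketch on the sample text from the question; the variable names are illustrative:

```python
import re

text = (
    "this is a.match\n"
    "this should also match.1234\n"
    "and this should 123.match\n"
    "\n"
    "this should NOT match. Has space after period\n"
    "this also should NOT match 1.23"
)

# Variant 1: optionally consume an existing space and write it back.
with_optional_space, n1 = re.subn(r"\.(?!(?<=\d\.)\d) ?", ". ", text)

# Variant 2: skip dots that are already followed by a space.
with_space_lookahead, n2 = re.subn(r"\.(?!(?<=\d\.)\d)(?! )", ". ", text)

# Same resulting text, but variant 1 also rewrites the already-correct ". ".
print(with_optional_space == with_space_lookahead)  # True
print(n1, n2)  # 4 3
```

Both variants produce the same text; the second simply leaves already-correct periods untouched instead of rewriting them.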

CodePudding user response:

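Another way. It captures the character before the period and the character after it, then puts them back with a space in between. I'm not sure whether this is better or worse for performance than Wiktor's solution.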

re.sub(r"(?!\d\.\d)(?!.\. )(.\.)(.)", r"\1 \2", my_string)
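One thing to keep in mind is that this pattern consumes the characters around the period, so its behaviour can differ from the lookaround-only version on a few edge cases. A small sketch, with hypothetical helper names and made-up inputs, to make the difference visible:

```python
import re

def add_space_capture(my_string):
    # Pattern from this answer: needs one character before and after the dot.
    return re.sub(r"(?!\d\.\d)(?!.\. )(.\.)(.)", r"\1 \2", my_string)

def add_space_lookaround(text):
    # Lookaround-only pattern from the other answer: consumes just the dot.
    return re.sub(r"\.(?!(?<=\d\.)\d)(?! )", ". ", text)

# A dot at the very end: the capturing pattern needs a following character,
# so it leaves the string alone; the lookaround pattern appends a space.
print(repr(add_space_capture("End.")))     # 'End.'
print(repr(add_space_lookaround("End.")))  # 'End. '

# Dots separated by a single character: the capturing pattern consumes "b"
# as part of the first match, so the second dot is never examined.
print(add_space_capture("a.b.c"))     # a. b.c
print(add_space_lookaround("a.b.c"))  # a. b. c
```

Which behaviour is preferable depends on the text: skipping a period at the end of the string avoids a trailing space, while missing closely spaced periods may matter for abbreviations.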