I have a large piece of text that is missing spaces after some of the periods. However, the text also contains decimal numbers.
Here's what I have so far to fix the problem using regex (I'm using Python):
re.sub(r"(?!\d\.\d)(?!\. )\.", '. ', my_string)
But the first negative lookahead doesn't seem to work: it still matches periods in decimal numbers.
Here is sample text to make sure any potential solution works:
this is a.match
this should also match.1234
and this should 123.match
this should NOT match. Has space after period
this also should NOT match 1.23
CodePudding user response:
You can use
re.sub(r'\.(?!(?<=\d\.)\d) ?', '. ', text)
See the regex demo. The trailing space is matched optionally, so if it is there, it will be removed and put back.
Details
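\. - a dot
(?!(?<=\d\.)\d) - a negative lookahead that fails the match if the dot just matched has a digit before it and a digit after it, i.e. the dot sits between two digits
' ?' (a space followed by ?) - an optional space.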
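As a rough illustration of why the digit check has to be anchored at the dot, the sketch below compares the pattern from the question with the one above on a single decimal number. A lookahead is tested at the current position, so in the question's pattern (?!\d\.\d) is evaluated at the point where the dot itself is about to be matched. The first character there is the dot, not a digit, so that check never blocks anything. The variable names and the input string are made up for the example.

```python
import re

# Pattern from the question: the lookahead runs at the dot's own position,
# where the next character is "." rather than a digit, so it always passes.
question_pattern = r"(?!\d\.\d)(?!\. )\."

# Pattern from this answer: once the dot is consumed, the lookbehind checks
# the digit before it and the lookahead checks the digit after it.
answer_pattern = r"\.(?!(?<=\d\.)\d) ?"

decimal = "pi is about 3.14"
question_result = re.sub(question_pattern, ". ", decimal)
answer_result = re.sub(answer_pattern, ". ", decimal)

print(question_result)  # pi is about 3. 14
print(answer_result)    # pi is about 3.14
```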
See a Python demo:
import re
text = "this is a.match\nthis should also match.1234\nand this should 123.match\nthis should NOT match. Has space after period\nthis also should NOT match 1.23"
print(re.sub(r'\.(?!(?<=\d\.)\d) ?', '. ', text))
Output:
this is a. match
this should also match. 1234
and this should 123. match
this should NOT match. Has space after period
this also should NOT match 1.23
Alternatively, use a (?! ) lookahead as in your attempt:
re.sub(r'\.(?!(?<=\d\.)\d)(?! )', '. ', text)
See the regex demo and the Python demo.
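To compare the two variants, here is a small sketch that runs both on the sample text from the question and uses re.subn to count substitutions. It suggests the results are the same and that the only difference is whether a period already followed by a space is rewritten in place or skipped. The variable names are only for illustration.

```python
import re

sample = "\n".join([
    "this is a.match",
    "this should also match.1234",
    "and this should 123.match",
    "this should NOT match. Has space after period",
    "this also should NOT match 1.23",
])

# Variant 1: consume an optional space and write it back.
optional_space = r'\.(?!(?<=\d\.)\d) ?'
# Variant 2: do not match a dot that is already followed by a space.
no_space_ahead = r'\.(?!(?<=\d\.)\d)(?! )'

result_1, count_1 = re.subn(optional_space, '. ', sample)
result_2, count_2 = re.subn(no_space_ahead, '. ', sample)

print(result_1 == result_2)  # True
print(count_1, count_2)      # 4 3
```

The first variant also replaces the period in "match. Has", which leaves that text unchanged, so it performs one more substitution than the second.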
CodePudding user response:
Another way. I'm not sure whether this is better or worse for performance than Wiktor's solution.
re.sub(r"(?!\d\.\d)(?!.\. )(.\.)(.)", r"\1 \2", my_string)
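One behaviour that may be worth checking before choosing this version: (.\.)(.) consumes the character on each side of the period, so two periods separated by a single character cannot both be matched in one pass. A minimal sketch with a made-up input string, comparing it with the lookaround-based pattern from the other answer:

```python
import re

# Pattern from this answer: consumes one character on each side of the dot.
consuming = r"(?!\d\.\d)(?!.\. )(.\.)(.)"
# Lookaround-based pattern from the other answer: consumes only the dot
# and an optional space.
lookaround = r"\.(?!(?<=\d\.)\d) ?"

text = "e.g.example"  # hypothetical input, not from the question
consuming_result = re.sub(consuming, r"\1 \2", text)
lookaround_result = re.sub(lookaround, ". ", text)

print(consuming_result)   # e. g.example
print(lookaround_result)  # e. g. example
```

On the sample text from the question both approaches give the same output, so this only matters if such inputs can occur.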