Is there any way to remove digits between two full stops in Python?
For example:
Input 1: "remove 1 from .1."
Output 1: "remove 1 from."
Input 2: "XYZ is a student.2. XYZ is a boy.3. XYZ is smart."
Output 2: "XYZ is a student. XYZ is a boy. XYZ is smart."
I've tried the following regex but didn't get the expected output.
output = re.sub(r'([^A-Z].[0-9]) )', input)
CodePudding user response:
You can match the pattern \s*\.\d+\. (optional whitespace, a literal full stop, one or more digits, and another literal full stop)
and replace each match with a single full stop.
import re

inp = ["remove 1 from .1.", "XYZ is a student.2. XYZ is a boy.3. XYZ is smart."]
# Replace optional whitespace + ".<digits>." with a single full stop
output = [re.sub(r'\s*\.\d+\.', '.', x) for x in inp]
print(output)
This prints:
['remove 1 from.', 'XYZ is a student. XYZ is a boy. XYZ is smart.']
CodePudding user response:
There are a few noticeable things in your code.
Using re.sub requires 3 arguments, where you have provided 2.
Avoid naming your variable input, as that shadows Python's built-in input() function.
The pattern in your example
([^A-Z].[0-9]) )
is not a valid pattern as there is an unmatched parenthesis at the end.
If you remove that, you have this pattern [^A-Z].[0-9]
which matches a single char other than A-Z, then an unescaped dot that matches any character, and then a digit.
That means the pattern can match a lot more than intended.
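To illustrate the over-matching, here is a minimal sketch that runs the pattern (with the stray parenthesis removed) against the first input from the question. The variable names loose and sample are only for illustration.

```python
import re

# The question's pattern without the unmatched ")".
# The dot is unescaped, so it matches any character, not only a full stop.
loose = re.compile(r'[^A-Z].[0-9]')

sample = "remove 1 from .1."

# Every three-character run of: non-uppercase char, any char, digit
matches = loose.findall(sample)
print(matches)   # ['e 1', ' .1']

# Substituting with an empty string also eats part of the word "remove"
cleaned = loose.sub('', sample)
print(cleaned)   # remov from.
```

Note that re.sub is called here through the compiled pattern; the module-level form takes the arguments in the order re.sub(pattern, repl, string).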
If you don't want to change, for example, an IP address or a float, you can assert that there is no digit directly before the match. (Also note that the dot must be escaped to match it literally.)
The pattern is the same as the one posted by @Tim Biegeleisen, only with a leading negative lookbehind to assert that there is no leading digit.
(?<!\d)\s*\.\d+\.
Example
import re
strings = ["remove 1 from .1.", "XYZ is a student.2. XYZ is a boy.3. XYZ is smart.", "test 127.0.0.1 test"]
for s in strings:
    print(re.sub(r'(?<!\d)\s*\.\d+\.', '.', s))
Output
remove 1 from.
XYZ is a student. XYZ is a boy. XYZ is smart.
test 127.0.0.1 test
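As a side-by-side comparison, the sketch below applies both patterns to a sentence that contains a float. The helper name strip_numbered_stops and the constants PLAIN and GUARDED are hypothetical names chosen for this example, not something from the answers above.

```python
import re

# Pattern without the lookbehind
PLAIN = re.compile(r'\s*\.\d+\.')
# Same pattern, but the match may not be directly preceded by a digit
GUARDED = re.compile(r'(?<!\d)\s*\.\d+\.')

def strip_numbered_stops(text, pattern=GUARDED):
    """Replace '.<digits>.' (with optional leading whitespace) by a single full stop."""
    return pattern.sub('.', text)

sample = "Pi is about 3.14. That is a float.2. Done."

plain_result = strip_numbered_stops(sample, PLAIN)
guarded_result = strip_numbered_stops(sample)

print(plain_result)    # Pi is about 3. That is a float. Done.
print(guarded_result)  # Pi is about 3.14. That is a float. Done.
```

Without the lookbehind, the ".14." in the float is treated like a numbered full stop and the decimals are lost. With it, only ".2." is replaced.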