Remove digits between two full stops

Is there any way to remove digits between two full stops in python?

eg:

Input 1: "remove 1 from .1."
Output 1: "remove 1 from."
Input 2: "XYZ is a student.2. XYZ is a boy.3. XYZ is smart."
Output 2: "XYZ is a student. XYZ is a boy. XYZ is smart."

I've tried the following regex but didn't get the expected output.

output = re.sub(r'([^A-Z].[0-9])+)', input)

CodePudding user response：

You could match the pattern \s*\.\d+\. (optional whitespace, a literal dot, one or more digits, a literal dot) and replace each match with a single full stop.

import re

inp = ["remove 1 from .1.", "XYZ is a student.2. XYZ is a boy.3. XYZ is smart."]
# Replace optional whitespace, a dot, one or more digits, and a dot with a single dot.
output = [re.sub(r'\s*\.\d+\.', '.', x) for x in inp]
print(output)

This prints:

['remove 1 from.', 'XYZ is a student. XYZ is a boy. XYZ is smart.']

CodePudding user response：

There are a few things to note in your code.

re.sub requires 3 arguments (pattern, replacement, string), but you have provided only 2.
Avoid naming your variable input, as that shadows the built-in input() function.
The pattern in your example ([^A-Z].[0-9])+) is not valid, as there is an unmatched closing parenthesis at the end.
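
As a rough illustration of the first and third points, the sketch below triggers both failures on purpose. The variable names are made up for this example, and the pattern with the stray closing parenthesis is only a stand-in for the one in the question.

```python
import re

text = "remove 1 from .1."

# re.sub needs pattern, replacement and string. With only two arguments
# Python raises TypeError before any matching happens.
missing_replacement_raises = False
try:
    re.sub(r'\.\d+\.', text)
except TypeError:
    missing_replacement_raises = True

# A closing parenthesis without a matching opening one makes the pattern
# invalid, which the re module reports as re.error when compiling it.
bad_pattern_raises = False
try:
    re.compile(r'([^A-Z].[0-9])+)')
except re.error:
    bad_pattern_raises = True

print(missing_replacement_raises, bad_pattern_raises)  # True True
```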

If you remove that, you have this pattern [^A-Z].[0-9] which matches a single char other than A-Z, then an unescaped dot (which matches any character except a newline), then a digit.

That means that the pattern can match a lot more than intended.
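
To make the over-matching concrete, here is a small sketch that runs the loose pattern [^A-Z].[0-9] against the first input from the question and compares it with a pattern that escapes the dots. The names loose and strict are only labels for this example.

```python
import re

text = "remove 1 from .1."

# Unescaped dot: any non-uppercase char, then ANY char, then a digit.
loose = re.compile(r'[^A-Z].[0-9]')
# Escaped dots: a literal dot, one or more digits, a literal dot.
strict = re.compile(r'\.\d+\.')

loose_hits = loose.findall(text)
strict_hits = strict.findall(text)

print(loose_hits)   # ['e 1', ' .1']
print(strict_hits)  # ['.1.']
```

The loose pattern also grabs "e 1" from "remove 1", which is text the question wants to keep.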

If you don't want to change, for example, an IP address or a float, you can assert that there is no digit immediately before the match. Also note that the dot must be escaped to match it literally.

The pattern is the same as posted by @Tim Biegeleisen only with a leading negative lookbehind to assert no leading digit.

(?<!\d)\s*\.\d+\.

Example

import re

strings = ["remove 1 from .1.", "XYZ is a student.2. XYZ is a boy.3. XYZ is smart.", "test 127.0.0.1 test"]

for s in strings:
    print(re.sub(r'(?<!\d)\s*\.\d+\.', '.', s))

Output

remove 1 from.
XYZ is a student. XYZ is a boy. XYZ is smart.
test 127.0.0.1 test
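
For comparison, the following sketch runs the pattern with and without the negative lookbehind on the same strings, to show what the lookbehind protects. The names plain and guarded and the float sample are assumptions added for illustration.

```python
import re

plain = re.compile(r'\s*\.\d+\.')           # no guard against a leading digit
guarded = re.compile(r'(?<!\d)\s*\.\d+\.')  # refuses to start right after a digit

samples = [
    "test 127.0.0.1 test",
    "value 3.14. done",
    "XYZ is a boy.3. XYZ is smart.",
]

plain_results = [plain.sub('.', s) for s in samples]
guarded_results = [guarded.sub('.', s) for s in samples]

for before, p, g in zip(samples, plain_results, guarded_results):
    print(f"{before!r}: plain={p!r} guarded={g!r}")
```

Without the lookbehind, the IP address becomes 127.0.1 and the float becomes 3., while the guarded pattern leaves both untouched and still removes the .3. between sentences.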