How to remove '18pieces' from '1903type18pieces' using regex?

Time:11-05

I want to remove a run of digits and everything that follows it up to the end of the string, but leave the digits at the start of the string alone.

E.g. remove "18pieces" from "1903type18pieces", not the whole "1903type18pieces". The right code may be something like:

re.sub("(?<!\d+)\d.+?$", "", string)

but I can't figure it out.

CodePudding user response:

Well, of course you can...

    import re
    print(re.sub("18pieces", "", "1903type18pieces"))

But it's pretty silly, and a normal replace would work fine.

How about you try to reformulate your question?

CodePudding user response:

Some notes about the pattern you tried:

Python's re module does not support a quantifier in a lookbehind assertion such as (?<!\d+).

Even without the quantifier, the pattern would still match at the first digit it encounters in the string: the lookbehind asserts that there is no digit to the left, which is also true at the start of the string.

In the part .+?$ you can omit the ? to prevent backtracking, as that part will match until the end of the string anyway. Using the + here means that the match must be at least 2 characters long: one digit followed by at least one more character.
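To make the first two notes concrete, here is a minimal sketch. It assumes the pattern in the question was (?<!\d+)\d.+?$; the variable names are only illustrative.

```python
import re

text = "1903type18pieces"

# A quantifier inside a lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r"(?<!\d+)\d.+?$")
    lookbehind_error = None
except re.error as exc:
    lookbehind_error = exc

print(lookbehind_error is not None)  # True

# Without the quantifier the pattern compiles, but the lookbehind also
# succeeds at index 0, so the match starts at the leading "1" and the
# whole string is removed.
without_quantifier = re.sub(r"(?<!\d)\d.+$", "", text)
print(repr(without_quantifier))  # ''
```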


You could assert that there is a character other than a digit or whitespace character to the left using a positive lookbehind, and then match a digit and the rest of the line.

(?<=[^\s\d])\d.*

import re

text = "1903type18pieces"
# Raw string, so \s and \d reach the regex engine unchanged
print(re.sub(r"(?<=[^\s\d])\d.*", "", text))

Output

1903type

If it is also OK for the character directly to the left to be a newline or space, you can use:

re.sub(r"(?<=\D)\d.*", "", text)
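
The difference between the two lookbehinds only shows up when whitespace precedes the digits. A small comparison, as a sketch; the sample strings other than "1903type18pieces" are made up for illustration:

```python
import re

# Character to the left must be neither whitespace nor a digit
strict = re.compile(r"(?<=[^\s\d])\d.*")
# Character to the left may be any non-digit, including a space or newline
loose = re.compile(r"(?<=\D)\d.*")

samples = ["1903type18pieces", "1903 type 18pieces", "1903"]
results = {s: (strict.sub("", s), loose.sub("", s)) for s in samples}

for sample, (strict_out, loose_out) in results.items():
    print(repr(sample), "->", repr(strict_out), "|", repr(loose_out))
```

With the space-separated sample, the first pattern leaves the string unchanged, while the second removes "18pieces" and keeps the trailing space after "type".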