Matching repeating words in a row by regex

I would like to find and replace repeating words in a string, but only if they are directly next to each other or separated by whitespace. For example:

"<number> <number>" -> "<number>"
"<number><number>" -> "<number>"

but not

"<number> test <number>" -> "<number> test <number>"

I have tried this:

import re
re.sub(r"(.+)(?=\<number>+)", "", label).strip()

but it gives the wrong result for the last test case.

Could you please help me with that?

CodePudding user response：

You can use

re.sub(r"(<number>)(?:\s*<number>)+", r"\1", label).strip()

Details:

(<number>) - Group 1: a <number> string
(?:\s*<number>)+ - a non-capturing group that matches one or more occurrences of the following sequence of patterns:
- \s* - zero or more whitespaces
- <number> - a <number> string

The \1 is the replacement backreference to the Group 1 value.

Python test:

import re
text = '"<number> <number>", "<number><number>", not "<number> test <number>"'
print(re.sub(r"(<number>)(?:\s*<number>)+", r"\1", text))
# => "<number>", "<number>", not "<number> test <number>"
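
If the repeated token is not always <number>, the same pattern can be built for any literal string. The following is only a sketch: the helper name collapse_repeats and its token parameter are made up for illustration and do not come from the question. re.escape is used so that characters such as "." in the token are matched literally.

```python
import re

def collapse_repeats(text, token="<number>"):
    """Collapse a run of `token` (adjacent or whitespace-separated) into one."""
    escaped = re.escape(token)  # match the token literally, not as a regex
    # Group 1: the first token; then one or more further tokens,
    # each optionally preceded by whitespace.
    pattern = rf"({escaped})(?:\s*{escaped})+"
    return re.sub(pattern, r"\1", text)

print(collapse_repeats("<number> <number>"))
print(collapse_repeats("<number><number>"))
print(collapse_repeats("<number> test <number>"))
print(collapse_repeats("a.b a.b a.b x", token="a.b"))
```

The third call leaves its input unchanged because the two tokens are separated by another word rather than by whitespace alone.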

CodePudding user response：

You can use

(<number>\s*){2,}

(<number>\s*) - Capture group 1: match <number> followed by optional whitespace characters
{2,} - Repeat the group 2 or more times

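In the replacement, use group 1 (\1). A repeated capture group keeps only its last iteration, so \1 holds the final <number> together with any whitespace that follows it.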

import re

strings = [
    "<number> <number>",
    "<number><number>",
    "not <number> test <number>",
    " <number>   <number><number>  <number>     test"
]

for s in strings:
    print(re.sub(r"(<number>\s*){2,}", r"\1", s))

Output

<number>
<number>
not <number> test <number>
 <number>     test
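
As a quick sanity check, the two patterns from the answers in this thread can be run side by side. This is a small sketch rather than a proof of equivalence: the variable names and the extra sample containing a newline are additions for illustration. On these inputs both substitutions produce the same strings.

```python
import re

# Pattern from the first answer: one token, then one or more repeats.
first = re.compile(r"(<number>)(?:\s*<number>)+")
# Pattern from the second answer: token plus trailing whitespace, 2+ times.
second = re.compile(r"(<number>\s*){2,}")

samples = [
    "<number> <number>",
    "<number><number>",
    "not <number> test <number>",
    " <number>   <number><number>  <number>     test",
    "<number> <number>\n<number>",  # \s also matches newlines
]

for s in samples:
    a = first.sub(r"\1", s)
    b = second.sub(r"\1", s)
    # Show the result and whether both patterns agree on it.
    print(repr(a), a == b)
```

Both patterns leave whitespace that follows the run in place, so a final .strip() on the result is still useful when the label may end with spaces.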