I would like to find and replace repeated words in a string, but only if they are next to each other or separated by whitespace. For example:
"<number> <number>" -> "<number>"
"<number><number>" -> "<number>"
but not
"<number> test <number>" -> "<number> test <number>"
I have tried this:
import re
re.sub(f"(.+)(?=\<number>+)", "", label).strip()
but it gives the wrong result for the last example.
Could you please help me with that?
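To see why the attempt fails on the last example, here is a minimal sketch that assumes the attempt boils down to a greedy `(.+)` followed by a lookahead for `<number>` (a simplification; the exact pattern above may differ slightly). The greedy group swallows everything up to the last `<number>`, including ` test `, and the substitution deletes it.

```python
import re

# Simplified version of the attempt: a greedy (.+) followed by a lookahead for <number>.
attempt = r"(.+)(?=<number>)"

cases = ["<number> <number>", "<number><number>", "<number> test <number>"]
results = [re.sub(attempt, "", s).strip() for s in cases]

# The third case also collapses, because (.+) happily matches "<number> test ".
print(results)
```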
Answer 1:
You can use
re.sub(r"(<number>)(?:\s*<number>)+", r"\1", label).strip()
See the regex demo. Details:
(<number>) - Group 1: a <number> string
(?:\s*<number>)+ - one or more occurrences of the following sequence of patterns:
  \s* - zero or more whitespaces
  <number> - a <number> string
The \1 is the replacement backreference to the Group 1 value.
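The same pattern can also carry its breakdown inline by compiling it with `re.VERBOSE`, where whitespace in the pattern is ignored and `#` starts a comment. This is a sketch; the helper name `collapse` is hypothetical and not part of the answer.

```python
import re

# Same regex as above, written in verbose mode so each part carries its explanation.
REPEATED_NUMBER = re.compile(
    r"""
    (<number>)          # Group 1: a <number> string
    (?:\s*<number>)+    # one or more further <number> strings, each optionally preceded by whitespace
    """,
    re.VERBOSE,
)


def collapse(label: str) -> str:
    """Hypothetical helper: replace each run of adjacent <number> tokens with a single one."""
    return REPEATED_NUMBER.sub(r"\1", label).strip()


for label in ["<number> <number>", "<number><number>", "<number> test <number>"]:
    print(collapse(label))
```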
import re
text = '"<number> <number>", "<number><number>", not "<number> test <number>"'
print(re.sub(r"(<number>)(?:\s*<number>)+", r"\1", text))
# => "<number>", "<number>", not "<number> test <number>"
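If other placeholder tokens need the same treatment (the `<name>` token below is a made-up example), one possible generalization, not part of the original answer, is to capture any `<...>` token and use the backreference `\1` inside the pattern itself, so that only identical copies of the captured token are collapsed. The sketch assumes tokens always look like `<word>`.

```python
import re

# Assumption: tokens look like <word>. The \1 inside the pattern forces repeats to equal Group 1.
ANY_REPEATED_TOKEN = re.compile(r"(<\w+>)(?:\s*\1)+")

samples = [
    "<number> <number>",
    "<name><name> <number>",
    "<number> test <number>",
    "<name> <number>",
]
for s in samples:
    # Adjacent identical tokens collapse; different or separated tokens are left alone.
    print(ANY_REPEATED_TOKEN.sub(r"\1", s))
```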
Answer 2:
You can use
(<number>\s*){2,}
(<number>\s*) - Capture group 1: match <number> followed by optional whitespace chars
{2,} - Repeat 2 or more times
In the replacement, use group 1.
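Using group 1 in the replacement works because a quantified group such as `(<number>\s*){2,}` only remembers its last repetition. A quick check of what the match and Group 1 hold (the input string is chosen for illustration):

```python
import re

pattern = re.compile(r"(<number>\s*){2,}")

m = pattern.search("<number> <number> test")
# The whole run is matched, including the space before "test".
print(repr(m.group(0)))
# Group 1 keeps only the final repetition: the last <number> plus its trailing space.
print(repr(m.group(1)))
# So substituting \1 leaves exactly one <number> followed by the original spacing.
print(pattern.sub(r"\1", "<number> <number> test"))
```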
import re
strings = [
"<number> <number>",
"<number><number>",
"not <number> test <number>",
" <number> <number><number> <number> test"
]
for s in strings:
    print(re.sub(r"(<number>\s*){2,}", r"\1", s))
Output
<number>
<number>
not <number> test <number>
 <number> test
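Both answers reduce a run to a single `<number>` and keep whatever whitespace follows the run (the first answer never consumes it, the second keeps it inside Group 1), so on these inputs they produce the same strings. A small side-by-side sketch, without the `.strip()` call, using `repr()` so leading and trailing spaces are visible:

```python
import re

answer_1 = r"(<number>)(?:\s*<number>)+"
answer_2 = r"(<number>\s*){2,}"

strings = [
    "<number> <number>",
    "<number><number>",
    "not <number> test <number>",
    " <number> <number><number> <number> test",
]
for s in strings:
    a = re.sub(answer_1, r"\1", s)
    b = re.sub(answer_2, r"\1", s)
    # repr() makes leading/trailing spaces visible; the last column reports whether they agree.
    print(repr(a), repr(b), a == b)
```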