I am trying to replace multiple adjacent repetitions of a pattern with a single occurrence.
Suppose the following is the text, and the pattern is "user":
user user please sort situation time bound manner present officer ha moral courage to interest of public of expect political leadership order always absolutely right thing http
Here I want each such run of "user" (in this case "user user") to be replaced with a single "user". The condition is that the repeated "user" words must be adjacent to each other.
For example, if the sentence were:
user user user please user user sort situation time bound manner present officer ha moral courage to interest of public of expect political leadership order always absolutely right thing http
I want "user user user" and "user user" each to be replaced with "user".
What I have come up with so far is this:
re.findall(r'[user\s]+', text)
I know that for the replacement itself I will need re.sub.
The output that I am getting is:
['user user user ',
'e',
'se user user s',
'r',
' ',
'u',
' ',
' ',
' s',
'u',
' ',
'e ',
'u',
' ',
'er ',
'rese',
' ',
'er ',
' ',
'r',
' ',
'ur',
'e ',
' ',
'eres',
' ',
' ',
'u',
' ',
' ',
' e',
'e',
' ',
' ',
'e',
'ers',
' ',
'r',
'er ',
's ',
's',
'u',
'e',
' r',
' ',
' ']
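The fragments in that output come from the square brackets: [user\s] is a character class, which matches any single character that is u, s, e, r or whitespace, not the word "user". With + it then grabs any run of such characters, so "please" contributes "e" and "se user user s" stretches from the end of "please" to the start of "sort". A minimal comparison on a short made-up string, contrasting the class with a non-capturing group (?:...) that treats "user" as one unit:

```python
import re

sample = "please user user sort"  # made-up string for illustration

# Character class: any ONE of the characters u, s, e, r or whitespace, repeated.
print(re.findall(r"[user\s]+", sample))    # ['e', 'se user user s', 'r']

# Group: the literal word "user" (plus optional whitespace), repeated as a unit.
print(re.findall(r"(?:user\s*)+", sample))  # ['user user ']
```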
So I only want the first and third elements to be found, and the third element should be "user user" instead of "se user user s".
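To find only the runs of adjacent "user" words, the word has to be matched as a whole rather than as a set of characters. A minimal sketch with re.findall on the example sentence, assuming that "users" or "superuser" should not count as "user" (that is what the \b word boundaries enforce) and that any amount of whitespace may separate the repeats:

```python
import re

text = ("user user user please user user sort situation time bound manner "
        "present officer ha moral courage to interest of public of expect "
        "political leadership order always absolutely right thing http")

# \buser          : the word "user" (\b = word boundary, so "superuser" is skipped)
# (?:\s+user\b)+  : one or more further copies, each preceded by whitespace
runs = re.findall(r"\buser(?:\s+user\b)+", text)
print(runs)  # ['user user user', 'user user']
```

Because the group is non-capturing, re.findall returns the whole match rather than just a group's contents.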
When you answer, could you please explain how the expression works? I am very new to regex.
Answer:
I hope I've understood your question correctly. This shortens "user user" to "user" (and also handles longer runs of repetitions):
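import re
s = "user user user please user user sort situation time bound manner present officer ha moral courage to interest of public of expect political leadership order always absolutely right thing http"
s = re.sub(
    # (\suser\b)  : whitespace followed by the word "user", e.g. " user"
    # (\buser\s)  : the word "user" followed by whitespace, e.g. "user "
    # (?:...){2,} : two or more of these pieces back to back
    r"(?:(\suser\b)|(\buser\s)){2,}",
    # group(1) is not None when the " user" form took part in the match,
    # so the single "user" keeps its whitespace on the same side
    lambda g: " user" if g.group(1) else "user ",
    s,
)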
print(s)
Prints:
user please user sort situation time bound manner present officer ha moral courage to interest of public of expect political leadership order always absolutely right thing http
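Since the question also asks how such an expression works, here is a sketch of a slightly simpler pattern (my own variant, not the one from the answer above) written with re.VERBOSE so each part can carry a comment. It replaces only the run of words and leaves the surrounding whitespace untouched, so no lambda is needed. The last lines show a hypothetical generalization that uses a backreference (\1) to collapse any repeated word, not only "user".

```python
import re

s = ("user user user please user user sort situation time bound manner "
     "present officer ha moral courage to interest of public of expect "
     "political leadership order always absolutely right thing http")

pattern = re.compile(
    r"""
    \b user          # the word "user", starting at a word boundary
    (?:              # non-capturing group: treat the next part as one unit
        \s+ user \b  #   whitespace, then another whole word "user"
    )+               # ...repeated one or more times, so 2+ words in total
    """,
    re.VERBOSE,  # spaces and # comments inside the pattern are ignored
)

result = pattern.sub("user", s)
print(result)

# Hypothetical generalization (not part of the original answer): collapse ANY
# word repeated back to back; \1 refers to whatever group 1 captured.
any_word = re.sub(r"\b(\w+)(?:\s+\1\b)+", r"\1", "the the cat sat sat sat")
print(any_word)  # the cat sat
```

For the example sentence this prints the same result as the answer's code.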