Let's say I have this string: Sayy Hellooooooo
If N = 2, I want the result to be (using regex): Sayy Helloo
In other words, every run of a repeated character should be cut down to at most N characters.
Thank you in advance.
Answer:
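Another option is to use re.sub with a callback:
import re
N = 2
your_string = "Sayy Hellooooooo"
# (.)\1+ matches a whole run of one repeated character; keep only its first N chars
result = re.sub(r'(.)\1+', lambda m: m.group(0)[:N], your_string)
print(result)  # Sayy Helloo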
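A minimal sketch of the same callback idea wrapped in a reusable function, so it can be tried with several values of N. The name `limit_repeats` is hypothetical, not from the original answer. `m.group(0)` is the whole run of a repeated character, and slicing it with `[:n]` keeps at most n copies.

```python
import re


def limit_repeats(text: str, n: int) -> str:
    """Shorten every run of one repeated character to at most n copies."""
    # (.)\1+ matches a character followed by one or more copies of itself,
    # so m.group(0) is the whole run; slicing keeps only its first n characters.
    return re.sub(r"(.)\1+", lambda m: m.group(0)[:n], text)


print(limit_repeats("Sayy Hellooooooo", 2))  # Sayy Helloo
print(limit_repeats("Sayy Hellooooooo", 1))  # Say Helo
print(limit_repeats("Sayy Hellooooooo", 3))  # Sayy Hellooo
```

Runs that are already N characters or shorter pass through unchanged, because the slice simply returns the whole run.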
Answer:
You could build the regex dynamically for a given n, and then call sub without a callback:
import re
n = 2
regex = re.compile(rf"((.)\2{{{n-1}}})\2+")
s = "Sayy Hellooooooo"
print(regex.sub(r"\1", s)) # Sayy Helloo
Explanation:
- {{ : this double brace represents a literal brace in an f-string. {n-1} injects the value of n-1, so together with the additional (double) brace wrap, {{{n-1}}} produces {2} when n is 3.
- The outer capture group captures the maximum allowed repetition of a character.
- The additional \2+ matches any further occurrences of that same character, so these are the characters that need removal.
- The replacement with \1 thus reproduces the allowed repetition, but omits the additional repetition of that same character.
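To see what the f-string actually produces, here is a small sketch that prints the compiled pattern text for a few values of n, using the pattern with the trailing `\2+`. The helper name `build_regex` is hypothetical, and it assumes n >= 1 (n = 0 would produce the invalid quantifier `{-1}`).

```python
import re


def build_regex(n: int) -> re.Pattern:
    # {{ and }} are literal braces in an f-string and {n-1} is interpolated,
    # so {{{n-1}}} becomes {2} when n is 3. \2+ matches the surplus copies.
    return re.compile(rf"((.)\2{{{n-1}}})\2+")


for n in (2, 3, 4):
    regex = build_regex(n)
    print(n, regex.pattern, regex.sub(r"\1", "Sayy Hellooooooo"))
# 2 ((.)\2{1})\2+ Sayy Helloo
# 3 ((.)\2{2})\2+ Sayy Hellooo
# 4 ((.)\2{3})\2+ Sayy Helloooo
```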
Answer:
You could use backreferences to match the previous character. So (a|b)\1 would match aa or bb. In your case you would probably want any letter and any number of repetitions, so ([a-zA-Z])\1{n,} for N repetitions. Then substitute it with one occurrence using \1 again. So putting it all together:
import re
n = 2
expression = r"([a-zA-Z])\1{" + str(n) + ",}"
print(re.sub(expression, r"\1", "hellooooo friiiiiend"))
# Outputs: hello friend
Note that this only matches runs of N+1 or more repetitions (one letter followed by N copies of it), like your test cases. If you want runs of N or more to match, subtract 1 from n.
Remember to use the r prefix (raw string) in front of regular expressions so you don't need to double-escape backslashes.
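The substitution above writes back a single letter, while the question asks to keep N copies. A minimal variation, assuming n >= 1, is to repeat the backreference n times in the replacement string. The function name `keep_n_letters` is hypothetical, and like the original it only handles ASCII letters because of `[a-zA-Z]`.

```python
import re


def keep_n_letters(text: str, n: int) -> str:
    # ([a-zA-Z])\1{n,} matches a letter followed by n or more copies of itself,
    # i.e. a run of at least n+1 identical letters.
    pattern = r"([a-zA-Z])\1{" + str(n) + ",}"
    # Repeating the backreference n times writes the letter back n times.
    return re.sub(pattern, r"\1" * n, text)


print(keep_n_letters("Sayy Hellooooooo", 2))      # Sayy Helloo
print(keep_n_letters("hellooooo friiiiiend", 1))  # helo friend
```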
Learn more about backreferences: https://www.regular-expressions.info/backref.html
Learn more about repetition: https://www.regular-expressions.info/repeat.html
Answer:
You need a regex that searches for multiple occurrences of the same char, which is done with (.)\1 (the \1 matches whatever group 1, the one in parentheses, captured).
To match:
- 2 occurrences: (.)\1
- 3 occurrences: (.)\1\1 or (.)\1{2}
- 4 occurrences: (.)\1\1\1 or (.)\1{3}
So you can build it with an f-string and the value you want (it's a bit ugly because the literal braces need to be escaped by doubling them, and inside those sits the single pair of braces that interpolates the value itself):
import re

def remove_letters(value: str, count: int) -> str:
    # Deletes each block of count+1 identical consecutive characters
    return re.sub(rf"(.)\1{{{count}}}", "", value)

print(remove_letters("Sayy Hellooooooo", 1))  # Sa Heo
print(remove_letters("Sayy Hellooooooo", 2))  # Sayy Hello
print(remove_letters("Sayy Hellooooooo", 3))  # Sayy Hellooo
You may find the pattern construction easier to understand written like this:
r"(.)\1{" + str(count) + "}"
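A quick sketch checking that the f-string spelling and the concatenated spelling build exactly the same pattern text. The helper name `build_patterns` is hypothetical. Note that this pattern deletes each complete block of count+1 identical characters rather than truncating a run to N, which is why count=2 gives Sayy Hello here instead of the Sayy Helloo asked for in the question.

```python
import re


def build_patterns(count):
    """Return the pattern built with an f-string and with concatenation."""
    # In the f-string, {{ and }} are literal braces and {count} is interpolated.
    with_fstring = rf"(.)\1{{{count}}}"
    with_concat = r"(.)\1{" + str(count) + "}"
    return with_fstring, with_concat


for count in (1, 2, 3):
    f_pat, c_pat = build_patterns(count)
    print(f_pat, f_pat == c_pat, re.sub(f_pat, "", "Sayy Hellooooooo"))
# (.)\1{1} True Sa Heo
# (.)\1{2} True Sayy Hello
# (.)\1{3} True Sayy Hellooo
```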
Answer:
This seems to work:
- When N=2, the regex pattern is compiled to: ((\w)\2{2,})
- When N=3, the regex pattern is compiled to: ((\w)\2{3,})
Code:
import re
N = 2
p = re.compile(r"((\w)\2{" + str(N) + r",})")
text = "Sayy Hellooooooo"
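# Replace every run in a single pass: m.group(2) is the repeated character,
# so each run of N+1 or more becomes exactly N copies. (Re-running re.sub on
# each found run separately can also shorten a different, longer run of the
# same character by mistake.)
text = p.sub(lambda m: m.group(2) * N, text)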
print(text)
Output:
Sayy Helloo
Note: Also tested with N=3, N=4 and other text inputs.
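One design detail worth knowing: this answer uses \w, so only letters, digits and underscores get shortened, while the other answers use . and also shorten punctuation or spaces. A small sketch comparing the two, where the function name `shorten_runs` and its `char_class` parameter are hypothetical:

```python
import re


def shorten_runs(text, n, char_class=r"\w"):
    # ((X)\2{n,}) matches a run of n+1 or more identical characters of class X;
    # each run is replaced by n copies of the repeated character (group 2).
    p = re.compile(r"((" + char_class + r")\2{" + str(n) + r",})")
    return p.sub(lambda m: m.group(2) * n, text)


print(shorten_runs("Wow!!!!! Sooooo", 2))       # Wow!!!!! Soo
print(shorten_runs("Wow!!!!! Sooooo", 2, "."))  # Wow!! Soo
```

Which class is right depends on whether repeated punctuation should be preserved.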