How I can use regex to remove repeated characters from string

Time:03-15

I have a string as follows where I tried to remove similar consecutive characters.

import re
input = "abccbcbbb";
for i in input :
    input = re.sub("(.)\\1+", "",input);
print(input)

Now I need to let the user specify the value of k, the number of consecutive identical characters to remove. I am using the following Python code to do it, but I get the error message TypeError: can only concatenate str (not "int") to str

import re
input = "abccbcbbb";
k=3
for i in input :
   input= re.sub("(.)\\1+{"+(k-1)+"}", "",input)
print(input)

CodePudding user response:

If I were you, I would prefer to do it as suggested before. But since I've already spent time answering this question, here is my handmade solution.

The pattern below creates a named group called "letter". The group is updated on each match attempt, so first it is a, then b, and so on. The lookahead then checks whether the character captured in "letter" is immediately followed by the same character.

So it matches every letter that is followed by a copy of itself and replaces it with an empty string, which leaves one letter per run.

import re

input = 'abccbcbbb'
result = 'abcbcb'
pattern = r'(?P<letter>[a-z])(?=(?P=letter)+)'
substituted = re.sub(pattern, '', input)
assert substituted == result
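
As a cross-check on the lookahead approach, here is a minimal sketch that compares it with a backreference-plus-replacement variant. The function names collapse_runs_lookahead and collapse_runs_backref are made up for illustration and are not part of the answer above.

```python
import re

def collapse_runs_lookahead(text):
    # Delete every lowercase letter that is immediately followed by the same
    # letter, so only the last character of each run survives.
    return re.sub(r'(?P<letter>[a-z])(?=(?P=letter))', '', text)

def collapse_runs_backref(text):
    # Match a whole run of one repeated character and put back a single copy.
    return re.sub(r'(.)\1+', r'\1', text)

print(collapse_runs_lookahead('abccbcbbb'))  # abcbcb
print(collapse_runs_backref('abccbcbbb'))    # abcbcb
```

Note that the lookahead version only touches characters in [a-z], while the backreference version applies to any character that . matches.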

CodePudding user response:

The for i in input : does not do what you need. i is each character in the input string, and your re.sub is supposed to take the whole input as a char sequence.

If you plan to match a specific number of chars, you should get rid of the + quantifier after \1. The limiting {min,} / {min,max} quantifier should be placed right after the pattern it modifies.

Also, it is more convenient to use raw string literals when defining regexps.
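
To illustrate the point about raw string literals, this small sketch compares the two spellings of the same pattern. The variable names escaped and raw are chosen here for illustration only.

```python
import re

# In a regular string literal the backslash must be doubled to reach the regex engine.
escaped = "(.)\\1"
# In a raw string literal the backslash is kept exactly as written.
raw = r"(.)\1"

print(escaped == raw)  # True

# Without doubling or the r prefix, "\1" is the control character chr(1),
# not a backreference.
print("\1" == chr(1))  # True

# Both spellings find the doubled character.
print(re.search(raw, "abb").group())  # bb
```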

You can use

import re
input_text = "abccbcbbb"
k = 3
input_text = re.sub(fr"(.)\1{{{k-1}}}", "", input_text)
print(input_text)
# => abccbc

See this Python demo.

The fr"(.)\1{{{k-1}}}" raw f-string literal translates into the (.)\1{2} pattern. In f-strings, you need to double curly braces to denote a literal curly brace, and you needn't escape \1 again since it is a raw string literal.
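
The brace doubling can be checked by printing the pattern before using it. The sketch below wraps the substitution in a helper; remove_runs_of_k is a hypothetical name, and rejecting k below 2 is an assumption made here, not something the question asks for. It also shows that the original concatenation works once the integer is converted with str().

```python
import re

def remove_runs_of_k(text, k):
    # Hypothetical helper: delete each block of k identical consecutive
    # characters found in a single left-to-right pass.
    if k < 2:
        # Assumption: k = 1 would make the pattern match every character.
        raise ValueError("k must be at least 2")
    pattern = fr"(.)\1{{{k - 1}}}"
    return re.sub(pattern, "", text)

k = 3
print(fr"(.)\1{{{k - 1}}}")              # (.)\1{2}

# Plain concatenation builds the same pattern once the int is converted to str.
concatenated = "(.)\\1{" + str(k - 1) + "}"
print(concatenated == fr"(.)\1{{{k - 1}}}")  # True

print(remove_runs_of_k("abccbcbbb", 3))  # abccbc
print(remove_runs_of_k("abccbcbbb", 2))  # abbcb
```

Because re.sub makes one pass, characters that become adjacent after a removal are not re-examined; repeating the call until the string stops changing would be needed for that behaviour.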

CodePudding user response:

Just to make sure I have the question correct: you mean to turn "abccbcbbb" into "abcbcb", only removing sequential duplicate characters? Is there a reason you need to use regex? You could likely do it with a simple loop. This is a really quick and dirty way to do it, but you could just write

input = "abccbcbbb"
input = list(input)
previous = input.pop(0)
result = [previous]
for letter in input:
    if letter != previous:
        result.append(letter)
    previous = letter
result = "".join(result)

With a method like this, you could make it easier to read and faster with a bit of modification, I'd assume.
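
One possible modification along those lines, offered as a sketch rather than as the answer's own method, is to let itertools.groupby from the standard library do the run detection. The function name collapse_runs is made up for illustration.

```python
from itertools import groupby

def collapse_runs(text):
    # groupby yields one (key, group) pair per run of equal consecutive
    # characters; keeping only the key leaves a single character per run.
    return "".join(key for key, _group in groupby(text))

print(collapse_runs("abccbcbbb"))  # abcbcb
```

Unlike the loop above, which calls pop(0) on the list, this version also accepts an empty string and returns an empty string.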
