Remove string character after run of n characters in string-CodePudding

Suppose you have a given string and an integer, n. Every time a character appears in the string more than n times in a row, you want to remove some of the characters so that it only appears n times in a row.
For example, for the case n = 2, we would want the string 'aaabccdddd' to become 'aabccdd'.
I have written this crude function that compiles without errors but doesn't quite get me what I want:

def strcut(string, n):
    for i in range(len(string)):
        for j in range(n):
            if i + j < len(string)-(n-1):
                if string[i] == string[i+j]:
                    beg = string[:i]
                    ends = string[i+1:]
                    string = beg + ends
    print(string)

These are the outputs for strcut('aaabccdddd', n):

n	output	expected
1	'abcdd'	'abcd'
2	'acdd'	'aabccdd'
3	'acddd'	'aaabccddd'

I am new to Python, but I am pretty sure that my error is in line 3, 4 or 5 of my function. Does anyone have any suggestions, or know of any methods that would make this easier?

CodePudding user response：

This may not answer why your code does not work, but here's an alternate solution using regex:

import re
def strcut(string, n):
    return re.sub(fr"(.)\1{{{n-1},}}", r"\1"*n, string)

How it works: First, the formatted pattern is "(.)\1{n-1,}". If n=3, the pattern becomes "(.)\1{2,}".

(.) is a capture group that matches any single character
\1 matches the first capture group
{2,} matches the previous token 2 or more times

The replacement string is the first capture group repeated n times

For example, take str = "aaaab" and n = 3. The first "a" is the capture group (.). The next three characters, "aaa", match \1{2,}, which in this example is a{2,}. So the whole pattern matches "a" followed by "aaa", that is "aaaa", and this is replaced with "aaa".

regex101 can explain it better than me.
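
As a rough illustration of the explanation above, the sketch below prints the pattern that the f-string produces and shows what it matches in "aaaab". The helper name build_pattern is made up for this example and is not part of the answer's code. It assumes n >= 1, and since "." does not match a newline by default, runs of newlines are left untouched.

```python
import re

def build_pattern(n):
    # Same f-string as in the answer: "{{" and "}}" become literal braces,
    # "{n-1}" is replaced by the number n-1.
    return fr"(.)\1{{{n-1},}}"

def strcut(string, n):
    # Replace every run of n or more equal characters with exactly n copies.
    return re.sub(build_pattern(n), r"\1" * n, string)

print(build_pattern(3))                  # (.)\1{2,}

m = re.search(build_pattern(3), "aaaab")
print(m.group(0), m.group(1))            # aaaa a

print(strcut("aaaab", 3))                # aaab
print(strcut("aaabccdddd", 2))           # aabccdd
```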

CodePudding user response：

You can implement this with a stack data structure.

The idea is to push each new character onto the stack while keeping a counter for the current run. If the new character is the same as the one on top of the stack, push it and increment the counter only if the counter is still below the limit n; otherwise skip it. If the new character is not the same as the one on top of the stack, push it and set the counter to 1.

def func(string, n):
    stack = []
    counter = None
    for i in string:
        if not stack:
            counter = 1
            stack.append(i)
        elif stack[-1]==i:
            if counter+1<=n:
                stack.append(i)
                counter+=1
        elif stack[-1]!=i:
            stack.append(i)
            counter = 1
        
    return ''.join(stack)
print(func('aaabbcdaaacccdsdsccddssse', 2)=='aabbcdaaccdsdsccddsse')
print(func('aaabccdddd',1 )=='abcd')
print(func('aaabccdddd',2 )=='aabccdd')
print(func('aaabccdddd',3 )=='aaabccddd')

output

True
True
True
True

CodePudding user response：

The method I would use is to create a new empty string at the start of the function and then, every time a run exceeds n characters, simply not insert the extra characters into the output string. This makes a single pass over the input, so the number of steps is linear in the length of the string:

def strcut(string,n) :

    new_string = ""
    first_c, s = string[:1], 0  # string[:1] avoids an IndexError on an empty string

    for c in string :

        if c != first_c :
            first_c, s= c, 0

        s += 1
        if s > n : continue
        else : new_string += c

    return new_string

print(strcut("aabcaaabbba",2))  # output: aabcaabba
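
One caveat on the efficiency point: Python strings are immutable, so repeated += on a string is not guaranteed to stay linear on every interpreter. A minimal sketch of the same single-pass idea that collects the kept characters in a list and joins them once at the end could look like this. The name strcut_join is hypothetical and only used here for illustration.

```python
def strcut_join(string, n):
    kept = []                # characters to keep, joined once at the end
    prev, run = None, 0      # previous character and length of the current run
    for c in string:
        run = run + 1 if c == prev else 1   # extend the run or start a new one
        prev = c
        if run <= n:         # keep only the first n characters of each run
            kept.append(c)
    return "".join(kept)

print(strcut_join("aabcaaabbba", 2))  # aabcaabba
```

This version also returns an empty string for empty input, because the loop body never runs.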

CodePudding user response：

Simply, to answer the question

appears in the string more than n times in a row

the following code is small and simple, and will work fine :-)

def strcut(string: str, n: int) -> None:
    tmp = "*" * (n + 1)
    for char in string:
        if tmp[len(tmp) - n:] != char * n:
            tmp += char
    print(tmp[n + 1:])

strcut("aaabccdddd", 1)
strcut("aaabccdddd", 2)
strcut("aaabccdddd", 3)

Output:

abcd
aabccdd
aaabccddd

Notes:

The character "*" in the line tmp = "*" * (n + 1) can be any character that is not in the string; it's just a placeholder to handle the start case, when no characters have been added yet.

The print(tmp[n + 1:]) line simply removes the "*" characters added at the beginning.
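
If choosing a placeholder that never occurs in the input is a concern, one possible variation is to test the tail of the output with str.endswith, which needs no padding at all. This is only a sketch of that alternative, not the answer's original code, and the name strcut_no_placeholder is made up for the example.

```python
def strcut_no_placeholder(string: str, n: int) -> str:
    out = ""
    for char in string:
        # Append char unless the output already ends with n copies of it.
        if not out.endswith(char * n):
            out += char
    return out

print(strcut_no_placeholder("aaabccdddd", 1))  # abcd
print(strcut_no_placeholder("aaabccdddd", 2))  # aabccdd
print(strcut_no_placeholder("aaabccdddd", 3))  # aaabccddd
print(strcut_no_placeholder("***", 2))         # **
```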

CodePudding user response：

Just to give some ideas, this is a different approach. I didn't like how the inner loop over n restarts at every position: even if I am at i=3 with n=2, I still move on to i=4, although I already checked that character while going through n. And since you are checking the next n characters in the string, your method doesn't fit with keeping the string in order. Here is a rough method that I find easier to read.

def strcut(string, n):
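    for i in range(len(string)-1,0,-1): # I go backwards assuming you want to keep the front characters
        # remove string[i] only when it and the n characters before it are all the same
        # (string.count would count every occurrence in the string, not just the current run)
        if i >= n and string[i-n:i+1] == string[i]*(n+1):
            string = remove(string,i)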
    print(string)

def remove(string, i):
    if i > len(string):
        return string[:i]
    return string[:i] + string[i+1:]

strcut('aaabccdddd',2)

CodePudding user response：

You don't need nested loops. Keep track of the current character and its count. Include characters while the count is less than or equal to n, and reset the current character and count when the character changes.

def strcut(s,n):
    result = ''                           # resulting string
    char,count  = '',0                    # initial character and count
    for c in s:                           # only loop once on the characters
        if c == char: count += 1          # increase count
        else:         char,count = c,1    # reset character/count
        if count<=n:  result += c         # include character if count is ok
    return result
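
The same bookkeeping of "current character and its count" can also be delegated to the standard library. As a sketch, assuming a one-liner is acceptable, itertools.groupby splits the string into runs of equal characters and itertools.islice keeps at most n characters from each run. The name strcut_groupby is hypothetical.

```python
from itertools import groupby, islice

def strcut_groupby(s, n):
    # groupby yields one (character, iterator) pair per run of equal characters;
    # islice takes at most the first n characters of each run.
    return "".join("".join(islice(run, n)) for _, run in groupby(s))

print(strcut_groupby("aaabccdddd", 1))  # abcd
print(strcut_groupby("aaabccdddd", 2))  # aabccdd
print(strcut_groupby("aaabccdddd", 3))  # aaabccddd
```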