How to split strings with double letters? (Python)

I have been thinking about this for a while and decided to ask for your help. Suppose I have a string such as "abcdefggarfse" or "abcdeefgh". I would like to split each string at the point where a letter is doubled:

"abcdefggarfse" -> "abcdefg" and "garfse"
"abcdeefgh" -> "abcde" and "efgh"

Thanks a lot!

CodePudding user response：

s1="abcdefggarfse"
s2= "abcdeefgh"
s3="abcdefgggarffse"
s4= "abcdeeefgh"

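def split_string(string):
    tokens = []
    base_delimiter = 0  # start index of the piece currently being built
    for i in range(len(string) - 1):
        # Doubled letter found: close the current piece after the first of the two.
        if string[i] == string[i + 1]:
            tokens.append(string[base_delimiter:i + 1])
            base_delimiter = i + 1
    tokens.append(string[base_delimiter:])  # remainder after the last split
    return tokens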


if __name__ == '__main__':
    l = split_string(s1)
    print(l)

    l = split_string(s2)
    print(l)

    l = split_string(s3)
    print(l)

    l = split_string(s4)
    print(l)

This produces:

['abcdefg', 'garfse']
['abcde', 'efgh']
['abcdefg', 'g', 'garf', 'fse']
['abcde', 'e', 'efgh']

I don't know whether this is the behaviour you expect for three or more repetitions, but it also handles several doubled letters in the same string.
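If a run of three or more equal letters should instead produce only one split, here is a rough sketch using itertools.groupby. The rule it applies (split after the first letter of each run and carry the rest of the run into the next piece) is only an assumption about what might be wanted, and the name split_once_per_run is made up for illustration.

```python
from itertools import groupby


def split_once_per_run(text):
    """Split once per run of equal letters, after the run's first letter."""
    parts = []
    current = ""
    # groupby yields each character together with its run of consecutive repeats.
    for char, run in groupby(text):
        run_length = len(list(run))
        if run_length == 1:
            current += char
        else:
            # Close the current piece after the first letter of the run,
            # then start the next piece with the remaining letters of the run.
            parts.append(current + char)
            current = char * (run_length - 1)
    parts.append(current)
    return parts


print(split_once_per_run("abcdefgggarffse"))
```

With this rule "abcdefgggarffse" comes out as three pieces rather than four, because the "ggg" run is split only once.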

CodePudding user response：

Iterate through the string and find the index where a letter repeats. Then you can simply use a slice operation.

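a = "abcdefggarfse"
pos = len(a)  # fallback when no letter is doubled: nothing is split off
for i in range(len(a) - 1):
    if a[i] == a[i + 1]:
        pos = i + 1  # split between the two equal letters
        break

pos1, pos2 = a[:pos], a[pos:]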

Output:

pos1 = 'abcdefg'

pos2 = 'garfse'
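The same index search can also be written by pairing each character with its right-hand neighbour via zip. This is only a sketch: the helper name first_double_index and the choice to return None when no letter is doubled are assumptions made for illustration.

```python
def first_double_index(text):
    """Return the index of the second letter of the first doubled pair, or None."""
    # zip(text, text[1:]) pairs every character with the one that follows it.
    for index, (left, right) in enumerate(zip(text, text[1:])):
        if left == right:
            return index + 1
    return None


a = "abcdefggarfse"
pos = first_double_index(a)
if pos is None:
    # No doubled letter: keep the whole string as the first piece.
    pieces = (a, "")
else:
    pieces = (a[:pos], a[pos:])
print(pieces)
```

Returning None makes the "no doubled letter" case explicit, so the caller decides what to do with it.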

CodePudding user response：

This functionality is not available in the built-in split method, so the simplest option is to use a for loop to find the doubled characters and slicing to get the output strings.

Assuming that the string needs to be split exactly once (into two output strings), the following will do the job:

input_ = "abcdefggarfse"

for i in range(len(input_) - 1):
    if input_[i] == input_[i + 1]:
        output1 = input_[:i + 1]
        output2 = input_[i + 1:]
        break

print(output1)
print(output2)

Output is:

abcdefg
garfse

You may need to modify the code to put the output into a list, handle multiple splits or strings with no splits, etc.

Another option is to use regular expressions, but if you have not used them before, then the above approach is the simplest.
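For reference, a minimal sketch of the regular-expression route, assuming every pair of adjacent equal letters should produce a split. The function name split_on_doubles is made up for illustration; re.finditer and the pattern syntax come from the standard re module.

```python
import re


def split_on_doubles(text):
    """Split text between every pair of adjacent equal characters."""
    parts = []
    start = 0
    # (.)(?=\1) matches one character that is immediately followed by itself;
    # the lookahead keeps the second character out of the match.
    for match in re.finditer(r"(.)(?=\1)", text):
        parts.append(text[start:match.end()])
        start = match.end()
    parts.append(text[start:])  # remainder after the last split
    return parts


print(split_on_doubles("abcdefggarfse"))
print(split_on_doubles("abcdefgh"))
```

A string without doubled letters comes back as a one-element list. Note that "." does not match a newline unless re.DOTALL is passed, which should not matter for single-line input like the examples here.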

CodePudding user response：

You can create a function that loops over the word and keeps track of the previous character seen.

def split_rep(word):
    prev = None
    for idx, char in enumerate(word):
        if char == prev:
            return word[:idx], word[idx:]
        else:
            prev = char
    return word, None

split_rep("abcdefggarfse")
('abcdefg', 'garfse')

split_rep("abcdeefgh")
('abcde', 'efgh')

split_rep("abcdefgh")
('abcdefgh', None)