Splitting a string and retaining the delimiter with the delimiter appearing contiguously

I have the following string:

bar = 'F9B2Z1F8B30Z4'

I have a function foo that splits the string on F, then adds back the F delimiter.

def foo(my_str):
    res = ['F' + elem for elem in my_str.split('F') if elem != '']
    return res

This works unless there are two "F"s back-to-back in the string. For example,

foo('FF9B2Z1F8B30Z4')

returns

['F9B2Z1', 'F8B30Z4']

(the empty strings that split produces for the double "F" are filtered out, so the first "F" is lost)

I'd like the function to split on the first "F" and add it to the list, as follows:

['F', 'F9B2Z1', 'F8B30Z4']

If there is a double "F" in the middle of the string, then the desired behavior would be:

foo('F9B2Z1FF8B30Z4')

['F9B2Z1', 'F', 'F8B30Z4']

Any help would be greatly appreciated.

CodePudding user response：

Instead of filtering with if, slice off the first element. The only empty string you need to drop is the one split produces before the leading F (this assumes the string always starts with F):

def foo(my_str):
    res = ['F' + elem for elem in my_str.split('F')][1:]
    return res

Output:

>>> foo('FF9B2Z1F8B30Z4')
['F', 'F9B2Z1', 'F8B30Z4']

>>> foo('FF9B2Z1FF8B30Z4FF')
['F', 'F9B2Z1', 'F', 'F8B30Z4', 'F', 'F']
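
The slice works because split('F') always returns whatever precedes the first F as its first element, and that element is the empty string when the input starts with F. If the input might not start with F, the slice would silently drop real text. A minimal sketch of a variant that avoids this is below; the name split_keep_delim and the delim parameter are made up for illustration and are not part of the answer above.

```python
def split_keep_delim(my_str, delim='F'):
    """Split on delim and put delim back at the front of each piece.

    Hypothetical helper: unlike the [1:] slice, text before the first
    delimiter is kept (without a delimiter prepended) instead of dropped.
    """
    # 'FF9B2Z1'.split('F') -> ['', '', '9B2Z1']: the first element is the
    # text before the first delimiter, empty when the string starts with it.
    head, *rest = my_str.split(delim)
    res = [delim + elem for elem in rest]
    return ([head] if head else []) + res


print(split_keep_delim('FF9B2Z1F8B30Z4'))   # ['F', 'F9B2Z1', 'F8B30Z4']
print(split_keep_delim('F9B2Z1FF8B30Z4'))   # ['F9B2Z1', 'F', 'F8B30Z4']
print(split_keep_delim('9B2Z1F8B30Z4'))     # ['9B2Z1', 'F8B30Z4']
```

For inputs that start with F this should give the same result as the sliced version; the only difference is how a prefix without F is treated.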

CodePudding user response：

With a regular expression, this can be done as follows:

import re

pattern = r'^[^F]+|(?<=F)[^F]*'

The ^[^F]+ alternative captures the leading run of characters when the string does not start with F.

(?<=F)[^F]* captures whatever follows each F up to the next F. This includes empty matches, which occur when two Fs are adjacent or when an F ends the string.

>>> print(['F' + x for x in re.findall(pattern, 'abcFFFAFF')])
['Fabc', 'F', 'F', 'FA', 'F', 'F']

>>> print(['F' + x for x in re.findall(pattern, 'FFabcFA')])
['F', 'Fabc', 'FA']

>>> print(['F' + x for x in re.findall(pattern, 'abc')])
['Fabc']

Note that this returns an empty list for an empty string. If an empty string should return ['F'], change the pattern to pattern = r'^[^F]+|(?<=F)[^F]*|^$'; the added ^$ alternative matches the empty string.
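
To make the regex approach easier to compare with the split-based foo, here is a small sketch that wraps it in a function. It uses the pattern with the ^$ alternative included; the names PATTERN and foo_regex are only illustrative. Note that, as in the 'abc' example above, an F is prepended even to a leading piece that did not originally start with F.

```python
import re

# Pattern from this answer, including the optional ^$ alternative so that
# an empty input produces one empty match.
PATTERN = re.compile(r'^[^F]+|(?<=F)[^F]*|^$')


def foo_regex(my_str):
    # Each match is the text after an F (possibly empty) or the leading
    # run without F; 'F' is prepended to every match.
    return ['F' + x for x in PATTERN.findall(my_str)]


print(foo_regex('FF9B2Z1F8B30Z4'))  # ['F', 'F9B2Z1', 'F8B30Z4']
print(foo_regex('F9B2Z1FF8B30Z4'))  # ['F9B2Z1', 'F', 'F8B30Z4']
print(foo_regex(''))                # ['F']
```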