How to split a string after each word and number and get the pieces in an organized way?

Given the following string:

'hello0192239world0912903spam209394'

I would like to be able to split the above string into this:

hello, 0192239, world, 0912903, spam, 209394

and ideally end up with a list of pairs:

[['hello', '0192239'], ['world', '0912903'], ['spam', '209394']]

But I don't know how to approach even the first step, splitting the string into word and number pieces. I know there's the split method and something called regex, but I don't know how to use them, or even whether they're the right tools for this.

Answer:

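Try re.split with a capturing group. Because the digit pattern is wrapped in parentheses, re.split keeps the matched digit runs in its output, so words and numbers alternate and can be paired up with slicing and zip: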

>>> import re
>>> lst = re.split(r'(\d+)', 'hello0192239world0912903spam209394')
>>> list(zip(lst[::2], lst[1::2]))
[('hello', '0192239'), ('world', '0912903'), ('spam', '209394')]

# a string that starts with digits gives an empty word for the first pair
>>> lst = re.split(r'(\d+)', '09182hello2349283world892')
>>> list(zip(lst[::2], lst[1::2]))
[('', '09182'), ('hello', '2349283'), ('world', '892')]

# as a list of lists
>>> list(map(list, zip(lst[::2], lst[1::2])))
[['', '09182'], ['hello', '2349283'], ['world', '892']]
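If you want the list of pairs directly, a variation on the same regex idea (a sketch, not part of the answer above) is re.findall with two capturing groups: \D* for the possibly empty run of non-digits and \d+ for the digit run after it. When a pattern has more than one group, re.findall returns one tuple per match. As with the re.split version, trailing letters that are not followed by digits are dropped. The helper name split_pairs is hypothetical.

```python
import re

# Hypothetical helper pattern: a (possibly empty) non-digit run, then a digit run.
PAIR_RE = re.compile(r'(\D*)(\d+)')

def split_pairs(text):
    """Return [[word, digits], ...] for every word/number pair in text."""
    # With two groups, findall yields (group1, group2) tuples; turn each into a list.
    return [list(pair) for pair in PAIR_RE.findall(text)]

print(split_pairs('hello0192239world0912903spam209394'))
# [['hello', '0192239'], ['world', '0912903'], ['spam', '209394']]
print(split_pairs('09182hello2349283world892'))
# [['', '09182'], ['hello', '2349283'], ['world', '892']]
```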

Answer:

The idea is to track a 'mode' (reading letters or reading digits) and flip it, closing the current chunk, every time the input switches from a digit to a letter or the other way around.

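data = 'hello0192239world0912903spam209394'
A = 'A'  # mode: currently reading letters (non-digits)
D = 'D'  # mode: currently reading digits
# Start in whichever mode matches the first character (guards against empty input)
mode = D if data and data[0].isdigit() else A
holder = []  # finished chunks
tmp = []     # characters of the chunk being built
for x in data:
    if mode == A:
        if x.isdigit():
            # letters -> digits: close the current word and start a number
            mode = D
            holder.append(''.join(tmp))
            tmp = [x]
            continue
    else:
        if not x.isdigit():
            # digits -> letters: close the current number and start a word
            mode = A
            holder.append(''.join(tmp))
            tmp = [x]
            continue
    tmp.append(x)
# Flush the last chunk (nothing to flush for empty input)
if tmp:
    holder.append(''.join(tmp))
print(holder)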

Output:

['hello', '0192239', 'world', '0912903', 'spam', '209394']
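The mode-flipping loop above does what itertools.groupby does when given str.isdigit as the key: groupby starts a new group whenever the key's value changes, so each group is one letter run or one digit run. Below is a short sketch of that alternative (not the answer's own code) that builds the same flat list and then pairs it up as the question asked. The helper names split_runs and to_pairs are hypothetical, and the assumption that a leading digit run gets '' as its word mirrors the re.split answer's output for that case.

```python
from itertools import groupby

def split_runs(data):
    """Split data into alternating runs of non-digits and digits."""
    # groupby starts a new group whenever str.isdigit(char) changes value,
    # the same "flip mode" rule used by the loop above.
    return [''.join(group) for _, group in groupby(data, key=str.isdigit)]

def to_pairs(runs):
    """Pair each word run with the digit run that follows it."""
    # Assumption: if the string starts with digits, use '' as the missing word.
    if runs and runs[0].isdigit():
        runs = [''] + runs
    return [list(pair) for pair in zip(runs[::2], runs[1::2])]

runs = split_runs('hello0192239world0912903spam209394')
print(runs)            # ['hello', '0192239', 'world', '0912903', 'spam', '209394']
print(to_pairs(runs))  # [['hello', '0192239'], ['world', '0912903'], ['spam', '209394']]
```

As with zip in the first answer, a trailing word with no number after it is dropped from the pairs.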