How to efficiently split a list that has a certain periodicity into multiple lists?

For example the original list:

['k','a','b','c','a','d','e','a','b','e','f','j','a','c','a','b']

We want to split the list into sub-lists that each start and end with 'a', like the following:

['a','b','c','a']

['a','d','e','a']

['a','b','e','f','j','a']

['a','c','a']

The final output can also be a list of lists. I have tried a double for loop with 'a' as the condition, but this is inefficient and not Pythonic.

CodePudding user response：

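One possible solution uses re (regex). It relies on every element being a single-character string, so the list can be joined into one string: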

import re

l = ['k','a','b','c','a','d','e','a','b','e','f','j','a','c','a','b']
r = [list(f"a{_}a") for _ in re.findall("(?<=a)[^a]+(?=a)", "".join(l))]
print(r)
# [['a', 'b', 'c', 'a'], ['a', 'd', 'e', 'a'], ['a', 'b', 'e', 'f', 'j', 'a'], ['a', 'c', 'a']]
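
The lookbehind and lookahead matter here because neighbouring sub-lists share an 'a'. As a small sketch (the variable names are illustrative and not part of the answer), compare a pattern that consumes the delimiters with one that only asserts them:

```python
import re

s = "kabcadeabefjacab"

# Consuming both delimiters: the closing 'a' of one match cannot also
# open the next match, so every other segment is skipped.
consuming = re.findall("a[^a]+a", s)

# Lookarounds assert the surrounding 'a' without consuming it,
# so adjacent segments can share the same delimiter.
lookaround = re.findall("(?<=a)[^a]+(?=a)", s)

print(consuming)   # ['abca', 'abefja']
print(lookaround)  # ['bc', 'de', 'befj', 'c']
```

Since the lookaround version returns only the inner characters, the surrounding 'a's have to be added back, which is what the f-string in the answer does.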

CodePudding user response：

You can do this in one loop:

lst = ['k','a','b','c','a','d','e','a','b','e','f','j','a','c','a','b']

out = [[]]
for i in lst:
    if i == 'a':
        out[-1].append(i)
        out.append([])
    out[-1].append(i)
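# Drop the first list (everything up to and including the first 'a') and the
# last list, which is always incomplete: a lone 'a' or a run with no closing 'a'.
out = out[1:-1]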

Alternatively, using numpy.split:

import numpy as np

out = [ary.tolist() + ['a'] for ary in np.split(lst, np.where(np.array(lst) == 'a')[0])[1:-1]]

Output:

[['a', 'b', 'c', 'a'], ['a', 'd', 'e', 'a'], ['a', 'b', 'e', 'f', 'j', 'a'], ['a', 'c', 'a']]

CodePudding user response：

First, store the indices of every 'a' in the list.

oList = ['k','a','b','c','a','d','e','a','b','e','f','j','a','c','a','b']

idx_a = list()

for idx, char in enumerate(oList):
    if char == 'a':
        idx_a.append(idx)

Then, for every pair of consecutive indices, slice out the sub-list and collect it in a list:

ans = [oList[idx_a[x]:idx_a[x + 1] + 1] for x in range(len(idx_a) - 1)]

You can also get more such lists by pairing non-adjacent indices, for example from the first 'a' to the third.
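
One reading of that last remark is that any two 'a' positions, not only neighbouring ones, bound a sub-list that starts and ends with 'a'. A minimal sketch under that assumption, using itertools.combinations (the name all_spans is hypothetical):

```python
from itertools import combinations

oList = ['k','a','b','c','a','d','e','a','b','e','f','j','a','c','a','b']

# Positions of every 'a' in the list.
idx_a = [idx for idx, char in enumerate(oList) if char == 'a']

# Every pair (i, j) with i < j bounds a sub-list that starts and ends with 'a'.
all_spans = [oList[i:j + 1] for i, j in combinations(idx_a, 2)]

print(len(all_spans))  # 10
print(all_spans[1])    # ['a', 'b', 'c', 'a', 'd', 'e', 'a']
```

With five occurrences of 'a' there are ten such pairs, so this grows quadratically with the number of delimiters and is probably only useful when the longer spans are actually wanted.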

CodePudding user response：

You can do this with a single iteration and a simple state machine:

original_list = list('kabcadeabefjacab')

multiple_lists = []
for c in original_list:
    if multiple_lists:
        multiple_lists[-1].append(c)
    if c == 'a':
        multiple_lists.append([c])
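# The last list is always unfinished (a lone 'a' or a run with no closing 'a'),
# so discard it; the guard covers input that contains no 'a' at all.
if multiple_lists:
    multiple_lists.pop()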

print(multiple_lists)

[['a', 'b', 'c', 'a'], ['a', 'd', 'e', 'a'], ['a', 'b', 'e', 'f', 'j', 'a'], ['a', 'c', 'a']]
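
The same idea can be sketched as a generator that keeps the state explicit, which also works for elements that are not single characters. The helper name split_between and its marker parameter are hypothetical, not part of the answer:

```python
def split_between(items, marker):
    """Yield each run of items that starts and ends with marker."""
    current = None  # None means no marker has been seen yet
    for item in items:
        if current is not None:
            current.append(item)
        if item == marker:
            if current is not None:
                yield current   # this marker closes the open run
            current = [item]    # and the same marker opens the next run
    # Whatever is left in current never saw a closing marker, so it is dropped.


print(list(split_between(list('kabcadeabefjacab'), 'a')))
```

Because an unfinished run is simply never yielded, no clean-up step is needed after the loop.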

CodePudding user response：

We can str.join() the list into a string, split it with str.split(), and then use an f-string to add back the stripped "a"s. Note that even if the list starts or ends with an "a", the split result will contain an empty string for the substring before or after that "a", so the unpacking logic that discards the first and last subsequences still works as intended.

def split(data):
    _, *subseqs, _ = "".join(data).split("a")
    return [list(f"a{seq}a") for seq in subseqs]

Output:

>>> from pprint import pprint
>>> testdata = ['k','a','b','c','a','d','e','a','b','e','f','j','a','c','a','b']
>>> pprint(split(testdata))
[['a', 'b', 'c', 'a'],
 ['a', 'd', 'e', 'a'],
 ['a', 'b', 'e', 'f', 'j', 'a'],
 ['a', 'c', 'a']]
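
The edge case described in this answer can be checked with a short sketch. The function body is the same as the answer's split(); the name split_on_a and the edge list are only for illustration:

```python
def split_on_a(data):
    # Same logic as the answer's split(); renamed here only for illustration.
    _, *subseqs, _ = "".join(data).split("a")
    return [list(f"a{seq}a") for seq in subseqs]


# A list that both starts and ends with 'a'.
edge = ['a', 'b', 'c', 'a']

print("".join(edge).split("a"))  # ['', 'bc', '']
print(split_on_a(edge))          # [['a', 'b', 'c', 'a']]
```

The empty strings at both ends are what the two underscores absorb, so the single inner segment survives.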