I have a list like the following:
lst = ['a', 'a', 'a', 'start', 'b', 'end', 'a', 'a','a','start','b','b','b','end','a','a','a','a','start','b','b','end']
and my desired result is to split the list into sublists like this:
[['a', 'a', 'a'], ['start', 'b', 'end'], ['a', 'a','a'],['start','b','b','b','end'],['a','a','a','a'],['start','b','b','end']]
Here 'start' and 'end' are keywords. Is there any way to do something like .split() on a list, splitting wherever particular keywords match?
So far I have written a function that finds the indices of 'start' and 'end', giving starting_ind = [3, 9, 18]
and ending_ind = [5, 13, 21]
However, if I do
temp=[]
for i in range(len(starting_ind)):
    x = lst[starting_ind[i]: ending_ind[i]]
    temp = x
print(temp)
the result is incorrect.
CodePudding user response:
This solution doesn't require you to calculate indices beforehand:
lst = ['a', 'a', 'a', 'start', 'b', 'end', 'a', 'a', 'a', 'start', 'b', 'b',
'b', 'end', 'a', 'a', 'a', 'a', 'start', 'b', 'b', 'end', 'a', 'a', 'a']
result = []
sublist = []
for el in range(len(lst)):
    if lst[el] == 'start':
        result.append(sublist.copy())
        sublist.clear()
        sublist.append(lst[el])
    else:
        sublist.append(lst[el])
        if lst[el] == 'end':
            result.append(sublist.copy())
            sublist.clear()
    if el == len(lst) - 1:
        result.append(sublist)
print(result)
The result is:
[['a', 'a', 'a'], ['start', 'b', 'end'], ['a', 'a', 'a'], ['start', 'b', 'b', 'b', 'end'], ['a', 'a', 'a', 'a'], ['start', 'b', 'b', 'end'], ['a', 'a', 'a']]
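The same loop can also be wrapped in a reusable function. Below is a minimal sketch rather than the answer's original code: the name split_on_keywords and its start/end parameters are made up for illustration, and it assumes every 'start' is closed by a later 'end' with no nesting. Unlike the loop above, it skips empty chunks, so a list that begins with 'start' or finishes with 'end' should not pick up stray [] entries.

```python
def split_on_keywords(items, start='start', end='end'):
    """Split items into chunks; each start..end run becomes its own chunk."""
    result = []
    chunk = []
    for item in items:
        # A 'start' closes whatever was collected before it (if anything).
        if item == start and chunk:
            result.append(chunk)
            chunk = []
        chunk.append(item)
        # An 'end' closes the current chunk, keyword included.
        if item == end:
            result.append(chunk)
            chunk = []
    # Keep any trailing items that come after the last 'end'.
    if chunk:
        result.append(chunk)
    return result


lst = ['a', 'a', 'a', 'start', 'b', 'end', 'a', 'a', 'a', 'start', 'b', 'b',
       'b', 'end', 'a', 'a', 'a', 'a', 'start', 'b', 'b', 'end']
print(split_on_keywords(lst))
```

Iterating over the items directly, instead of over range(len(lst)), avoids the index bookkeeping and the special case for the last element.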
CodePudding user response:
Here's a possible way to extract the patterns with a regular expression. Please check whether it's acceptable:
import re
lst = ['a','a','a', 'start','b','end', 'a','a','a', 'start','b','b','b','end', 'a','a','a','a', 'start','b','b','end']
result = []
for e in re.findall('a_[a_]+|start[_b]+_end', '_'.join(lst)):
    result.append(e.strip('_').split('_'))
print(result)
Output is as desired:
[['a', 'a', 'a'],
['start', 'b', 'end'],
['a', 'a', 'a'],
['start', 'b', 'b', 'b', 'end'],
['a', 'a', 'a', 'a'],
['start', 'b', 'b', 'end']]
A better way is this:
result = []
for e in re.split(r'(start[_b]+_end)', '_'.join(lst)):
    result.append(e.strip('_').split('_'))
print([x for x in result if x != ['']])
The output is the same.
CodePudding user response:
You can write it like this:
lst = ['a', 'a', 'a', 'start', 'b', 'end',
'a', 'a','a','start','b','b','b','end','a','a','a','a','start','b','b','end']
temp=[]
ind = [0, 3, 6, 9, 14, 18, 22]
for i in range(len(ind) - 1):
    x = lst[ind[i]: ind[i + 1]]
    temp.append(x)
print(temp)
and you will get:
[['a', 'a', 'a'], ['start', 'b', 'end'], ['a', 'a', 'a'], ['start', 'b', 'b', 'b', 'end'], ['a', 'a', 'a', 'a'], ['start', 'b', 'b', 'end']]
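The ind list above is written out by hand. As a rough sketch, the same boundaries could be derived from the list itself, assuming each 'start' is eventually followed by an 'end'. The helper name boundaries is hypothetical and not part of the answer.

```python
def boundaries(lst, start='start', end='end'):
    """Return sorted slice boundaries: 0, every 'start' index,
    the index just past every 'end', and len(lst)."""
    cuts = {0, len(lst)}          # a set, so duplicate boundaries collapse
    for i, item in enumerate(lst):
        if item == start:
            cuts.add(i)           # a block begins at 'start'
        elif item == end:
            cuts.add(i + 1)       # a block finishes just after 'end'
    return sorted(cuts)


lst = ['a', 'a', 'a', 'start', 'b', 'end',
       'a', 'a', 'a', 'start', 'b', 'b', 'b', 'end',
       'a', 'a', 'a', 'a', 'start', 'b', 'b', 'end']
ind = boundaries(lst)
# Pair each boundary with the next one and slice between them.
temp = [lst[a:b] for a, b in zip(ind, ind[1:])]
print(ind)
print(temp)
```

For this list, ind comes out as [0, 3, 6, 9, 14, 18, 22], the same values as the hand-written version. Using a set means a list that starts with 'start' or ends with 'end' does not yield empty slices.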
CodePudding user response:
If you can be certain that your keywords will always appear in pairs and in the right order (i.e. there will never be a 'start' without an 'end' following it at some point in the list), this should work:
l = ['a', 'a', 'a', 'start', 'b', 'end', 'a', 'a','a','start','b','b','b','end','a','a','a','a','start','b','b','end']
def get_sublist(l):
    try:
        return l[:l.index('end') + 1] if l.index('start') == 0 else l[:l.index('start')]
    except ValueError:
        return l
result = []
while l:
    sublist = get_sublist(l)
    result.append(sublist)
    l = l[len(sublist):]
print(result)
Gives the following result:
[['a', 'a', 'a'],
['start', 'b', 'end'],
['a', 'a', 'a'],
['start', 'b', 'b', 'b', 'end'],
['a', 'a', 'a', 'a'],
['start', 'b', 'b', 'end']]