The nested list is like:
nested_list=['a', ['b', ['c', 'd', 'e'], 'f', ['g', 'h']], 'i', ['j', 'k']]
and we can clearly see its structure:
- a
- b
- c
- d
- e
- f
- g
- h
- b
- i
- j
- k
Here is the thing I want to do: define find_sublist(keywords, nested_list) function to show if:
find_sublist('a', nested_list)
returns
['a','b','c','d','e','f','g','h']
all the sublist in flattend list
or
find_sublist('b', nested_list)
returns
['b','c','d','e']
I oringinally tried:
The idea is to make a boolen flag found in the parameter list. The flag stays as False at the beginning and is set to True when the function recurs to the sublist of list. In the base case, the category can also be yield when the flag is True.
def find_sublist(keywords, nested_list, found=False):
if type(nested_list) == list:
for index, child in enumerate(nested_list):
yield from find_sublist(keywords, child, found=False)
if child == keywords and index 1 < len(nested_list) and type(nested_list[index 1]) == list:
yield from find_sublist(keywords, nested_list[index 1], found=True)
else:
if nested_list == keywords or found==True:
yield nested_list
And I the result by using this example:
[i for i in find_sublist('a', nested_list)]
is ['a']
And I tried another code and changed yield from find_sublist(keywords, nested_list[index 1], found=True)
to yield from nested_list[index 1]
the result with the example [i for i in find_sublist('a', nested_list)]
is ['a','b',['c','d','e'],'f',['g','h']]
Originally, the purpose of using yield from find_sublist(keywords, nested_list[index 1], found=True)
is to recursively call the generator on the sublists with flag set to true when the target is found.
I was wondering is the probelm occurred in this part. But I can't figure it out. :(
CodePudding user response:
Update Use only the generator:
>>> def flat(it):
... for val in it:
... if isinstance(val, list):
... yield from flat(val)
... else:
... yield val
...
>>> def find_sublist(val, nested_list):
... for front, back in pairwise(nested_list):
... if val == front:
... yield front
... if isinstance(back, list):
... yield from flat(back)
... break
... if isinstance(back, list):
... yield from find_sublist(val, back)
...
>>> list(find_sublist('a', nested_list))
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
>>> list(find_sublist('b', nested_list))
['b', 'c', 'd', 'e']
If you really want to combine them into one generator function:
def find_sublist(val, nested_list, is_find=False):
if is_find:
for elem in nested_list:
if isinstance(elem, list):
yield from find_sublist(val, elem, True)
else:
yield elem
return
for front, back in pairwise(nested_list):
if val == front:
yield front
if isinstance(back, list):
yield from find_sublist(val, back, True)
break
if isinstance(back, list):
yield from find_sublist(val, back)
To be honest, I think it's not elegant. It not only increases the difficulty of coding, but also causes the loss of performance.
Old Answer First define an auxiliary function for flattening:
>>> def flatten(lst):
... def flat(it):
... for val in it:
... if isinstance(val, list):
... yield from flat(val)
... else:
... yield val
... return list(flat(lst))
...
Then use pairwise
if the version of your python is 3.10 :
>>> from itertools import pairwise
>>> def find_sublist(val, nested_list):
... for front, back in pairwise(nested_list):
... if val == front:
... if isinstance(back, list):
... return flatten((front, back))
... else:
... return [front]
... if isinstance(back, list) and (result := find_sublist(val, back)):
... return result
...
>>> nested_list=['a', ['b', ['c', 'd', 'e'], 'f', ['g', 'h']], 'i', ['j', 'k']]
>>> find_sublist('a', nested_list)
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
>>> find_sublist('b', nested_list)
['b', 'c', 'd', 'e']
If not, you can also define it yourself:
def pairwise(iterable):
iterator = iter(iterable)
try:
front = next(iterator)
except StopIteration:
return
for back in iterator:
yield front, back
front = back
CodePudding user response:
I would split the two tasks into separate functions. One to find the sublist and one to flatten it.
I also had a few runs through the debugger with the recursive generator. I found it was continuing the for
loop in earlier nested calls once it had found an answer in later nested calls. Adding the if not found:
before the raise
at the end resolved this
def find_sublist(keywords, nested_list: list):
found = False
for i, sub_element in enumerate(nested_list):
if sub_element == keywords:
next_element = nested_list[i 1]
if isinstance(next_element, list): #could be more generic by checking for Sequence
yield from [keywords] [item for item in flatten_list(next_element)]
found = True
else:
yield keywords
found = True
elif isinstance(sub_element, list):
try:
yield from find_sublist(keywords, sub_element)
found = True
except ValueError:
pass
if not found:
raise ValueError("{kw} not found in nested_list".format(kw=keywords))
def flatten_list(list_to_flatten: list):
for item in list_to_flatten:
if not isinstance(item, list):
yield item
else:
yield from flatten_list(item)
The above will search through the list from top to bottom and return the results from the each instance of keywords
. As you said they will be unique then I assume this should be fine otherwise you can add a check for found
and then break
the for
loop if True
.
If the keyword
is not followed by a sublist but rather by another element at the same level then it returns a list with one entry keywords
(eg: find_sublist("d", nested_list)
in your example returns ["d"]
Finally if keywords
is not found you will get a ValueError
Happy to explain any of the workings in more detail, or help to adjust any specific detail if needed
Put another way. This will pass the following pytest testcases:
from pytest import raises
from .NestedSublists import find_sublist
nested_list=['a', ['b', ['c', 'd', 'e'], 'f', ['g', 'h']], 'i', ['j', 'k']]
def test_a():
assert [x for x in find_sublist('a', nested_list)] == ['a','b','c','d','e','f','g','h']
def test_b():
assert [x for x in find_sublist('b', nested_list)] == ['b','c','d','e']
def test_b2():
nested_list2=['a', ['b', ['c', 'd', 'e'], 'f', ['g', 'h']], 'b', ['j', 'k']]
assert [x for x in find_sublist('b', nested_list2)] == ['b','c','d','e', 'b', 'j', 'k']
def test_notfound():
with raises(ValueError) as e:
[x for x in find_sublist("z",nested_list)]
assert str(e.value) == "z not found in nested_list"
def test_noSublist():
nested_list2=['a', ['b', ['c', 'd', 'e'], 'f', ['g', 'h']], "p", 'b', ['j', 'k']]
assert [x for x in find_sublist("p", nested_list2)] == ["p"]
def test_f():
assert [x for x in find_sublist("f", nested_list)] == ["f", "g", "h"]