find sublist by using recursive generator in python-CodePudding

The nested list is like:

nested_list=['a', ['b', ['c', 'd', 'e'], 'f', ['g', 'h']], 'i', ['j', 'k']]

and we can clearly see its structure:

a
- b
  - c
  - d
  - e
- f
  - g
  - h
i
- j
- k

Here is the thing I want to do: define find_sublist(keywords, nested_list) function to show if: find_sublist('a', nested_list) returns ['a','b','c','d','e','f','g','h'] all the sublist in flattend list or find_sublist('b', nested_list) returns ['b','c','d','e']

I oringinally tried:

The idea is to make a boolen flag found in the parameter list. The flag stays as False at the beginning and is set to True when the function recurs to the sublist of list. In the base case, the category can also be yield when the flag is True.

def find_sublist(keywords, nested_list, found=False):
            if type(nested_list) == list:
                for index, child in enumerate(nested_list):
                    yield from find_sublist(keywords, child, found=False)
                    if child == keywords and index 1 < len(nested_list) and type(nested_list[index 1]) == list:
                        yield from find_sublist(keywords, nested_list[index 1], found=True)
            else:
                if nested_list == keywords or found==True:
                    yield nested_list

And I the result by using this example: [i for i in find_sublist('a', nested_list)] is ['a']

And I tried another code and changed yield from find_sublist(keywords, nested_list[index 1], found=True) to yield from nested_list[index 1]

the result with the example [i for i in find_sublist('a', nested_list)] is ['a','b',['c','d','e'],'f',['g','h']]

Originally, the purpose of using yield from find_sublist(keywords, nested_list[index 1], found=True) is to recursively call the generator on the sublists with flag set to true when the target is found. I was wondering is the probelm occurred in this part. But I can't figure it out. :(

CodePudding user response：

Update Use only the generator:

>>> def flat(it):
...     for val in it:
...         if isinstance(val, list):
...             yield from flat(val)
...         else:
...             yield val
...
>>> def find_sublist(val, nested_list):
...     for front, back in pairwise(nested_list):
...         if val == front:
...             yield front
...             if isinstance(back, list):
...                 yield from flat(back)
...             break
...         if isinstance(back, list):
...             yield from find_sublist(val, back)
...
>>> list(find_sublist('a', nested_list))
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
>>> list(find_sublist('b', nested_list))
['b', 'c', 'd', 'e']

If you really want to combine them into one generator function:

def find_sublist(val, nested_list, is_find=False):
    if is_find:
        for elem in nested_list:
            if isinstance(elem, list):
                yield from find_sublist(val, elem, True)
            else:
                yield elem
        return

    for front, back in pairwise(nested_list):
        if val == front:
            yield front
            if isinstance(back, list):
                yield from find_sublist(val, back, True)
            break
        if isinstance(back, list):
            yield from find_sublist(val, back)

To be honest, I think it's not elegant. It not only increases the difficulty of coding, but also causes the loss of performance.

Old Answer First define an auxiliary function for flattening:

>>> def flatten(lst):
...     def flat(it):
...         for val in it:
...             if isinstance(val, list):
...                 yield from flat(val)
...             else:
...                 yield val
...     return list(flat(lst))
...

Then use pairwise if the version of your python is 3.10 :

>>> from itertools import pairwise
>>> def find_sublist(val, nested_list):
...     for front, back in pairwise(nested_list):
...         if val == front:
...             if isinstance(back, list):
...                 return flatten((front, back))
...             else:
...                 return [front]
...         if isinstance(back, list) and (result := find_sublist(val, back)):
...             return result
...
>>> nested_list=['a', ['b', ['c', 'd', 'e'], 'f', ['g', 'h']], 'i', ['j', 'k']]
>>> find_sublist('a', nested_list)
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
>>> find_sublist('b', nested_list)
['b', 'c', 'd', 'e']

If not, you can also define it yourself:

def pairwise(iterable):
    iterator = iter(iterable)
    try:
        front = next(iterator)
    except StopIteration:
        return
    for back in iterator:
        yield front, back
        front = back

CodePudding user response：

I would split the two tasks into separate functions. One to find the sublist and one to flatten it.

I also had a few runs through the debugger with the recursive generator. I found it was continuing the for loop in earlier nested calls once it had found an answer in later nested calls. Adding the if not found: before the raise at the end resolved this

def find_sublist(keywords, nested_list: list):
    found = False
    for i, sub_element in enumerate(nested_list):
        if sub_element == keywords:
            next_element = nested_list[i 1]
            if isinstance(next_element, list): #could be more generic by checking for Sequence
                yield from [keywords]   [item for item in flatten_list(next_element)]
                found = True
            else:
                yield keywords
                found = True
        elif isinstance(sub_element, list):
            try:
                yield from find_sublist(keywords, sub_element)
                found = True
            except ValueError:
                pass
    if not found:
        raise ValueError("{kw} not found in nested_list".format(kw=keywords))


def flatten_list(list_to_flatten: list):
    for item in list_to_flatten:
        if not isinstance(item, list):
            yield item
        else:
            yield from flatten_list(item)

The above will search through the list from top to bottom and return the results from the each instance of keywords. As you said they will be unique then I assume this should be fine otherwise you can add a check for found and then break the for loop if True.

If the keyword is not followed by a sublist but rather by another element at the same level then it returns a list with one entry keywords (eg: find_sublist("d", nested_list) in your example returns ["d"]

Finally if keywords is not found you will get a ValueError

Happy to explain any of the workings in more detail, or help to adjust any specific detail if needed

Put another way. This will pass the following pytest testcases:

from pytest import raises
from .NestedSublists import find_sublist

nested_list=['a', ['b', ['c', 'd', 'e'], 'f', ['g', 'h']], 'i', ['j', 'k']]

def test_a():
    assert [x for x in find_sublist('a', nested_list)] == ['a','b','c','d','e','f','g','h']

def test_b():
    assert [x for x in find_sublist('b', nested_list)] == ['b','c','d','e']

def test_b2():
    nested_list2=['a', ['b', ['c', 'd', 'e'], 'f', ['g', 'h']], 'b', ['j', 'k']]
    assert [x for x in find_sublist('b', nested_list2)] == ['b','c','d','e', 'b', 'j', 'k']

def test_notfound():
    with raises(ValueError) as e:
        [x for x in find_sublist("z",nested_list)]
    assert str(e.value) == "z not found in nested_list"

def test_noSublist():
    nested_list2=['a', ['b', ['c', 'd', 'e'], 'f', ['g', 'h']], "p", 'b', ['j', 'k']]
    assert [x for x in find_sublist("p", nested_list2)] == ["p"]

def test_f():
    assert [x for x in find_sublist("f", nested_list)] == ["f", "g", "h"]