Remove does not work as intended for dictionary with nested list

I have a dictionary in which each value is a nested list (a list of lists). For each key, I'd like to remove every inner list that does not contain that key.

What I've done is:

for key, value in example.items():
    for val in value:
        if key not in val:
            value.remove(val)

What I don't understand is why this worked for the first key, value pair but not the second, as shown below.

example = {"a": [["a", "b", "c", "d"],
                 ["e", "f", "g", "h"],
                 ["a", "i", "j", "k"],
                 ["f", "y", "a", "q"],
                 ["a", "b", "c", "d"],
                 ["e", "f", "b", "h"],
                 ["a", "i", "j", "k"],
                 ["o", "p", "a", "l"]],
           "b": [["a", "b", "c", "d"],
                 ["e", "f", "b", "h"],
                 ["a", "i", "j", "k"],
                 ["o", "p", "a", "l"],
                 ["a", "b", "c", "d"],
                 ["e", "f", "g", "h"],
                 ["a", "i", "j", "k"],
                 ["f", "y", "a", "q"]]}

Running the loop above on this dictionary gives:

{'a': [['a', 'b', 'c', 'd'], ['a', 'i', 'j', 'k'], ['f', 'y', 'a', 'q'], ['a', 'b', 'c', 'd'], ['a', 'i', 'j', 'k'], ['o', 'p', 'a', 'l']], 'b': [['a', 'b', 'c', 'd'], ['e', 'f', 'b', 'h'], ['o', 'p', 'a', 'l'], ['a', 'b', 'c', 'd'], ['a', 'i', 'j', 'k']]}

I've come across this short alternative, which seems to work fine (assuming the key always sits at the same index, here index 0, in each inner list):

for k,v in my_dict.items():
    my_dict[k] = list(filter(lambda x: x[0] == k, v))

But why doesn't remove work as intended for the example above?

CodePudding user response：

Actually, it does not work correctly even for the first key; it only appears to. The core problem is that you mutate the list while you are iterating over it, in particular here:

    for val in value:
        if key not in val:
            value.remove(val)

Internally, the list iterator is implemented using indices. The following example shows what that means:

In [36]: lst = [0,1,2,3]

In [37]: for item in lst:
    ...:     lst.remove(item)
    ...:

In [38]: lst
Out[38]: [1, 3]

In the first iteration you process the element at index 0 and remove it. At the end of that iteration the internal index is increased to 1. However, removing the element at index 0 shifts the rest of the list to the left, so the element 1, which is now at index 0, is skipped and the loop moves on to 2. The same thing happens again: removing 2 causes 3 to be skipped.
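To make the skipping visible, here is a minimal sketch that repeats the same loop and records which elements the loop actually visits. The helper name remove_while_iterating is made up for this illustration.

```python
def remove_while_iterating(items):
    """Repeat the loop from the example and record the visited elements."""
    visited = []
    for item in items:
        visited.append(item)   # the element the iterator handed us
        items.remove(item)     # shifts every later element one slot left
    return visited

lst = [0, 1, 2, 3]
visited = remove_while_iterating(lst)
print(visited)  # [0, 2]  -> 1 and 3 were never visited
print(lst)      # [1, 3]  -> so they were never removed
```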

Note that this is the behavior of the standard Python implementation (in general, I think the behavior is undefined, so you should not rely on it).

In your case, every inner list that immediately follows a removed one is never evaluated (just as in the dummy example above). The only difference between the 'a' and 'b' lists is that, for 'a', every inner list immediately following a removed one happens to contain 'a' anyway, so you would not want it removed (i.e., it does not matter that it was skipped during the evaluation). For 'b', the skipped lists ['o', 'p', 'a', 'l'] and ['a', 'i', 'j', 'k'] do not contain 'b', so they wrongly survive.
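The same effect can be checked on the data from the question. This is only a sketch: skipped_rows is a hypothetical helper, and the two lists are a compact re-creation of the 'a' and 'b' values from the question. It runs the original loop on a copy and reports which inner lists the loop never looked at.

```python
def skipped_rows(key, rows):
    """Run the question's loop on a copy; return the rows it never checked."""
    working = list(rows)   # shallow copy, the inner lists are shared
    checked = set()        # ids of the inner lists the loop looked at
    for row in working:
        checked.add(id(row))
        if key not in row:
            working.remove(row)
    return ["".join(row) for row in rows if id(row) not in checked]

# Compact re-creation of the question's data (each word is one inner list).
a_rows = [list(s) for s in "abcd efgh aijk fyaq abcd efbh aijk opal".split()]
b_rows = [list(s) for s in "abcd efbh aijk opal abcd efgh aijk fyaq".split()]

print(skipped_rows("a", a_rows))  # ['aijk', 'aijk'] -> both contain 'a', no harm done
print(skipped_rows("b", b_rows))  # ['opal', 'aijk'] -> neither contains 'b', wrongly kept
```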

An easy fix is to iterate over a copy of the list:

    for val in list(value):
        if key not in val:
            value.remove(val)
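Another option, not part of the fix above, is to avoid mutating the lists altogether and build new ones instead. A minimal sketch, using a shortened sample of the question's data and a made-up variable name filtered:

```python
# Shortened sample of the question's dictionary (each word is one inner list).
example = {
    "a": [list(s) for s in "abcd efgh aijk".split()],
    "b": [list(s) for s in "abcd efbh aijk opal".split()],
}

# Keep only the inner lists that contain the key; nothing is removed in place.
filtered = {key: [row for row in rows if key in row]
            for key, rows in example.items()}

print(filtered["a"])  # [['a', 'b', 'c', 'd'], ['a', 'i', 'j', 'k']]
print(filtered["b"])  # [['a', 'b', 'c', 'd'], ['e', 'f', 'b', 'h']]
```

Because a new list is built for every key, no list is modified while it is being iterated over, so no element can be skipped.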