How to find common positions in list of lists where the elements are always duplicates and then remo

I have a list of lists in which every list is ordered the same way, and within each list several elements are duplicates. I would like to remove the duplicates, but it is important that I retain the structure of each list: if the elements at indices 0, 1 and 2 are all duplicates in a given list, two of them would be removed, but the elements at the same positions would then have to be removed from all the other lists too, so that the ordering stays aligned.

Crucially, however, the elements at indices 0, 1 and 2 may not be duplicates in the other lists, so I only want to do this if the elements at indices 0, 1 and 2 are duplicated in every list.

As an example, say I had this list of lists

L = [ [1,1,1,3,3,2,4,6,6], 
[5,5,5,4,5,6,5,7,7], 
[9,9,9,2,2,7,8,10,10] ]

After applying my method I would like to be left with

L_new = [ [1,3,3,2,4,6], 
[5,4,5,6,5,7], 
[9,2,2,7,8,10] ]

where you can see that the elements at indices 1, 2 and 8 have been removed because they are consistently duplicated across all lists, whereas the elements at indices 3 and 4 have not, because they are not always duplicated.

My thinking so far (though I suspect this is not the best approach, which is why I am asking for help):

def check_duplicates_in_same_position(arr_list):
    check_list = []
    for arr in arr_list:
        # Map each value to the list of indices at which it occurs
        positions = {}
        for i, item in enumerate(arr):
            positions.setdefault(item, []).append(i)
        # Keep only the index groups of values that occur more than once
        duplicate_positions_list = [v for v in positions.values() if len(v) > 1]
        check_list.append(duplicate_positions_list)

    return check_list

This returns a list with one entry per input list. Each entry is a list of index groups, where each group holds the positions of one duplicated value in that list:

[[[0, 1, 2], [3, 4], [7, 8]],
 [[0, 1, 2, 4, 6], [7, 8]],
 [[0, 1, 2], [3, 4], [7, 8]]]

I then thought to somehow compare these lists and, for example, remove the elements at indices 1, 2 and 8, because those are common matches for each.
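One way that comparison might be sketched: intersect the index groups across all lists, keep only the intersections that still contain more than one index, and then drop every index except the first of each surviving group. The function names common_duplicate_groups and drop_positions are made up for illustration, and the sketch assumes a non-empty check_list in the shape shown above.

```python
def common_duplicate_groups(check_list):
    # Assumes check_list is non-empty and shaped like the output above.
    common = [set(group) for group in check_list[0]]
    for groups in check_list[1:]:
        # Intersect every surviving group with every group of the next list,
        # keeping only intersections that still contain a duplicate pair.
        common = [a & set(b) for a in common for b in groups]
        common = [g for g in common if len(g) > 1]
    return [sorted(g) for g in common]


def drop_positions(arr_list, groups):
    # Keep the first index of each group and drop the remaining ones.
    to_drop = {i for group in groups for i in group[1:]}
    return [[x for i, x in enumerate(arr) if i not in to_drop] for arr in arr_list]


L = [[1, 1, 1, 3, 3, 2, 4, 6, 6],
     [5, 5, 5, 4, 5, 6, 5, 7, 7],
     [9, 9, 9, 2, 2, 7, 8, 10, 10]]
check_list = [[[0, 1, 2], [3, 4], [7, 8]],
              [[0, 1, 2, 4, 6], [7, 8]],
              [[0, 1, 2], [3, 4], [7, 8]]]

groups = common_duplicate_groups(check_list)  # [[0, 1, 2], [7, 8]]
L_new = drop_positions(L, groups)
print(L_new)
```

For the example data this drops indices 1, 2 and 8, which matches the desired L_new.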

CodePudding user response：

Assuming all sub-lists will have the same length, this should work:

l = [ [1,1,1,3,3,2,4,6,6], [5,5,5,4,5,6,5,7,7], [9,9,9,2,2,7,8,10,10] ]

[list(x) for x in zip(*dict.fromkeys(zip(*l)))]

# Output: [[1, 3, 3, 2, 4, 6], [5, 4, 5, 6, 5, 7], [9, 2, 2, 7, 8, 10]]

Explanation:

zip(*l) - This transposes the list of lists. The nth element is a tuple holding the nth element of each original sublist (shown here as a list; zip itself returns a lazy iterator):

[(1, 5, 9),
 (1, 5, 9),
 (1, 5, 9),
 (3, 4, 2),
 (3, 5, 2),
 (2, 6, 7),
 (4, 5, 8),
 (6, 7, 10),
 (6, 7, 10)]

From the previous list, we want to keep only the first occurrence of each tuple. There are various ways of achieving this. A common idiom for removing duplicates while maintaining order is dict.fromkeys(<list>). Since Python dict keys must be unique and dicts preserve insertion order (Python 3.7+), this removes duplicates and generates the following output:

{(1, 5, 9): None,
 (3, 4, 2): None,
 (3, 5, 2): None,
 (2, 6, 7): None,
 (4, 5, 8): None,
 (6, 7, 10): None}
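A small check, using made-up tuples rather than the data above, suggests how dict.fromkeys behaves here: it keeps the first occurrence of each key in insertion order, and it also drops repeats that are not adjacent to the first occurrence.

```python
# Hypothetical columns; (1, 5) and (2, 6) each reappear later, not adjacent.
columns = [(1, 5), (2, 6), (1, 5), (3, 7), (2, 6)]

# Dict keys are unique and keep insertion order (guaranteed from Python 3.7).
unique_columns = list(dict.fromkeys(columns))
print(unique_columns)  # [(1, 5), (2, 6), (3, 7)]
```

Note that the keys must be hashable, which holds for tuples of numbers or strings but not for tuples that contain lists.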

We now want to transpose those keys back into the original 2-dimensional layout. Iterating over a dict yields its keys, so we can use zip again:

zip(*dict.fromkeys(zip(*l)))

Since zip returns tuples, we finally convert each tuple to a list using a list comprehension:

[list(x) for x in zip(*dict.fromkeys(zip(*l)))]
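Because zip silently stops at the shortest input, the equal-length assumption is worth checking explicitly. The following is a minimal sketch that wraps the one-liner in a function with that check; the name remove_duplicate_columns is hypothetical and not part of the original answer.

```python
def remove_duplicate_columns(rows):
    # zip() would silently truncate to the shortest row, so fail loudly instead.
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all sub-lists must have the same length")
    # Transpose, keep the first occurrence of each column, transpose back.
    unique_columns = dict.fromkeys(zip(*rows))
    return [list(row) for row in zip(*unique_columns)]


l = [[1, 1, 1, 3, 3, 2, 4, 6, 6],
     [5, 5, 5, 4, 5, 6, 5, 7, 7],
     [9, 9, 9, 2, 2, 7, 8, 10, 10]]

result = remove_duplicate_columns(l)
print(result)  # [[1, 3, 3, 2, 4, 6], [5, 4, 5, 6, 5, 7], [9, 2, 2, 7, 8, 10]]
```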

CodePudding user response：

I would go with something like this. It is not especially fast, but depending on the size of your lists, it could be sufficient.

L = [ [1,1,1,3,3,2,4,6,6], [5,5,5,4,5,6,5,7,7], [9,9,9,2,2,7,8,10,10] ]

azip = zip(*L)  # transpose: one tuple per position
temp_L = []
for zz in azip:
    # keep a column only the first time it appears
    if zz not in temp_L:
        temp_L.append(zz)
new_L = list(zip(*temp_L))  # transpose back; the rows come out as tuples

First, we zip the three (or more) lists within L. Then we iterate over the resulting tuples and check whether each one already exists in temp_L; if not, we add it. Finally, we transpose temp_L back into the original layout. It returns

new_L
>> [(1, 3, 3, 2, 4, 6), (5, 4, 5, 6, 5, 7), (9, 2, 2, 7, 8, 10)]
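The slow part of the loop above is the membership test against a list, which scans the whole list for every column. As a sketch of one possible speed-up, assuming the elements are hashable, a set can track the columns already seen while a list keeps their order. The function name dedupe_columns_with_set is made up for illustration.

```python
def dedupe_columns_with_set(rows):
    seen = set()   # columns already encountered (fast membership test)
    kept = []      # first occurrences, in their original order
    for column in zip(*rows):
        if column not in seen:
            seen.add(column)
            kept.append(column)
    # Transpose back and convert each row from tuple to list.
    return [list(row) for row in zip(*kept)]


L = [[1, 1, 1, 3, 3, 2, 4, 6, 6],
     [5, 5, 5, 4, 5, 6, 5, 7, 7],
     [9, 9, 9, 2, 2, 7, 8, 10, 10]]

new_L = dedupe_columns_with_set(L)
print(new_L)  # [[1, 3, 3, 2, 4, 6], [5, 4, 5, 6, 5, 7], [9, 2, 2, 7, 8, 10]]
```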