How do I find duplicates using Python, in a list of lists of different size and create another list

Time:09-17

I have a list like the one below

[
[1, 2], [3], [4], [5, 6], [7], [8], [9], [10], [11, 14], [12, 13], 
[15], [16], [17], [18], [19], [20], [21, 61], [22], [23], [24], [25],
[26, 45], [27], [28], [29], [30], [31], [32], [33], [34], [35, 36],
[37], [38], [39], [40, 41], [42, 48], [43], [44], [46], [47], [49],
[50], [51], [52], [53], [54, 62], [55, 56], [57], [58, 59], [60, 61],
[63, 62], [64], [65], [66, 67], [68], [69]
]

As we can see, 62 appears in both [54, 62] and [63, 62].

I want to create a new sublist that merges [54, 62] and [63, 62] into [54, 62, 63].

So that my new list will be as follows:

[
[1, 2], [3], [4], [5, 6], [7], [8], [9], [10], [11, 14], [12, 13],
[15], [16], [17], [18], [19], [20], [21, 61], [22], [23], [24], [25],
[26, 45], [27], [28], [29], [30], [31], [32], [33], [34], [35, 36],
[37], [38], [39], [40, 41], [42, 48], [43], [44], [46], [47], [49],
[50], [51], [52], [53], [54, 62, 63], [55, 56], [57], [58, 59],
[60, 61], [64], [65], [66, 67], [68], [69]
]

CodePudding user response:

Maybe a bit overkill, but it is always useful to look at a problem from multiple sides. If we consider each number a node and each two-element sublist an edge, the problem reduces to finding the connected components of the graph.

This can be done easily with networkx.

import networkx
from itertools import chain

lst = [
[1, 2], ..., [68], [69]
]

g = networkx.Graph()
g.add_nodes_from(chain.from_iterable(lst))
g.add_edges_from(i for i in lst if len(i) == 2)

result = [list(i) for i in networkx.connected_components(g)]

yields:

[[1, 2],
 [3],
 [4],
 [5, 6],
 [7], ...
[66, 67],
 [68],
 [69]]
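
If installing networkx is not an option, the same connected-components idea can be sketched in plain Python with a small union-find (disjoint-set) structure. This is only a minimal sketch: the function name merge_overlapping is hypothetical, and it assumes every element is hashable. Note that any connected-components approach merges every pair of sublists that share a value, so [21, 61] and [60, 61] would also end up together as [21, 60, 61].

```python
def merge_overlapping(groups):
    """Merge sublists that share at least one element (hypothetical helper)."""
    parent = {}  # maps each element to its parent in the union-find forest

    def find(x):
        # Register unseen elements as their own root, then walk up to the root,
        # shortening the path along the way (path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for group in groups:
        if not group:
            continue  # skip empty sublists
        first = group[0]
        find(first)  # make sure single-element groups are registered too
        for other in group[1:]:
            union(first, other)

    # Collect the elements of each component under its root.
    components = {}
    for x in list(parent):
        components.setdefault(find(x), []).append(x)

    # Sort for a deterministic result.
    return sorted(sorted(members) for members in components.values())


example = [[1, 2], [3], [54, 62], [63, 62], [21, 61], [60, 61]]
print(merge_overlapping(example))
# → [[1, 2], [3], [21, 60, 61], [54, 62, 63]]
```

Unlike the networkx version, this sketch also handles sublists with more than two elements, because every element of a sublist is joined to the first one.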

CodePudding user response:

I finally wrote the following code, which I think works for my case.

identical_groups = [[1, 2], [5, 6], [11, 14], [12, 13], [21, 61], [26, 45], [35, 36], [40, 41], [42, 48], [54, 62], [55, 56], [58, 59], [60, 61], [63, 62], [66, 67]]

groupedPoints_Dict = {}
index = 0
for currentPointGroup in identical_groups:
    # Find the keys of the already stored groups that contain the first
    # point of the current group that matches anything
    exist_key = []
    for point in currentPointGroup:
        if len(exist_key) == 0:
            exist_key = [key for key, value in groupedPoints_Dict.items() if point in value]
    if len(exist_key) > 0:
        # Merge the current group into the first matching group, dropping duplicates
        # (the first key is used directly; joining the keys into one integer
        # would produce a wrong key if more than one group matched)
        dict_key = exist_key[0]
        oldgroupedPoint_List = groupedPoints_Dict[dict_key]
        newgroupedPoint_List = list(set(oldgroupedPoint_List + currentPointGroup))
        groupedPoints_Dict[dict_key] = newgroupedPoint_List
    else:
        # No overlap found: store the current group under a new key
        groupedPoints_Dict[index + 1] = currentPointGroup
        index += 1