Home > database >  python3 join lists that have same value in list of lists
python3 join lists that have same value in list of lists

Time:04-22

I have similar question to one that has been asked several years ago, the link is down here. the thing is that all answers are in python 2 and does work for me. my lists are huge so time is important. if anyone can solve that for python3, that will really help.

Consider this list of lists:

l = [ [1], [1], [1,2,3], [4,1], [5], [5], [6], [7,8,9], [7,6], [8,5] ]

I want to combine all the lists that have at least one number in common, that will be done iteratively until it finished and no dubletters will go with. The result will be:

combine(l) = [ [1,2,3,4], [5,6,7,8,9] ]

is there any neat way to do this, perhaps with itertools?

graph

networkx

Using networkx you can do:

import networkx as nx
from itertools import chain, pairwise
# python < 3.10 use this recipe for pairwise instead
# from itertools import tee
# def pairwise(iterable):
#     a, b = tee(iterable)
#     next(b, None)
#     return zip(a, b)

G = nx.Graph()
G = nx.from_edgelist(chain.from_iterable(pairwise(e) for e in l))
G.add_nodes_from(set.union(*map(set, l))) # adding single items

list(nx.connected_components(G))

output:

[{1, 2, 3, 4}, {5, 6, 7, 8, 9}]

python

Now, you can use pure python to perform the same thing, finding the connected components and merging them.

An example code is nicely described in this post (archive.org link for long term).

In summary, the first step is building the list of neighbors, then a recursive function is used to join the neighbors of neighbors keeping track of the already seen ones.

from collections import defaultdict 
  
#merge function to  merge all sublist having common elements. 
def merge_common(lists): 
    neigh = defaultdict(set) 
    visited = set() 
    for each in lists: 
        for item in each: 
            neigh[item].update(each) 
    def comp(node, neigh = neigh, visited = visited, vis = visited.add): 
        nodes = set([node]) 
        next_node = nodes.pop 
        while nodes: 
            node = next_node() 
            vis(node) 
            nodes |= neigh[node] - visited 
            yield node 
    for node in neigh: 
        if node not in visited: 
            yield sorted(comp(node))

example:

merge_common(l)
# [[1, 2, 3, 4], [5, 6, 7, 8, 9]]
  • Related