Efficient connection grouping algorithm or implementation in python-CodePudding

I'm looking for an efficient connection grouping (I'm not sure this is proper name..) algorithm or implementation by python.

For example, I have this nested list:

connection_data = [
   ...:     ["A", "B", "C"],
   ...:     ["B", "D"],
   ...:     ["A", "C"],
   ...:     ["E", "F"],
   ...:     ["C", "D"],
   ...:     ]

This data means each list in the nested list shows connections. For example, the first connection ["A", "B", "C"] means A and B and C are having connection each other. The nested list has multiple connections information.

I would like to calculate connection groupings from a nested list. For example, when I have the upper connection_data, I would like to get

grouped_connection = [
   ...:     ["A", "B", "C", "D"],
   ...:     ["E", "F"],
   ...:     ]

Because, A, B, C, D have connections in these connection data in the connection_data: ["A", "B", "C"], ["B", "D"], ["A", "C"], ["C", "D"], E and F have connection by ["E", "F"].

To summarize my questions:

What is this type of problem called in general?
I think I can implement a many for-loop based solver. But is there any efficient algorithm or implementation in python for this type of problem?

CodePudding user response：

You can construct a graph where the nodes are lettered A, B, C ..., and place an undirected, unweighted edge between two nodes if the initial groupings dictate that they should be in the same group. Then, the final groups are the connected components of the constructed graph. (This is the closest thing to what your problem is generally called.)

This can be done using a graph traversal algorithm, such as BFS or DFS, but if you don't want to code this up by hand, networkx can take care of everything once you've made the graph. You need to do a little bit of preprocessing on the groupings since some of them are of size greater than two, but otherwise networkx is well-suited for this problem:

import networkx as nx

groups = [
    ["A", "B", "C"],
    ["B", "D"],
    ["A", "C"],
    ["E", "F"],
    ["C", "D"],
]

# Preprocess initial groupings into edges for graph construction.
edges = set()

for group in groups:
    for src_index in range(len(group)):
        for dest_index in range(src_index   1, len(group)):
            edges.add(tuple(sorted((group[src_index], group[dest_index]))))

G = nx.Graph()
G.add_edges_from(list(edges))
print(list(list(group) for group in nx.algorithms.components.connected_components(G)))

CodePudding user response：

Networkx provides an implementation of a union find datastructure [1] [2] that solves this problem efficiently:

from networkx.utils.union_find import UnionFind

groups = [
    ["A", "B", "C"],
    ["B", "D"],
    ["A", "C"],
    ["E", "F"],
    ["C", "D"],
]

ds = UnionFind()
for gp in groups:
  ds.union(*gp)
for s in ds.to_sets():
  print(s)

# {'B', 'C', 'D', 'A'}
# {'E', 'F'}