Home > Net >  Dividing of a table into groups of directly/indirectly connected elements in Python
Dividing of a table into groups of directly/indirectly connected elements in Python

Time:11-03

I have following table (elements from col. A and B are linked - building kind of a graph with direct & indirect connections). I am looking for a way to create separate groups (=lists) that will only contain elements that are only linked to each other (directly & indirectly), such as: {a, b, d, x} and {c, y, z}.
I figure it out how to code this in the for loop iterating through entire table (comparing if each n 1 pair contains at least one element in the previous group, then create a group). I assume this is not ideal/desirable solution in Python. Please suggest more elegant solution which might utilize Pandas.

A B
a x
b x
c y
c z
d x
x a
x x
y z

CodePudding user response:

EDIT

What you are looking for is to find all connected components in an undirected graph. You can use the code given here https://www.geeksforgeeks.org/connected-components-in-an-undirected-graph/:


# Python program to print connected
# components in an undirected graph
 
 
class Graph:
 
    # init function to declare class variables
    def __init__(self, V):
        self.V = V
        self.adj = [[] for i in range(V)]
 
    def DFSUtil(self, temp, v, visited):
 
        # Mark the current vertex as visited
        visited[v] = True
 
        # Store the vertex to list
        temp.append(v)
 
        # Repeat for all vertices adjacent
        # to this vertex v
        for i in self.adj[v]:
            if visited[i] == False:
 
                # Update the list
                temp = self.DFSUtil(temp, i, visited)
        return temp
 
    # method to add an undirected edge
    def addEdge(self, v, w):
        self.adj[v].append(w)
        self.adj[w].append(v)
 
    # Method to retrieve connected components
    # in an undirected graph
    def connectedComponents(self):
        visited = []
        cc = []
        for i in range(self.V):
            visited.append(False)
        for v in range(self.V):
            if visited[v] == False:
                temp = []
                cc.append(self.DFSUtil(temp, v, visited))
        return cc
 
 
# Driver Code
if __name__ == "__main__":
 
    # Create a graph given in the above diagram
    # 5 vertices numbered from 0 to 4
    g = Graph(5)
    g.addEdge(1, 0)
    g.addEdge(2, 1)
    g.addEdge(3, 4)
    cc = g.connectedComponents()
    print("Following are connected components")
    print(cc)

# This code is contributed by Abhishek Valsan

Answer to the question pre-edit:

With the following dataset:

d = {'A':['a','b','c','c','d'],'B':['x','x','y','z','x']}
df = pd.DataFrame(d)

you can do:

df = df.groupby('A').sum().reset_index().groupby('B').sum().reset_index()
(df.A   df.B).apply(list)

Output:

0    [a, b, d, x]
1       [c, y, z]
dtype: object

CodePudding user response:

I developed solution using for loops. It works, but I believe that Python should handle such case differently. Any suggestion how it can be improved?

The same dataset:

d = {'A':['a','b','c','c','d','x','x','y'],'B':['x','x','y','z','x','a','x','z']}
df = pd.DataFrame(d)

Function:

def corr_groups(df):
    list = [[]]
    groups = 0

    for index, row in df.iterrows():
        if not list[0]:
            list[0].append(row[0])
            list[0].append(row[1])
            groups = 1
        else:
            check = 0
            for x in range(groups):
                if (row[0] in list[x]) | (row[1] in list[x]):
                    list[x].append(row[0])
                    list[x].append(row[1])
                    check = 1
            
            if check == 0: # To prevent situation row[0/1] not in the 1st/2nd list but in the 3rd or further
                list.append([row[0], row[1]])
                groups = groups   1 

    lst = []
    for ls in list:
        lst.append(set(ls))

    return lst

Desired output:

corr_groups(df)
[{'a', 'b', 'd', 'x'}, {'c', 'y', 'z'}]
  • Related