I have following table (elements from col. A and B are linked - building kind of a graph with direct & indirect connections). I am looking for a way to create separate groups (=lists) that will only contain elements that are only linked to each other (directly & indirectly), such as:
{a, b, d, x}
and {c, y, z}
.
I figure it out how to code this in the for loop
iterating through entire table (comparing if each n 1
pair contains at least one element in the previous group, then create a group). I assume this is not ideal/desirable solution in Python. Please suggest more elegant solution which might utilize Pandas.
A | B |
---|---|
a | x |
b | x |
c | y |
c | z |
d | x |
x | a |
x | x |
y | z |
CodePudding user response:
EDIT
What you are looking for is to find all connected components in an undirected graph. You can use the code given here https://www.geeksforgeeks.org/connected-components-in-an-undirected-graph/:
# Python program to print connected
# components in an undirected graph
class Graph:
# init function to declare class variables
def __init__(self, V):
self.V = V
self.adj = [[] for i in range(V)]
def DFSUtil(self, temp, v, visited):
# Mark the current vertex as visited
visited[v] = True
# Store the vertex to list
temp.append(v)
# Repeat for all vertices adjacent
# to this vertex v
for i in self.adj[v]:
if visited[i] == False:
# Update the list
temp = self.DFSUtil(temp, i, visited)
return temp
# method to add an undirected edge
def addEdge(self, v, w):
self.adj[v].append(w)
self.adj[w].append(v)
# Method to retrieve connected components
# in an undirected graph
def connectedComponents(self):
visited = []
cc = []
for i in range(self.V):
visited.append(False)
for v in range(self.V):
if visited[v] == False:
temp = []
cc.append(self.DFSUtil(temp, v, visited))
return cc
# Driver Code
if __name__ == "__main__":
# Create a graph given in the above diagram
# 5 vertices numbered from 0 to 4
g = Graph(5)
g.addEdge(1, 0)
g.addEdge(2, 1)
g.addEdge(3, 4)
cc = g.connectedComponents()
print("Following are connected components")
print(cc)
# This code is contributed by Abhishek Valsan
Answer to the question pre-edit:
With the following dataset:
d = {'A':['a','b','c','c','d'],'B':['x','x','y','z','x']}
df = pd.DataFrame(d)
you can do:
df = df.groupby('A').sum().reset_index().groupby('B').sum().reset_index()
(df.A df.B).apply(list)
Output:
0 [a, b, d, x]
1 [c, y, z]
dtype: object
CodePudding user response:
I developed solution using for loops
. It works, but I believe that Python should handle such case differently. Any suggestion how it can be improved?
The same dataset:
d = {'A':['a','b','c','c','d','x','x','y'],'B':['x','x','y','z','x','a','x','z']}
df = pd.DataFrame(d)
Function:
def corr_groups(df):
list = [[]]
groups = 0
for index, row in df.iterrows():
if not list[0]:
list[0].append(row[0])
list[0].append(row[1])
groups = 1
else:
check = 0
for x in range(groups):
if (row[0] in list[x]) | (row[1] in list[x]):
list[x].append(row[0])
list[x].append(row[1])
check = 1
if check == 0: # To prevent situation row[0/1] not in the 1st/2nd list but in the 3rd or further
list.append([row[0], row[1]])
groups = groups 1
lst = []
for ls in list:
lst.append(set(ls))
return lst
Desired output:
corr_groups(df)
[{'a', 'b', 'd', 'x'}, {'c', 'y', 'z'}]