Create new key based on relationship between two columns

I'm trying to add a key that links all related instances across two columns (ID1 and ID2), and then use it to create a GroupID.

The logic will be:

1. Check all instances of ID2 linked to ID1.
2. Check all instances of ID1 linked to the ID2 values found in (1).
3. Repeat until all relationships are found.
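The loop described above can be written directly in pandas and plain Python, without a graph library. This is a minimal sketch, assuming the columns are named ID1 and ID2; the helper name `assign_groups` is hypothetical, and the sample frame is rebuilt from the output table in the answer below.

```python
from collections import defaultdict

import pandas as pd


def assign_groups(df, col1='ID1', col2='ID2'):
    """Hypothetical helper: label each row with a group number by
    alternately expanding ID1 -> ID2 -> ID1 until nothing new is found."""
    # Adjacency in both directions
    id1_to_id2 = defaultdict(set)
    id2_to_id1 = defaultdict(set)
    for a, b in zip(df[col1], df[col2]):
        id1_to_id2[a].add(b)
        id2_to_id1[b].add(a)

    group_of = {}
    next_gid = 0
    for start in df[col1]:
        if start in group_of:
            continue  # already placed in an earlier group
        ids1, ids2 = {start}, set()
        frontier = {start}
        while frontier:  # step 3: repeat until no new ID1 appears
            # Step 1: every ID2 linked to the newly found ID1 values
            new2 = set().union(*(id1_to_id2[a] for a in frontier)) - ids2
            ids2 |= new2
            # Step 2: every ID1 linked to those new ID2 values
            frontier = set().union(*(id2_to_id1[b] for b in new2)) - ids1
            ids1 |= frontier
        for a in ids1:
            group_of[a] = next_gid
        next_gid += 1
    return df[col1].map(group_of)


# Sample data rebuilt from the output table in the answer
df = pd.DataFrame({'ID1': list('AABBCCD'), 'ID2': [1, 2, 1, 3, 4, 5, 2]})
df['GroupID'] = assign_groups(df)
print(df)
```

Groups are numbered in order of first appearance in ID1, so A, B and D share group 0 and C gets group 1.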

Answer:

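Let us try with networkx. Treat every row as an edge between its ID1 value and its ID2 value; each connected component of that graph is one group, which is what repeating the steps above until nothing new is found produces: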

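import pandas as pd
import networkx as nx

# Sample data matching the output below
df = pd.DataFrame({'ID1': list('AABBCCD'), 'ID2': [1, 2, 1, 3, 4, 5, 2]})

# Each row is an edge between an ID1 node and an ID2 node
G = nx.from_pandas_edgelist(df, 'ID1', 'ID2')
# Number the connected components 0, 1, ... and map every node to its component
d = {node: gid for gid, comp in enumerate(nx.connected_components(G)) for node in comp}
df['new'] = df['ID1'].map(d)
df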
Out[302]: 
  ID1  ID2  new
0   A    1    0
1   A    2    0
2   B    1    0
3   B    3    0
4   C    4    1
5   C    5    1
6   D    2    0
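One caveat with `from_pandas_edgelist`: it uses the raw cell values as node names, so if the same value can appear in both columns (for example when both hold integer IDs), an ID1 value and an unrelated ID2 value with the same label collapse into one node and merge groups that should stay separate. A minimal sketch of one workaround, assuming this matters for your data, is to tag each node with its column name; the small frame below is made up purely to show the collision.

```python
import networkx as nx
import pandas as pd

# Made-up example where ID1 and ID2 share the same integer values
df = pd.DataFrame({'ID1': [1, 1, 2], 'ID2': [2, 3, 4]})

# Untagged: ID1 value 2 and ID2 value 2 become the same node
naive = nx.from_pandas_edgelist(df, 'ID1', 'ID2')
print(nx.number_connected_components(naive))  # 1

# Tagged: nodes are (column, value) pairs, so the columns cannot collide
G = nx.Graph()
G.add_edges_from((('ID1', a), ('ID2', b)) for a, b in zip(df['ID1'], df['ID2']))
groups = {node: gid
          for gid, comp in enumerate(nx.connected_components(G))
          for node in comp}
df['new'] = [groups[('ID1', a)] for a in df['ID1']]
print(df)
```

With tagging, rows 0 and 1 share a group while row 2 gets its own, instead of all three being merged.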