Construct a chain using two columns in Python

I am new to Python and struggling to format the data frame below:

Col1    Col2
Type1   Type2
Type3   Type4
Type8   Type13
Type3   Type15
Type2   Type6
Type4   Type9
Type6   Type11
Type9   Type18
Type13  Type20

I want to identify the chains formed by Col1 and Col2, where the Col2 value of one row appears as the Col1 value of another row. For example, Type1-->Type2-->Type6-->Type11 form a chain. The final result should look like this:

Col1    Col2    Chain
Type1   Type2   Chain1
Type3   Type4   Chain2
Type8   Type13  Chain3
Type3   Type15  
Type2   Type6   Chain1
Type4   Type9   Chain2
Type6   Type11  Chain1
Type9   Type18  Chain2
Type13  Type20  Chain3

CodePudding user response:

You can do this with networkx (which you need to install first). Here df is the DataFrame containing your data. Chains are numbered from 0; use enumerate(ccs, start=1) if you want the numbering to start at Chain1:

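import networkx as nx

# Keep only the first row for each Col1 value, so a value that branches
# (Type3 here) contributes a single outgoing edge to the graph.
edges = df.drop_duplicates(['Col1'])
G = nx.Graph()
G.add_edges_from(edges[['Col1', 'Col2']].itertuples(index=False, name=None))
# Each connected component of the graph is one chain.
ccs = list(nx.connected_components(G))
# Label a row only if both of its values belong to the same component.
df['Chain'] = df.apply(lambda row: next((f'Chain{i}' for i, cc in enumerate(ccs) if row['Col1'] in cc and row['Col2'] in cc), ''), axis=1)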

Output:

     Col1    Col2   Chain
0   Type1   Type2  Chain0
1   Type3   Type4  Chain1
2   Type8  Type13  Chain2
3   Type3  Type15        
4   Type2   Type6  Chain0
5   Type4   Type9  Chain1
6   Type6  Type11  Chain0
7   Type9  Type18  Chain1
8  Type13  Type20  Chain2
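
If you would rather not install networkx, the same grouping can be sketched in plain Python with a small union-find. This is only an illustration under a few assumptions: the function name label_chains is made up for this example, the rows are passed as (Col1, Col2) tuples, only the first row per Col1 value is used as an edge (as drop_duplicates does above), and chains are numbered from 1 in order of first appearance, as in the question.

```python
def label_chains(pairs):
    """Return one chain label per (col1, col2) pair; '' if the pair is in no chain."""
    pairs = list(pairs)
    parent = {}  # union-find forest: node -> parent node

    def find(node):
        # Walk up to the root of the node's group, halving the path on the way.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Use only the first pair for each col1 value as an edge
    # (assumption: this mirrors drop_duplicates(['Col1']) in the answer).
    seen_sources = set()
    for source, target in pairs:
        if source in seen_sources:
            continue
        seen_sources.add(source)
        root_source, root_target = find(source), find(target)
        parent[root_source] = root_target  # merge the two groups

    chain_numbers = {}  # group root -> chain number, in order of first appearance
    labels = []
    for source, target in pairs:
        in_graph = source in parent and target in parent
        if in_graph and find(source) == find(target):
            number = chain_numbers.setdefault(find(source), len(chain_numbers) + 1)
            labels.append(f"Chain{number}")
        else:
            labels.append("")  # e.g. the dropped Type3 -> Type15 row
    return labels


# The rows from the question, in the same order.
pairs = [
    ("Type1", "Type2"), ("Type3", "Type4"), ("Type8", "Type13"),
    ("Type3", "Type15"), ("Type2", "Type6"), ("Type4", "Type9"),
    ("Type6", "Type11"), ("Type9", "Type18"), ("Type13", "Type20"),
]
labels = label_chains(pairs)
for (source, target), label in zip(pairs, labels):
    print(source, target, label)
```

With a DataFrame, something like df['Chain'] = label_chains(zip(df['Col1'], df['Col2'])) should attach the labels as a new column.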