I have 5 set (at max) of values:
ID1 ID2 ID3 ID4 ID5
1 2 3 5 7
1 2
1 8
3 9
4 11 15
4 17
11 15
17 4 18
IF IDs are on the same row then they belong to a common group:
SO, I want to generate the groups:
In this example, I will have two groups:
1,2,3,5,7,8,9
and
4,11,15,17,18
I can do it for each column using
v1 = df['ID1'].tolist()
v1= "','".join(map(str,v1)
But I cannot separate the group and cannot do it for rows. Any clue please?
CodePudding user response:
Use DataFrame.stack
for DataFrame
for level_0
form indices and val
column first:
df = df.rename(index=str).stack().astype(int).reset_index(name='val')
print (df)
level_0 level_1 val
0 0 ID1 1
1 0 ID2 2
2 0 ID3 3
3 0 ID4 5
4 0 ID5 7
5 1 ID1 1
6 1 ID2 2
7 2 ID1 1
8 2 ID2 8
9 3 ID1 3
10 3 ID2 9
11 4 ID1 4
12 4 ID2 11
13 4 ID3 15
14 5 ID1 4
15 5 ID2 17
16 6 ID1 11
17 6 ID2 15
18 7 ID1 17
19 7 ID2 4
20 7 ID3 18
And then create connected_components with filter out groups, here strings values:
import networkx as nx
# Create the graph from the dataframe
g = nx.Graph()
g.add_edges_from(df[['level_0','val']].itertuples(index=False))
new = [[y for y in x if isinstance(y, int)] for x in nx.connected_components(g)]
print (new)
[[2, 3, 1, 5, 7, 8, 9], [4, 11, 15, 17, 18]]