Home > database >  How to put together all corelated values from different column
How to put together all corelated values from different column

Time:09-21

I have 5 set (at max) of values:

ID1  ID2   ID3   ID4   ID5
1    2     3      5      7
1    2
1    8
3    9
4    11    15
4    17 
11   15
17    4    18

IF IDs are on the same row then they belong to a common group:

SO, I want to generate the groups:

In this example, I will have two groups:

1,2,3,5,7,8,9

and

4,11,15,17,18

I can do it for each column using

v1 =  df['ID1'].tolist()
v1= "','".join(map(str,v1)

But I cannot separate the group and cannot do it for rows. Any clue please?

CodePudding user response:

Use DataFrame.stack for DataFrame for level_0 form indices and val column first:

df = df.rename(index=str).stack().astype(int).reset_index(name='val')
print (df)
   level_0 level_1  val
0        0     ID1    1
1        0     ID2    2
2        0     ID3    3
3        0     ID4    5
4        0     ID5    7
5        1     ID1    1
6        1     ID2    2
7        2     ID1    1
8        2     ID2    8
9        3     ID1    3
10       3     ID2    9
11       4     ID1    4
12       4     ID2   11
13       4     ID3   15
14       5     ID1    4
15       5     ID2   17
16       6     ID1   11
17       6     ID2   15
18       7     ID1   17
19       7     ID2    4
20       7     ID3   18

And then create connected_components with filter out groups, here strings values:

import networkx as nx

# Create the graph from the dataframe
g = nx.Graph()

g.add_edges_from(df[['level_0','val']].itertuples(index=False))

new = [[y for y in x if isinstance(y, int)] for x in nx.connected_components(g)]
print (new)
[[2, 3, 1, 5, 7, 8, 9], [4, 11, 15, 17, 18]]
  • Related