I have two dataframes such as:
tab1
Group1 Group2
G1 G2
G4 G3
G5 G3
tab2
Names Groups
Canis_lupus G1
Cattus_cattus G1
Mus_musculus G1
Danio_rerio G2
Betta_splendens G2
Griseus_gris G3
Buffallo_kol G3
Homo_sapiens G4
Macaque_ser G4
Wistiti_del G5
Apis_mellifera G6
I would like to add a new Connected_groups column to tab2 that lists, for each row, all the groups connected to its group through tab1.
I should then get:
Names Groups Connected_groups
Canis_lupus G1 G1-G2
Cattus_cattus G1 G1-G2
Mus_musculus G1 G1-G2
Danio_rerio G2 G1-G2
Betta_splendens G2 G1-G2
Griseus_gris G3 G3-G4-G5
Buffallo_kol G3 G3-G4-G5
Homo_sapiens G4 G3-G4-G5
Macaque_ser G4 G3-G4-G5
Wistiti_del G5 G3-G4-G5
Apis_mellifera G6 G6
Here are the dataframes in dict format, if that helps:
tab1 = pd.DataFrame.from_dict({'Group1': {0: 'G1', 1: 'G4', 2: 'G5'}, 'Group2': {0: 'G2', 1: 'G3', 2: 'G3'}})
tab2=pd.DataFrame.from_dict({'Names': {0: 'Canis_lupus', 1: 'Cattus_cattus', 2: 'Mus_musculus', 3: 'Danio_rerio', 4: 'Betta_splendens', 5: 'Griseus_gris', 6: 'Buffallo_kol', 7: 'Homo_sapiens', 8: 'Macaque_ser', 9: 'Wistiti_del', 10: 'Apis_mellifera'}, 'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G2', 4: 'G2', 5: 'G3', 6: 'G3', 7: 'G4', 8: 'G4', 9: 'G5', 10: 'G6'}})
CodePudding user response:
Let us try networkx to find the connected groups in tab1, then create a mapping dictionary of connected groups and use it with replace to substitute the values in tab2:
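import networkx as nx
# Build an undirected graph whose edges are the rows of tab1
G = nx.from_pandas_edgelist(tab1, 'Group1', 'Group2')
# Map every group to the '-'-joined label of its connected component
d = {k: '-'.join(c) for c in nx.connected_components(G) for k in c}
# Groups that never appear in tab1 (e.g. G6) are not in d, so replace leaves them unchanged
tab2['conn-grps'] = tab2['Groups'].replace(d)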
Names Groups conn-grps
0 Canis_lupus G1 G2-G1
1 Cattus_cattus G1 G2-G1
2 Mus_musculus G1 G2-G1
3 Danio_rerio G2 G2-G1
4 Betta_splendens G2 G2-G1
5 Griseus_gris G3 G3-G5-G4
6 Buffallo_kol G3 G3-G5-G4
7 Homo_sapiens G4 G3-G5-G4
8 Macaque_ser G4 G3-G5-G4
9 Wistiti_del G5 G3-G5-G4
10 Apis_mellifera G6 G6
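Note that each connected component comes back as a set, so the order inside a label (G2-G1, G3-G5-G4) is arbitrary and may differ from the order asked for in the question. If the labels need to match exactly, one possible variant is to sort each component before joining. This is only a sketch: the name mapping is mine, the sort is plain string order (so G10 would come before G2), and map plus fillna is used here in place of replace to leave unconnected groups such as G6 as they are.

```python
import networkx as nx
import pandas as pd

tab1 = pd.DataFrame({'Group1': ['G1', 'G4', 'G5'],
                     'Group2': ['G2', 'G3', 'G3']})
tab2 = pd.DataFrame({
    'Names': ['Canis_lupus', 'Cattus_cattus', 'Mus_musculus', 'Danio_rerio',
              'Betta_splendens', 'Griseus_gris', 'Buffallo_kol', 'Homo_sapiens',
              'Macaque_ser', 'Wistiti_del', 'Apis_mellifera'],
    'Groups': ['G1', 'G1', 'G1', 'G2', 'G2', 'G3', 'G3', 'G4', 'G4', 'G5', 'G6'],
})

# Each row of tab1 becomes an undirected edge between two groups
G = nx.from_pandas_edgelist(tab1, 'Group1', 'Group2')

# Build one label per connected component, sorted so the result is deterministic
mapping = {}
for component in nx.connected_components(G):
    label = '-'.join(sorted(component))  # string sort: fine for G1..G9
    for group in component:
        mapping[group] = label

# Groups missing from the graph map to NaN, so fall back to the group itself
tab2['Connected_groups'] = tab2['Groups'].map(mapping).fillna(tab2['Groups'])
print(tab2)
```

With the sample data this should give G1-G2 for the G1 and G2 rows, G3-G4-G5 for the G3, G4 and G5 rows, and G6 for Apis_mellifera.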