I have a dataset with the following columns and observations:
Source Target Label_Source Label_Target
E N 0.0 0.0
A B 1.0 1.0
A C 1.0 0.0
A D 1.0 0.0
A N 1.0 0.0
S G 0.0 0.0
S L 0.0 1.0
S C 0.0 0.0
Whoever built the dataset did not split it into an edge list and node attributes, so I would now like to create these two separate datasets. My idea is to select the unique nodes in the network and create a map between the nodes and their corresponding label values. Note that Label_Source is assigned to the source node and Label_Target is assigned to the target node. There is no overlap between the two in the network (at least, there should not be). My expected output would be
edgelist (just by dropping the Labels columns):
Source Target
E N
A B
A C
A D
A N
S G
S L
S Cnodelist with attributes:
Node Label E 0 N 0 A 1 B 1 C 0 D 0 S 0 G 0 L 1
Can you please tell me how to get the node list by creating this mapping? I guess one option would be to select the distinct elements from both Source and Target, then for each of them look up its label in the Label_Source or Label_Target column.
CodePudding user response:
Let us try splitting the Source and Target columns, stacking them, and then using groupby with max:
out1 = df.filter(like='Source')
out2 = df.filter(like='Target')
out1.columns = ['Node','Label']
out2.columns = ['Node','Label']
out = pd.concat([out1,out2]).groupby('Node').max().reset_index()
Output:
  Node Label
0 A 1.0
1 B 1.0
2 C 0.0
3 D 0.0
4 E 0.0
5 G 0.0
6 L 1.0
7 N 0.0
8 S 0.0
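For a self-contained check, the sample data can be rebuilt by hand and the same split-then-groupby idea applied. The `nunique` step is an addition that is not in the answer above: it flags any node that carries more than one distinct label, which the question suggests should not happen. This is only a minimal sketch, assuming the data sits in a DataFrame named `df`; the names `stacked` and `conflicts` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Source": list("EAAAASSS"),
    "Target": list("NBCDNGLC"),
    "Label_Source": [0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    "Label_Target": [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
})

# Stack (Source, Label_Source) on top of (Target, Label_Target).
src = df[["Source", "Label_Source"]].rename(
    columns={"Source": "Node", "Label_Source": "Label"})
tgt = df[["Target", "Label_Target"]].rename(
    columns={"Target": "Node", "Label_Target": "Label"})
stacked = pd.concat([src, tgt], ignore_index=True)

# Nodes with more than one distinct label would be inconsistent.
label_counts = stacked.groupby("Node")["Label"].nunique()
conflicts = label_counts[label_counts > 1].index.tolist()

# One row per node; max only matters if a node has conflicting labels.
nodelist = stacked.groupby("Node", as_index=False)["Label"].max()
print(conflicts)
print(nodelist)
```

If `conflicts` is non-empty, taking the max silently picks the larger label for those nodes, so it may be worth inspecting them before relying on the result.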
CodePudding user response:
Try:
edgelist = df[['Source', 'Target']]
nodelist = pd.concat([pd.DataFrame(df.filter(like='Source').to_numpy()),
                      pd.DataFrame(df.filter(like='Target').to_numpy())]) \
             .rename(columns={0: 'Node', 1: 'Label'}).fillna(0) \
             .astype({'Label': int}).drop_duplicates().reset_index(drop=True)
Output:
>>> edgelist
Source Target
0 E N
1 A B
2 A C
3 A D
4 A N
5 S G
6 S L
7 S C
>>> nodelist
Node Label
0 E 0
1 A 1
2 S 0
3 N 0
4 B 1
5 C 0
6 D 0
7 G 0
8 L 1
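Since the question asks for a map from nodes to labels, the same idea can also be sketched as a plain dictionary. This variant is not from either answer: it reads the node and label columns row by row, so nodes appear in the order the question lists them, and it keeps the first label seen for each node. The names `pairs` and `label_map` are illustrative, and the sketch assumes no labels are missing.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Source": list("EAAAASSS"),
    "Target": list("NBCDNGLC"),
    "Label_Source": [0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    "Label_Target": [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
})

# Flatten row by row: Source, Target, Source, Target, ...
nodes = df[["Source", "Target"]].to_numpy().ravel()
labels = df[["Label_Source", "Label_Target"]].to_numpy().ravel()

# Keep the first occurrence of each node (assumes labels are consistent).
pairs = pd.DataFrame({"Node": nodes, "Label": labels})
pairs = pairs.drop_duplicates(subset="Node").reset_index(drop=True)

# Node -> integer label mapping.
label_map = dict(zip(pairs["Node"], pairs["Label"].astype(int).tolist()))
print(label_map)
```

A dictionary like this can then be used with `Series.map`, for example `df['Source'].map(label_map)`, to look labels up again from the edge list.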