From a dataset, create an edgelist and a node attribute list

I have a dataset with the following columns and observations:

   Source   Target  Label_Source    Label_Target
    E   N   0.0 0.0
    A   B   1.0 1.0
    A   C   1.0 0.0
    A   D   1.0 0.0
    A   N   1.0 0.0
    S   G   0.0 0.0
    S   L   0.0 1.0
    S   C   0.0 0.0

Whoever built the dataset did not split it into an edgelist and node attributes, so I would now like to create these two separate datasets. My idea is to select the unique nodes in the network and map each node to its label value. Note that Label_Source belongs to the source node and Label_Target belongs to the target node. A node should never receive conflicting labels from the two columns (at least, it should not). My expected output would be:

edgelist (just by dropping the Labels columns):

Source Target
E N
A B
A C
A D
A N
S G
S L
S C

nodelist with attributes:

 Node    Label
 E          0
 N          0
 A          1
 B          1
 C          0
 D          0
 S          0
 G          0
 L          1

Can you please tell me how to build the nodelist with this mapping? I guess one option would be to select the distinct elements from both Source and Target, then look up each one's label in the Label_Source or Label_Target column.

CodePudding user response：

Let us try splitting the columns with filter, then concat and groupby with max:

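import pandas as pd

# Each filter keeps the node column plus its label column (names containing 'Source' / 'Target')
out1 = df.filter(like='Source')
out2 = df.filter(like='Target')
out1.columns = ['Node','Label']
out2.columns = ['Node','Label']
# Stack both halves and keep one row per node (the max label if a node repeats)
out = pd.concat([out1,out2]).groupby('Node').max().reset_index()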

  Node  Label
0    A    1.0
1    B    1.0
2    C    0.0
3    D    0.0
4    E    0.0
5    G    0.0
6    L    1.0
7    N    0.0
8    S    0.0
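
Note that groupby('Node').max() silently picks the larger label if a node ever shows up with two different labels. Since the question says this should not happen, a quick check before aggregating may be useful. The following is only a sketch: the sample frame is rebuilt from the table in the question, and the names pairs, label_counts and conflicting_nodes are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Source": list("EAAAASSS"),
    "Target": list("NBCDNGLC"),
    "Label_Source": [0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    "Label_Target": [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
})

# Stack the (Source, Label_Source) and (Target, Label_Target) pairs into one long table
pairs = pd.concat([
    df[["Source", "Label_Source"]].set_axis(["Node", "Label"], axis=1),
    df[["Target", "Label_Target"]].set_axis(["Node", "Label"], axis=1),
], ignore_index=True)

# Count distinct labels per node; more than one means the data contradicts itself
label_counts = pairs.groupby("Node")["Label"].nunique()
conflicting_nodes = label_counts[label_counts > 1].index.tolist()
print(conflicting_nodes)
```

If the list is empty, every node has a single label and max() (or first(), or drop_duplicates()) all give the same node list.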

CodePudding user response：

Try:

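import pandas as pd

# Edge list: just the two node columns
edgelist = df[['Source', 'Target']]
# Node list: stack the (node, label) pairs from both sides, then drop repeated rows
nodelist = pd.concat([pd.DataFrame(df.filter(like='Source').to_numpy()),
                      pd.DataFrame(df.filter(like='Target').to_numpy())]) \
             .rename(columns={0: 'Node', 1: 'Label'}).fillna(0) \
             .astype({'Label': int}).drop_duplicates().reset_index(drop=True)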

Output:

>>> edgelist
  Source Target
0      E      N
1      A      B
2      A      C
3      A      D
4      A      N
5      S      G
6      S      L
7      S      C

>>> nodelist
  Node  Label
0    E      0
1    A      1
2    S      0
3    N      0
4    B      1
5    C      0
6    D      0
7    G      0
8    L      1
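
The question also mentions a map between nodes and labels. Once a node list like the one above exists, it can probably be turned into a plain dictionary and checked against the edge list. This is a minimal sketch under the assumption that each node has exactly one label; the frame is rebuilt from the question's table, the node list is built with a slightly different (but equivalent for this data) chain, and node_to_label and missing are illustrative names only.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Source": list("EAAAASSS"),
    "Target": list("NBCDNGLC"),
    "Label_Source": [0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    "Label_Target": [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
})

edgelist = df[["Source", "Target"]]

# One (Node, Label) row per distinct pair, with integer labels
nodelist = (
    pd.concat([
        df[["Source", "Label_Source"]].set_axis(["Node", "Label"], axis=1),
        df[["Target", "Label_Target"]].set_axis(["Node", "Label"], axis=1),
    ])
    .drop_duplicates()
    .astype({"Label": int})
    .reset_index(drop=True)
)

# Node -> label lookup (assumes each node appears only once in nodelist)
node_to_label = dict(zip(nodelist["Node"], nodelist["Label"]))

# Sanity check: every node used in an edge should have a label
edge_nodes = set(edgelist["Source"]) | set(edgelist["Target"])
missing = edge_nodes - set(node_to_label)
print(node_to_label, missing)
```

An empty missing set means the edge list and the node list are consistent with each other.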