The dataset, in the form
Source Target Source_Class Target_Class
1 2 1 0
1 3 1 0
2 1 0 1
4 2 0 0
5 4 0 0
5 1 0 1
3 1 0 1
is used to build a network, where Source_Class is an attribute of the Source node and Target_Class is an attribute of the Target node. I need to find the edges that link two nodes with different classes, for example 1 (class 1) and 2 (class 0), 1 and 3, and so on. In other words, I need a list of the edges that act as 'connectors' in the network because they join nodes of different classes.
Stated like this, the problem seems easy, but I am not sure how to count each Source/Target pair only once. For instance, I could combine two conditions with a logical OR and keep only the rows that have Source_Class 1 and Target_Class 0, or Source_Class 0 and Target_Class 1. Since the network is undirected, though, this leaves duplicates (1 2 and 2 1, 1 3 and 3 1):
Source Target Source_Class Target_Class
1 2 1 0
1 3 1 0
2 1 0 1
5 1 0 1
3 1 0 1
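A minimal sketch of the OR filter described above, assuming pandas and rebuilding the question's table as a DataFrame named `df` (the variable names are illustrative). With 0/1 classes the mask is equivalent to `Source_Class != Target_Class`. Collecting each edge as a `frozenset` of its two endpoints shows that the five surviving rows cover only three undirected edges:

```python
import pandas as pd

# The edge list from the question
df = pd.DataFrame({
    "Source":       [1, 1, 2, 4, 5, 5, 3],
    "Target":       [2, 3, 1, 2, 4, 1, 1],
    "Source_Class": [1, 1, 0, 0, 0, 0, 0],
    "Target_Class": [0, 0, 1, 0, 0, 1, 1],
})

# "Logical sum" (OR) of the two cross-class cases: 1 -> 0 or 0 -> 1
mask = ((df["Source_Class"] == 1) & (df["Target_Class"] == 0)) | (
    (df["Source_Class"] == 0) & (df["Target_Class"] == 1)
)
filtered = df[mask]
print(filtered)

# Treat each edge as undirected: (1, 2) and (2, 1) give the same frozenset
pairs = [frozenset(p) for p in zip(filtered["Source"], filtered["Target"])]
dup_count = len(pairs) - len(set(pairs))
print(dup_count)  # 2 redundant rows: 2 1 and 3 1
```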
My expected output would be:
Source Target Different
1 2 1
1 3 1
5 1 1
How can I filter out these duplicates?
Answer:
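Use np.sort to order each Source/Target pair, so that 1 2 and 2 1 produce the same key. Then group by the sorted pair, keep the first row of each group, and filter on the classes: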
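```python
import numpy as np
import pandas as pd
# Sort each (Source, Target) row so that 2 1 and 1 2 map to the same key
a = np.sort(df[['Source', 'Target']], axis=1)
# Keep the first row per unordered pair, then keep only cross-class edges
(df.groupby([a[:, 0], a[:, 1]]).head(1)
   .reset_index(drop=True)
   .query('Source_Class != Target_Class')
)
```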
Output:
Source Target Source_Class Target_Class
0 1 2 1 0
1 1 3 1 0
4 5 1 0 1
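An alternative sketch, not the method used in the answer above: build a canonical key for each undirected edge with `np.minimum`/`np.maximum`, drop repeated keys with `DataFrame.duplicated`, and report the result in the question's Source/Target/Different layout. It assumes `Different` is meant as a 0/1 flag for "the endpoints have different classes"; `df`, `lo`, `hi`, `first` and `out` are illustrative names.

```python
import numpy as np
import pandas as pd

# The edge list from the question
df = pd.DataFrame({
    "Source":       [1, 1, 2, 4, 5, 5, 3],
    "Target":       [2, 3, 1, 2, 4, 1, 1],
    "Source_Class": [1, 1, 0, 0, 0, 0, 0],
    "Target_Class": [0, 0, 1, 0, 0, 1, 1],
})

# Canonical undirected key: (smaller node id, larger node id)
lo = np.minimum(df["Source"], df["Target"])
hi = np.maximum(df["Source"], df["Target"])

# Keep only the first row seen for each unordered pair
first = df[~pd.DataFrame({"lo": lo, "hi": hi}).duplicated(keep="first")]

# Flag cross-class edges, keep them, and use the question's expected columns
out = (
    first.assign(Different=(first["Source_Class"] != first["Target_Class"]).astype(int))
    .query("Different == 1")[["Source", "Target", "Different"]]
    .reset_index(drop=True)
)
print(out)
```

Like `groupby(...).head(1)`, `duplicated(keep="first")` keeps the first occurrence of each pair, so this returns the same three edges as the answer (1 2, 1 3, 5 1).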