I am working on a graph problem and want to drop the rows where two nodes A and B are connected twice, once from A to B and once from B to A. Could you help me with that, please?

I have a dataframe `data`:
Column A | Column B |
---|---|
value 1 | value 2 |
value 1 | value 3 |
value 2 | value 3 |
value 2 | value 1 |
I want to extract a dataframe of all the cases where both of these rows are present:
Column A | Column B |
---|---|
value i | value j |
value j | value i |
In our example:
Column A | Column B |
---|---|
value 1 | value 2 |
value 2 | value 1 |
Thank you very much!
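I tried looping and building lists, but it is time-consuming and not very elegant:

```python
l = []
indexes = []
# 'aretes' (French for "edges") appears to be an edge-id column in the real data
for i in data['aretes']:
    row = data[data['aretes'] == i]
    l.append([list(row['column A'])[0], list(row['column B'])[0]])
index = 0
for j in l:
    index += 1  # 1-based position of j in l
    h = [j[1], j[0]]  # the same edge with its direction reversed
    if h in l:
        indexes.append(index)
```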
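For comparison, here is a minimal sketch of the same check without the nested list search: every edge goes into a Python set of `(source, target)` tuples, so testing whether the reverse edge exists is a constant-time lookup instead of a scan of `l`. It assumes the column names from the example table (`Column A`, `Column B`) rather than the `aretes` column of the real data, and it skips self-loops, which is an assumption about what "connected twice" should mean.

```python
import pandas as pd

# the example data from the question
data = pd.DataFrame({
    "Column A": ["value 1", "value 1", "value 2", "value 2"],
    "Column B": ["value 2", "value 3", "value 3", "value 1"],
})

# every directed edge (u, v) as a tuple, for O(1) membership tests
edges = set(zip(data["Column A"], data["Column B"]))

# keep a row when the reverse edge (v, u) also exists;
# u != v skips self-loops, which would otherwise match themselves
mask = [(v, u) in edges and u != v
        for u, v in zip(data["Column A"], data["Column B"])]
result = data[mask]
print(result)
```

To drop these rows instead of extracting them, `data[[not m for m in mask]]` keeps the complement.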
Answer:
If you want to extract all the rows in the dataframe that are duplicated, I would first create an id from the sorted pair of nodes, so that (A, B) and (B, A) get the same id. Sorting matters here: the string form of a Python `set` does not guarantee the same element order for both directions.

```python
df["id"] = df.apply(lambda x: str(sorted([x['a'], x['b']])), axis=1)
```

Then you can use the `duplicated` method with `keep=False` to keep only the rows whose id appears more than once:

```python
df[df.duplicated(["id"], keep=False)]
```

Results:

```
         a        b                      id
0  Value 1  Value 2  ['Value 1', 'Value 2']
1  Value 2  Value 1  ['Value 1', 'Value 2']
```
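A possibly faster variant of the same sorted-id idea sorts the two endpoints of every row at once with NumPy instead of calling `apply` row by row. This is a sketch assuming the node labels are mutually comparable (e.g. all strings) and reusing the column names `a` and `b` from the answer; the sample rows are the question's data. Like the approach above, it also extracts a directed edge that simply appears twice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": ["Value 1", "Value 1", "Value 2", "Value 2"],
    "b": ["Value 2", "Value 3", "Value 3", "Value 1"],
})

# sort each row's two endpoints so (u, v) and (v, u) give the same key
key = pd.DataFrame(np.sort(df[["a", "b"]].to_numpy(), axis=1), index=df.index)

# keep=False marks every member of a duplicated group, not just the repeats
result = df[key.duplicated(keep=False)]
print(result)
```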
Answer:
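Convert each row's pair of nodes to a `frozenset`, then keep the rows whose set is duplicated. A `frozenset` is used rather than a `set` because `duplicated` has to hash the values and a plain `set` is unhashable:

```python
>>> df[df[['Column A', 'Column B']].agg(frozenset, axis=1).duplicated(keep=False)]
  Column A Column B
0  value 1  value 2
3  value 2  value 1
```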
Caveat: if the same edge appears twice in the same direction, e.g. two rows of (value 1, value 2), both rows will be extracted too, even though there is no reverse edge. You can also find a solution with NetworkX.
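One way to avoid that caveat, sketched here with pandas only rather than NetworkX, is to look up each edge's reverse with a left merge. The column names come from the question; the fifth row is a hypothetical extra duplicate of (value 1, value 3) added to show that a same-direction repeat is no longer extracted. Excluding self-loops is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "Column A": ["value 1", "value 1", "value 2", "value 2", "value 1"],
    "Column B": ["value 2", "value 3", "value 3", "value 1", "value 3"],
})

# swap the column labels so each edge (u, v) is stored as (v, u);
# drop_duplicates keeps the left merge from multiplying rows
reversed_edges = df.rename(
    columns={"Column A": "Column B", "Column B": "Column A"}
).drop_duplicates()

# a left merge preserves df's row order; "_merge" == "both" means the
# reverse edge exists somewhere in df
merged = df.merge(reversed_edges, on=["Column A", "Column B"],
                  how="left", indicator=True)
has_reverse = (merged["_merge"] == "both").to_numpy()

# also exclude self-loops, which would match their own reverse
not_loop = (df["Column A"] != df["Column B"]).to_numpy()
result = df[has_reverse & not_loop]
print(result)
```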