How can I extract all the rows of the dataframe where there is a symmetrical [column A = i, column B

Time:01-25

I am working on a graph problem and want to drop the data where two nodes A and B are connected twice: once from A to B and once from B to A.

Could you help me with that, please?

I have a dataframe `data`:

    Column A    Column B
    value 1     value 2
    value 1     value 3
    value 2     value 3
    value 2     value 1

I want to extract a dataframe of all the cases where both of these rows are present:

    Column A    Column B
    value i     value j
    value j     value i

In our example:

    Column A    Column B
    value 1     value 2
    value 2     value 1

Thank you very much!

I tried looping and creating lists, but it's time-consuming and not very elegant:

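    l = []        # [column A, column B] pair for each edge id
    indexes = []  # positions in l of edges whose reverse also exists
    for i in data['aretes']:
        row = data[data['aretes'] == i]
        l.append([row['column A'].iloc[0], row['column B'].iloc[0]])

    for index, j in enumerate(l):  # enumerate gives 0-based positions
        h = [j[1], j[0]]           # the same edge, reversed
        if h in l:
            indexes.append(index)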
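Most of the cost in this loop comes from `h in l`, which scans the whole list for every edge, so the loop is quadratic in the number of rows. Below is a minimal sketch of the same reverse-edge check using a Python set of tuples instead. It assumes the columns are named `Column A` and `Column B` as in the table above (the loop itself uses lowercase `column A`), does not need the `aretes` id column, and `reciprocal_rows` is a hypothetical helper name:

```python
import pandas as pd

# Example edges from the question (column names assumed to match the table)
data = pd.DataFrame({
    "Column A": ["value 1", "value 1", "value 2", "value 2"],
    "Column B": ["value 2", "value 3", "value 3", "value 1"],
})


def reciprocal_rows(df: pd.DataFrame, a: str = "Column A", b: str = "Column B") -> pd.DataFrame:
    """Return the rows (i, j) for which a row (j, i) also exists."""
    # Build the set of directed edges once; membership tests are O(1) on average
    edges = set(zip(df[a], df[b]))
    # Keep a row when its reversed edge is in the set
    mask = [(j, i) in edges for i, j in zip(df[a], df[b])]
    return df[mask]


result = reciprocal_rows(data)
print(result)
```

On the example data this keeps rows 0 and 3. Because only an exact reverse counts, two identical (value 1, value 2) rows are not kept unless a (value 2, value 1) row also exists; a self-loop such as (value 1, value 1) is its own reverse and is kept.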

Answer:

If you want to extract all the rows in the dataframe that are duplicated, I would first build an order-independent id for each pair of nodes by sorting the two values (a plain `set` does not guarantee the order in which its elements are printed, so its string form is not a reliable key):

    df["id"] = df.apply(lambda x: str(sorted([x['a'], x['b']])), axis=1)

Then you can use the `duplicated` method with `keep=False` to drop all the rows that are not duplicated according to the id:

    df[df.duplicated(["id"], keep=False)]

Results:

    a           b           id
0   Value 1     Value 2     ['Value 1', 'Value 2']
1   Value 2     Value 1     ['Value 1', 'Value 2']
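`keep=False` is what makes both rows of a pair show up: by default `duplicated` uses `keep="first"` and leaves the first occurrence of each id unmarked. A small sketch on a hypothetical Series of ids:

```python
import pandas as pd

# Hypothetical order-independent ids, one per row
ids = pd.Series(["v1|v2", "v1|v3", "v1|v2", "v2|v3"])

first_only = ids.duplicated()             # keep="first": only later repeats are True
all_copies = ids.duplicated(keep=False)   # every member of a repeated group is True

print(first_only.tolist())  # [False, False, True, False]
print(all_copies.tolist())  # [True, False, True, False]
```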

Answer:

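Convert each pair of columns to a `frozenset`, so that (i, j) and (j, i) give the same hashable value, then keep every row whose value is duplicated:

>>> df[df[['Column A', 'Column B']].agg(frozenset, axis=1).duplicated(keep=False)]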
  Column A Column B
0  value 1  value 2
3  value 2  value 1

Caveat: if you have two instances of (value 1, value 2), both will be extracted even when no (value 2, value 1) row exists. You can also find a solution with NetworkX.
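If identical rows should not count as a match, one alternative (not one of the approaches above) is a left merge of the frame against a copy with the two columns swapped, keeping only the rows that find their reverse. This is a sketch assuming the question's column names; the helper name `reciprocal_rows_merge` and the extra duplicate row in the example data are illustrative:

```python
import pandas as pd

# Question's edges plus a repeated (value 1, value 3) row that has no reverse
data = pd.DataFrame({
    "Column A": ["value 1", "value 1", "value 2", "value 2", "value 1"],
    "Column B": ["value 2", "value 3", "value 3", "value 1", "value 3"],
})


def reciprocal_rows_merge(df: pd.DataFrame, a: str = "Column A", b: str = "Column B") -> pd.DataFrame:
    """Return the rows (i, j) for which a row (j, i) also exists, via a swapped merge."""
    # Swap the two columns so every edge (i, j) becomes (j, i);
    # drop_duplicates keeps the merge one-to-at-most-one
    reversed_edges = (
        df[[b, a]]
        .rename(columns={b: a, a: b})
        .drop_duplicates()
    )
    # A left merge keeps every original row in order; the indicator column
    # is "both" when the row found its reversed edge
    marked = df[[a, b]].merge(reversed_edges, on=[a, b], how="left", indicator=True)
    mask = (marked["_merge"] == "both").to_numpy()
    return df[mask]


result = reciprocal_rows_merge(data)
print(result)
```

Here rows 1 and 4 are identical (value 1, value 3) edges with no reverse, so they are left out, whereas the order-independent keys in the answers above would mark them as duplicates. Deduplicating the swapped copy keeps the left merge from multiplying rows, so the mask lines up with `df` row for row.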
