Pandas DataFrame - row comparison and isolation problem

I have these DataFrames in which some fathers are their own grandchild. I want to isolate the corresponding rows so I can treat them separately.

import pandas as pd

df = pd.DataFrame({
    'father': ['a', 'b', 'e', 'f', 'j', 'k'],
    'son': ['b', 'a', 'f', 'g', 'k', 'j']
})
df
df2 = pd.DataFrame({
    'father': [1, 2, 4, 11, 10, 5],
    'son': [2, 1, 5, 10, 11, 6]
})
df2

In the first one, we want to extract the rows containing the pairs (a, b), (b, a) and (j, k), (k, j), because 'a' precedes 'b' and 'b' precedes 'a', and likewise for 'j' and 'k'.

The same goes for the second one, with the integers 1, 2 and 10, 11.

I mostly (but not only) tried things like

df[df[['FH', 'REM']].isin(df[['REM', 'FH']])]

df[df[['FH', 'REM']]==df[['REM', 'FH']]]

None of these worked.

My main problem is that I don't understand how to compare rows with one another to do this.

CodePudding user response：

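This will show all the rows whose father also appears in the son column and whose son also appears in the father column. On this data, those are exactly the rows where a father is his own grandchild: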

import pandas as pd

df = pd.DataFrame({
    'father' : ['a', 'b', 'e', 'f', 'j', 'k'],
    'son' : ['b', 'a', 'f', 'g', 'k', 'j']
})

print(df.loc[(df['father'].isin(df['son'])) & (df['son'].isin(df['father']))])

Result:

  father son
0      a   b
1      b   a
4      j   k
5      k   j
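Note that the two isin checks above are evaluated independently, so a row can pass both without its exact reverse being present (for example the middle link of a longer chain). Below is a minimal sketch of a stricter check that looks for the reversed (son, father) pair as a whole. The helper name reciprocal_rows and the chain DataFrame are hypothetical illustrations, not part of the original question.

```python
import pandas as pd

def reciprocal_rows(df, parent='father', child='son'):
    """Return the rows whose reversed (child, parent) pair is also a row of df."""
    pairs = list(zip(df[parent], df[child]))
    seen = set(pairs)
    # A row (p, c) is kept only if the row (c, p) exists somewhere in df.
    mask = pd.Series([(c, p) in seen for p, c in pairs], index=df.index)
    return df[mask]

df = pd.DataFrame({
    'father': ['a', 'b', 'e', 'f', 'j', 'k'],
    'son': ['b', 'a', 'f', 'g', 'k', 'j']
})
print(reciprocal_rows(df))

# Hypothetical chain a -> b -> c -> d, which contains no cycle at all.
chain = pd.DataFrame({'father': ['a', 'b', 'c'], 'son': ['b', 'c', 'd']})
loose = chain[chain['father'].isin(chain['son']) & chain['son'].isin(chain['father'])]
print(loose)                   # the independent isin checks still flag b -> c
print(reciprocal_rows(chain))  # the pair-wise check returns no rows
```

One caveat of this sketch: a row where father and son are the same value is its own reverse and would be kept, so filter those out first if they can occur.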

CodePudding user response：

You can use frozenset to group your rows:

df['group'] = df.apply(frozenset, axis=1)
print(df)

# Output
  father son   group
0      a   b  (a, b)
1      b   a  (a, b)
2      e   f  (e, f)
3      f   g  (g, f)
4      j   k  (j, k)
5      k   j  (j, k)

Then you can build a boolean mask that selects the groups containing two rows:

m = df.groupby('group')['group'].transform('count') == 2

>>> df[m]
  father son   group
0      a   b  (a, b)
1      b   a  (a, b)
4      j   k  (j, k)
5      k   j  (j, k)

>>> df[~m]
  father son   group
2      e   f  (e, f)
3      f   g  (g, f)
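The same grouping idea can be sketched without storing frozenset objects, by sorting the two values of each row so that both orderings of a pair produce the same key. This is an alternative under the assumption that the two columns hold mutually comparable values (all strings or all numbers). The variable name key is illustrative.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'father': [1, 2, 4, 11, 10, 5],
    'son': [2, 1, 5, 10, 11, 6]
})

# Sort each row's two values so that (1, 2) and (2, 1) become the same key.
key = pd.DataFrame(np.sort(df2[['father', 'son']].to_numpy(), axis=1),
                   index=df2.index)

# keep=False marks every row whose key occurs more than once.
m = key.duplicated(keep=False)

print(df2[m])   # rows that have a reversed counterpart
print(df2[~m])  # all other rows
```

As with the count == 2 mask, an exact duplicate of a row shares its key, so it may be worth calling drop_duplicates first if the data can contain repeated rows.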