I have these DataFrames in which some fathers are their own grandchild. I want to isolate the corresponding rows so I can treat them separately.
df = pd.DataFrame({
'father' : ['a', 'b', 'e', 'f', 'j', 'k'],
'son' : ['b', 'a', 'f', 'g', 'k', 'j']
})
df
df2 = pd.DataFrame({
'father' : [1, 2, 4, 11, 10, 5],
'son' : [2, 1, 5, 10, 11, 6]
})
df2
We can see that in the first one we want to extract the rows containing the pairs ab, ba and jk, kj, because 'a' precedes 'b' and 'b' precedes 'a', and likewise for 'j' and 'k'.
The same goes for the second one with the integers 1, 2 and 10, 11.
I mostly (but not only) tried things like
df[df[['FH', 'REM']].isin(df[['REM', 'FH']])]
or
df[df[['FH', 'REM']]==df[['REM', 'FH']]]
but without success.
My main problem is that I don't understand how to compare rows with each other to do this.
CodePudding user response:
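This selects every row whose father also appears somewhere as a son and whose son also appears somewhere as a father. For this data, those are exactly the rows where a father is his own grandchild: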
import pandas as pd
df = pd.DataFrame({
'father' : ['a', 'b', 'e', 'f', 'j', 'k'],
'son' : ['b', 'a', 'f', 'g', 'k', 'j']
})
print(df.loc[(df['father'].isin(df['son'])) & (df['son'].isin(df['father']))])
Result:
father son
0 a b
1 b a
4 j k
5 k j
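Note that the two isin tests are evaluated column by column, not pair by pair, so a longer chain such as a -> b, b -> c, c -> a would also pass them. If that matters, one possible stricter check is to look for the exact reversed pair. The following is only a sketch: the helper name reciprocal_rows and the cycle frame are made up for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'father': ['a', 'b', 'e', 'f', 'j', 'k'],
    'son':    ['b', 'a', 'f', 'g', 'k', 'j'],
})

def reciprocal_rows(frame):
    """Hypothetical helper: keep rows (x, y) for which the row (y, x) also exists."""
    # Swap the two column labels so every (father, son) becomes (son, father).
    swapped = frame.rename(columns={'father': 'son', 'son': 'father'}).drop_duplicates()
    # A left merge keeps one output row per input row, in the original order;
    # indicator=True adds a '_merge' column saying whether a match was found.
    merged = frame.merge(swapped, on=['father', 'son'], how='left', indicator=True)
    return frame[(merged['_merge'] == 'both').to_numpy()]

print(reciprocal_rows(df))

# Made-up three-step cycle: no row has its exact reverse in the frame.
cycle = pd.DataFrame({'father': ['a', 'b', 'c'], 'son': ['b', 'c', 'a']})
loose_mask = cycle['father'].isin(cycle['son']) & cycle['son'].isin(cycle['father'])
print(loose_mask.sum())               # rows selected by the two isin tests
print(len(reciprocal_rows(cycle)))    # rows selected by the exact-pair check
```

On the question's data both approaches select the same rows; they only differ once values can appear in several unrelated pairs.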
CodePudding user response:
You can use frozenset to group your rows:
df['group'] = df.apply(frozenset, axis=1)
print(df)
# Output
father son group
0 a b (a, b)
1 b a (a, b)
2 e f (e, f)
3 f g (g, f)
4 j k (j, k)
5 k j (j, k)
Then you can use a boolean mask to keep the groups that contain exactly two rows:
>>> m = df.groupby('group')['group'].transform('count') == 2
>>> df[m]
father son group
0 a b (a, b)
1 b a (a, b)
4 j k (j, k)
5 k j (j, k)
>>> df[~m]
father son group
2 e f (e, f)
3 f g (g, f)
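For larger frames, the same idea can probably be expressed without a row-wise apply. The sketch below is an alternative, not the answer's own method: it sorts each row with NumPy so that (1, 2) and (2, 1) get the same key, then flags repeated keys with duplicated(keep=False). It is shown on the integer DataFrame from the question. The names keys, paired and unpaired are chosen for illustration. Like the frozenset version, it would also flag two literally identical rows.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'father': [1, 2, 4, 11, 10, 5],
    'son':    [2, 1, 5, 10, 11, 6],
})

# Sort the values inside each row so that a pair and its reverse share one key.
keys = pd.DataFrame(
    np.sort(df2[['father', 'son']].to_numpy(), axis=1),
    index=df2.index,
)

# keep=False marks every occurrence of a repeated key, not only the later ones.
m = keys.duplicated(keep=False)

paired = df2[m]      # rows whose reversed pair is also present
unpaired = df2[~m]   # everything else

print(paired)
print(unpaired)
```

Because the mask shares the index of df2, it can be used directly for both the selection and its complement, as with m above.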