I have a datafame that contains relationship between parent-child-origin-destination, it looks something like this.
parent_origin parent_destination child_origin child_destination
0 ABD NCL ABD ALM
1 ABD NCL ABD DHM
2 ABD YRK ABD ALM
3 ABD YRK ABD NTR
4 ABD KGX ABD SVG
I would like to group by on child_origin & child_destination to know if there are any child_origin, child_destination pairs that have 2 diffrent parent_origin & parent_destination and list out the result. I also want to print out the list of parent_origin & parent_destination that have the same child od pair.
For example, in the above dataframe i want the expected output to be like:
child_origin child_destination parent_origin parent_destination
1 ABD ALM ABD NCL
ABD YRK
What i have tried:
I can do a group by to get the values which have duplicate parents & the count of duplicates but i am not able to figure out how to diplay the actual parents values.
>>> grp = df.groupby(['child_origin','child_destination']).size().reset_index().rename(columns={0:'count'})
>>> grp[grp['count] > 1]
This gives me the count of all child_ods that have multiple parents but i want to knwo the value of parents as well.
PS: I am fairly new to pandas.
CodePudding user response:
How about:
df.loc[df.groupby(['child_origin','child_destination'])['parent_origin'].transform("count") >1]
If you want the columns in order:
df.loc[df.groupby(['child_origin','child_destination'])['parent_origin'].transform("count") >1,
['child_origin', 'child_destination', 'parent_origin', 'parent_destination']]