I have 2 dataframes, 1 with pairwise comparisons from 2 other datasets (A and B) and 1 with rows from those 2 other datasets but also a group ID indicating these rows have similar data and have been grouped together.
The pairwise comparison df looks like this:
origin_x | id_x | origin_y | id_y | ...
__________________________________________________
A | 13 | B | 7 | ...
A | 2 | B | 5 | ...
The grouping dataframe looks like this:
origin | id | group_id | ...
_____________________________________
A | 13 | A_13 | ...
B | 7 | A_13 | ...
I want to check if certain rows in the pairwise comparison dataframe exist within the same group in the grouping dataframe. So if each pairwise row represents a relationship between 2 records (1 from dataset A and 1 from dataset B), I want to check that both records are present within 1 group (have the same group_id
) in the grouping dataframe.
Whats the best way to go about verifying this?
CodePudding user response:
When you .merge() the pair of dataframes,
be sure to specify how='left'
outer join.
Notice how None
group IDs in the result
indicate a mismatch across the pair.
CodePudding user response:
You can try:
df1 = pd.DataFrame({'origin_x':['A']*2, 'id_x':[13,2], 'origin_y':['B']*2, 'id_y':[7,5]})
df2 = pd.DataFrame({'origin':['A','B'], 'id':[13,7], 'group_ix':['A_13']*2})
df1.merge(
df2,
how='left',
left_on=['origin_x', 'id_x'],
right_on=['origin', 'id']
).merge(
df2,
how='left',
left_on=['origin_y', 'id_y'],
right_on=['origin', 'id']
).dropna().iloc[:, [0,1,2,3,6]].rename(columns={'group_ix_x':'group_ix'})
Output:
origin_x id_x origin_y id_y group_ix
0 A 13 B 7 A_13