Home > Back-end >  verify records exist within the same group
verify records exist within the same group

Time:08-02

I have 2 dataframes, 1 with pairwise comparisons from 2 other datasets (A and B) and 1 with rows from those 2 other datasets but also a group ID indicating these rows have similar data and have been grouped together.

The pairwise comparison df looks like this:

origin_x  |  id_x    |  origin_y  |  id_y  |  ...
__________________________________________________
  A       |   13     |    B       |    7   |  ...
  A       |    2     |    B       |    5   |  ...

The grouping dataframe looks like this:

origin  |  id    |  group_id |  ...
_____________________________________
  A     |   13   |    A_13   |  ...
  B     |    7   |    A_13   |  ...

I want to check if certain rows in the pairwise comparison dataframe exist within the same group in the grouping dataframe. So if each pairwise row represents a relationship between 2 records (1 from dataset A and 1 from dataset B), I want to check that both records are present within 1 group (have the same group_id) in the grouping dataframe.

Whats the best way to go about verifying this?

CodePudding user response:

When you .merge() the pair of dataframes, be sure to specify how='left' outer join. Notice how None group IDs in the result indicate a mismatch across the pair.

CodePudding user response:

You can try:

df1 = pd.DataFrame({'origin_x':['A']*2, 'id_x':[13,2], 'origin_y':['B']*2, 'id_y':[7,5]})
df2 = pd.DataFrame({'origin':['A','B'], 'id':[13,7], 'group_ix':['A_13']*2})

df1.merge(
    df2, 
    how='left', 
    left_on=['origin_x', 'id_x'], 
    right_on=['origin', 'id']
).merge(
    df2, 
    how='left', 
    left_on=['origin_y', 'id_y'], 
    right_on=['origin', 'id']  
).dropna().iloc[:, [0,1,2,3,6]].rename(columns={'group_ix_x':'group_ix'})

Output:

  origin_x  id_x origin_y  id_y group_ix
0        A    13        B     7     A_13
  • Related