Home > Mobile >  How to match rows in DataFrame on another DataFrame with multiply condition
How to match rows in DataFrame on another DataFrame with multiply condition

Time:11-03

I have two dataframes - df1 and df2 as following:

df_1 = pd.DataFrame( {'num': [1,2,3], 'time': [100,200,300]})
df_2 = pd.DataFrame( {'num': [1,2,3], 'time': [101,104,200]})

Match = is when 'num' in df1 not equals to 'num' in df2 and the time in df1 is in df2 with offset of 10. The results need to be in df_3:

If we find few matches, we take the only the first.

My code is works but not efficient

My code:

df_3 = pd.DataFrame(columns = ['num_df1', 'num_df2', 'time_df1', 'time_df2'])

def find_match(row_df_1, df_2):
    time = row_df_1['time']
    match_in_df2 = df_2[(df_2['num'] != row_df_1['num']) & (df_2['time'].between(time-10, time 10))]
    
    if len(match_in_df2) > 0:
        match_in_df2.reset_index(inplace=True, drop=True)
        return pd.Series([row_df_1['num'], match_in_df2.loc[0,'num'],row_df_1['time'], match_in_df2.loc[0,'time']], index=df_3.columns)
        

for index, row_df_1 in df_1.iterrows():
    df_3 = df_3.append(find_match(row_df_1, df_2), ignore_index=True)


results:

num_df1 num_df2 time_df1    time_df2
0   1   2        100           104
1   2   3        200           200

CodePudding user response:

Use merge with how='cross' before use boolean masks to select right rows:

out = pd.merge(df_1, df_2, how='cross', suffixes=('_df1', '_df2'))
m1 = out['num_df1'] != out['num_df2']
m2 = abs(out['time_df2'] - out['time_df1']) <= 10
out = out[m1 & m2]

Output:

>>> out
   num_df1  time_df1  num_df2  time_df2
1        1       100        2       104
5        2       200        3       200
  • Related