Home > Net >  Efficient way to find row in df2 based on condition from value in df1
Efficient way to find row in df2 based on condition from value in df1

Time:04-10

I have two dataframes. df1 has ~31,000 rows, while df2 has ~117,000 rows. I want to add a column to df1 based on the following conditions.

(df1.id == df2.id) and (df2.min_value < df1.value <= df2.max_value)

I know that df2 will return either 0 or 1 rows satisfying the condition for each value of id in df1. For each row in df1, I want to add a column from df2 when the above condition is satisfied.

My current code is as follows. It is a line by line approach.

new_df1 = pd.DataFrame(columns = df1.columns.tolist() [new_col])
for i, row in df1.iterrows():
    val = row['value']
    id = row['id']
    dummy = df2[(df2.id == id) & (df2.max_value >= val) & (df2.min_value < val)]
    if dummy.shape[0] == 0:
        new_col = np.nan
    else:
        new_col = dummy.new_column.values[0]
    l = len(new_df1)
    new_df1.loc[l] = row.tolist() [new_col]

This is a time costly approach. Is there a way to more efficiently do this problem?

CodePudding user response:

You can merge df1 and df2 based on the id column:

merged_df = df1.merge(df2, on='id', how='left')

Now, any row in DF1 for which the id matches an id of a row in DF2 will have all the DF2 columns placed alongside it. Then, you can simply filter the merged dataframe for your given condition:

merged_df.query('max_value > value and min_value < val')
  • Related