So, let's say I have two datasets, A & B, and I'm trying to combine them on more than one column, as a left join. The corresponding SQL could would have been:
Select
A.a1
,A.a2
,A.a3
,B.b3
from A left join B
on A.a1=B.b1 and A.a2 <= B.b2
Thanks for the help in advance! :)
CodePudding user response:
You can use pd.merge to accomplish all of this
import numpy as np
df = pd.merge(left_df, right_df, how = 'left', left_on = ['Column1', 'Column2'], right_on = ['Column_1', 'Column_2'])
df['Check'] = np.where(df['Column1'] <= df['Column_1'], True, False)
df.loc[df['Check'] == True]
This also included a way to join on multiple columns if they are not the same from each dataframe.
Once the dataframes are merge you could use a np.where to find where a column is less than another column and filter it to where that is true