I have two identical frames and instead trying to inner join them, I get a cartesian product. Do you know what can cause that? The type of my columns is Int64.
df1 =
operationid siteid
632126 66
632126 66
df2 =
operationid siteid
632126 66
632126 66
when doing
df1.merge(df2 , how = 'inner' , on = [operationid , siteid ])
I expect
operationid siteid
632126 66
632126 66
Instead, I get
operationid siteid
632126 66
632126 66
632126 66
632126 66
CodePudding user response:
The expected output you will get if you have unique records in df1 and df2. Remove duplicate and try to merge them.
CodePudding user response:
You can use drop_duplicates()
function to drop the duplicates in one of your dataframes before you merge:
df1 = df1.drop_duplicates()
desired_df = df1.merge(df2 , how = 'inner' , on = ['operationid' , 'siteid'])
print(desired_df)
Output:
operationid siteid
0 632126 66
1 632126 66
You can also choose to first check to see if both dataframes are the same, if true
pick one of them and if false
go ahead and merge them and it should output the same result:
desired_df = pd.DataFrame()
if df1.equals(df2) == True:
desired_df = df1
else:
desired_df = df1.merge(df2 , how = 'inner' , on = ['operationid' , 'siteid'])
print(desired_df)