Home > Mobile >  Pandas merge is doing cartesian product instead of inner join
Pandas merge is doing cartesian product instead of inner join

Time:01-03

I have two identical frames and instead trying to inner join them, I get a cartesian product. Do you know what can cause that? The type of my columns is Int64.

df1 = 
operationid  siteid
  632126      66
  632126      66
df2 = 
operationid  siteid
  632126      66
  632126      66

when doing

df1.merge(df2 , how = 'inner' , on = [operationid , siteid ])

I expect

operationid  siteid
  632126      66
  632126      66

Instead, I get

   operationid  siteid
       632126      66
       632126      66
       632126      66
       632126      66

CodePudding user response:

The expected output you will get if you have unique records in df1 and df2. Remove duplicate and try to merge them.

CodePudding user response:

You can use drop_duplicates() function to drop the duplicates in one of your dataframes before you merge:

df1 = df1.drop_duplicates()
desired_df = df1.merge(df2 , how = 'inner' , on = ['operationid' , 'siteid'])
print(desired_df)

Output:

   operationid  siteid
0       632126      66
1       632126      66

You can also choose to first check to see if both dataframes are the same, if true pick one of them and if false go ahead and merge them and it should output the same result:

desired_df = pd.DataFrame()
if df1.equals(df2) == True:
    desired_df = df1
else:
    desired_df = df1.merge(df2 , how = 'inner' , on = ['operationid' , 'siteid'])
    
print(desired_df)
  • Related