How to merge dataframes where a value is not matching in other column-CodePudding

I have huge data and I want merge it on a specific column where values from df1 is not available in df2 and vice versa.

Example:

df1:

Domain        Sales
google.com    100
facebook.com  200
youtube.com   300

df2:

Domain       Sales
google.com   100
yahoo.com    200
youtube.com  300

Required output:

Domain          Sales
facebook.com    200
yahoo.com       200

I have tried:

df = pd.merge(df1, df2, on="Domain", how="outer")

and all the other values for the how parameter, but it does not give me the required output. How can I achieve the required output?

CodePudding user response：

pd.concat([df1, df2]).drop_duplicates('Domain', keep=False)

CodePudding user response：

As you have a huge dataset you can use set operations to pre-filter the data before concat:

S = set(df1['Domain']).symmetric_difference(df2['Domain'])
out = pd.concat([d.loc[lambda d: d['Domain'].isin(S)] for d in [df1, df2]])

output:

         Domain  Sales
1  facebook.com    200
1     yahoo.com    200

CodePudding user response：

We still can try with merge

df = df1.merge(df2,on = 'Domain', indicator = True, how='outer').query('_merge!="both"')
df['Sales'] = df.pop('Sales_x').fillna(df.pop('Sales_y'))
df
Out[342]: 
         Domain      _merge  Sales
1  facebook.com   left_only  200.0
3     yahoo.com  right_only  200.0