Compare two DataFrames and output a new DataFrame with the different index

Time:04-15

I have two DataFrames and I want a new DataFrame containing only the rows (indexes) that differ between them.

DataFrame A:

Name Color
Jony Blue
Mike Red
Joanna Green

DataFrame B:

Name Color
Jony Blue
Mike Red

DataFrame Output:

Name Color
Joanna Green

How can I get this output DataFrame?

CodePudding user response:

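Maybe take the symmetric difference of the two sets of rows and convert the result back to a DataFrame? This assumes dfa and dfb have the same columns (Name and Color):

# each row becomes a plain tuple so it can be stored in a set
rows_a = set(dfa.itertuples(index=False, name=None))
rows_b = set(dfb.itertuples(index=False, name=None))
dfc = pd.DataFrame(list(rows_a.symmetric_difference(rows_b)), columns=dfa.columns)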

CodePudding user response:

One option is to outer-merge with the indicator parameter set to True. Then the common rows will be flagged "both" and since you don't want the common rows, you filter them out:

out = df1.merge(df2, how='outer', indicator=True).query('_merge!="both"').drop(columns='_merge')

Output:

     Name  Color
2  Joanna  Green
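
For anyone who wants to run this end to end, here is a minimal sketch of the same idea. It assumes df1 and df2 hold the sample data from the question (the construction below is my own, not from the answer) and filters with a boolean mask instead of query. The index label of the surviving row can differ between pandas versions, so the sketch does not rely on it.

```python
import pandas as pd

# Sample frames rebuilt from the question (assumed construction)
df1 = pd.DataFrame({'Name': ['Jony', 'Mike', 'Joanna'],
                    'Color': ['Blue', 'Red', 'Green']})
df2 = pd.DataFrame({'Name': ['Jony', 'Mike'],
                    'Color': ['Blue', 'Red']})

# Outer merge on all shared columns; indicator=True adds a '_merge' column
# whose values are 'left_only', 'right_only' or 'both'
merged = df1.merge(df2, how='outer', indicator=True)

# Keep the rows found in only one of the two frames, then drop the helper column
out = merged[merged['_merge'] != 'both'].drop(columns='_merge')
print(out)
```

Filtering on '_merge' == 'left_only' or 'right_only' instead would keep only the rows unique to df1 or to df2, respectively.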

CodePudding user response:

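Using drop_duplicates with keep=False, which drops every row that appears more than once after concatenating the two DataFrames: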

import pandas as pd
dataA = {'Name':['Jony', 'Mike', 'Joanna'], 'Color':['Blue', 'Red', 'Green']}
dataB = {'Name':['Jony', 'Mike'], 'Color':['Blue', 'Red']}

dfA = pd.DataFrame(dataA)
dfB = pd.DataFrame(dataB)

df = pd.concat([dfA, dfB]).drop_duplicates(keep=False, ignore_index=True)
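
One thing to watch with this approach: keep=False also removes rows that are repeated inside a single DataFrame, even if the other DataFrame does not contain them. The sketch below uses made-up data (not from the question) to show the effect, and one possible workaround of deduplicating each frame first.

```python
import pandas as pd

# Hypothetical data: 'Joanna' appears twice in dfA and never in dfB
dfA = pd.DataFrame({'Name': ['Jony', 'Joanna', 'Joanna'],
                    'Color': ['Blue', 'Green', 'Green']})
dfB = pd.DataFrame({'Name': ['Jony'], 'Color': ['Blue']})

# Plain concat + drop_duplicates: the repeated Joanna row is dropped as well
naive = pd.concat([dfA, dfB]).drop_duplicates(keep=False, ignore_index=True)

# Deduplicating each frame first keeps one copy of the Joanna row
safe = pd.concat([dfA.drop_duplicates(), dfB.drop_duplicates()]) \
         .drop_duplicates(keep=False, ignore_index=True)

print(len(naive))  # number of rows left by the plain version
print(safe)
```

If neither DataFrame can contain repeated rows, the shorter one-liner is enough.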

CodePudding user response:

Assuming the Name column is the index in both DataFrames:

df_a = pd.DataFrame({'Color': {'Jony': 'Blue', 'Mike': 'Red', 'Joanna': 'Green'}})
df_a = df_a.rename_axis('Name')
df_b = pd.DataFrame({'Color': {'Jony': 'Blue', 'Mike': 'Red'}})
df_b = df_b.rename_axis('Name')

df = pd.concat([df_a[~df_a.index.isin(df_b.index)], df_b[~df_b.index.isin(df_a.index)]])
print(df)

Output:

        Color
Name         
Joanna  Green
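
A possible shorter variant of the same index-based idea uses Index.symmetric_difference to collect the labels that occur in only one index. This is a sketch that assumes each DataFrame's index has no repeated labels; the name diff_labels is just illustrative.

```python
import pandas as pd

df_a = pd.DataFrame({'Color': {'Jony': 'Blue', 'Mike': 'Red', 'Joanna': 'Green'}})
df_a = df_a.rename_axis('Name')
df_b = pd.DataFrame({'Color': {'Jony': 'Blue', 'Mike': 'Red'}})
df_b = df_b.rename_axis('Name')

# Labels present in exactly one of the two indexes
diff_labels = df_a.index.symmetric_difference(df_b.index)

# Stack both frames and select only those labels
df = pd.concat([df_a, df_b]).loc[diff_labels]
print(df)
```

Note that this compares index labels only, so a row whose label exists in both DataFrames is excluded even if its Color differs.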