Compare 2 dataframes and identify unique rows (Python Pandas)

Time:05-02

I have a table in GCP (df_1) which contains a dataset with 18 columns and 80,000 rows. I also have an .xlsx file (df_2) which contains the same columns and about 40,000 rows, most of which should already be present in df_1.

I am trying to use Python Pandas to compare df_1 and df_2 and return rows which appear in df_2 but not in df_1 and then append the results to the df_1 table in GCP.

After reading both df_1 and df_2, I am trying to run the following to get the rows which do not appear in df_1:

df_unique = df_2[~(df_2['Column1'].isin(df_1['Column1']) & df_2['Column2'].isin(df_1['Column2']))]

However, this is returning all rows in df_2.

CodePudding user response:

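Keep the bitwise & operator. Python's and cannot combine two Series and raises "ValueError: The truth value of a Series is ambiguous". Note also that each isin call tests one column independently of the other; to match on the (Column1, Column2) pair as a unit, compare both columns together:

df_unique = df_2[~df_2.set_index(['Column1', 'Column2']).index.isin(df_1.set_index(['Column1', 'Column2']).index)]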
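
If the goal is rows of df_2 whose key combination is absent from df_1, a left merge with indicator=True is one common way to do it. The following is only a minimal sketch: the two small frames and their values are made-up stand-ins for the real data, and the keys list is an assumption about which columns identify a row.

```python
import pandas as pd

# Hypothetical stand-ins for the real tables (values are illustrative only).
df_1 = pd.DataFrame({"Column1": [1, 2, 3], "Column2": ["a", "b", "c"]})
df_2 = pd.DataFrame({"Column1": [1, 2, 4], "Column2": ["a", "c", "d"]})

# Assumed key columns; list all 18 columns here to compare whole rows.
keys = ["Column1", "Column2"]

# Column-by-column isin: (2, "c") is treated as present because 2 and "c"
# each occur somewhere in df_1, although never in the same row.
column_wise = df_2[~(df_2["Column1"].isin(df_1["Column1"])
                     & df_2["Column2"].isin(df_1["Column2"]))]

# Left merge on the key columns; "_merge" records where each row was found.
merged = df_2.merge(df_1[keys].drop_duplicates(), on=keys,
                    how="left", indicator=True)

# Keep only rows found in df_2 alone, restricted to df_2's original columns.
df_unique = merged.loc[merged["_merge"] == "left_only", df_2.columns]

print(column_wise)
print(df_unique)
```

In this toy data the merge keeps both (2, "c") and (4, "d"), while the column-by-column filter keeps only (4, "d").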

CodePudding user response:

You just want this:

import pandas as pd
df_unique = df_2.loc[~df_2['Column1'].isin(df_1['Column1']), :]
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df_1 = pd.concat([df_1, df_unique])
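
One possible reason a filter like this returns every row of df_2 is a dtype mismatch between the two sources, for example keys stored as strings in one and read as integers from the other. The question does not show the dtypes, so this is an assumption; the sketch below uses made-up values to show the effect and one way to align the keys before comparing.

```python
import pandas as pd

# Hypothetical stand-ins: same key values, but strings in one frame
# and integers in the other (illustrative data only).
df_1 = pd.DataFrame({"Column1": ["1", "2", "3"]})
df_2 = pd.DataFrame({"Column1": [2, 3, 4]})

# With mismatched dtypes, isin finds no matches, so every row looks new.
naive = df_2[~df_2["Column1"].isin(df_1["Column1"])]

# Cast both key columns to a common dtype before comparing.
key_1 = df_1["Column1"].astype(str)
key_2 = df_2["Column1"].astype(str)
df_unique = df_2[~key_2.isin(key_1)]

print(len(naive), len(df_unique))
```

Comparing df_1.dtypes with df_2.dtypes is a quick way to check whether this applies to the real data.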