I have a table in GCP (df_1) which contains a dataset with 18 columns and 80,000 rows. I also have an .xlsx file (df_2) which contains the same columns and about 40,000 rows, most of which should already be present in df_1.
I am trying to use Python Pandas to compare df_1 and df_2, return the rows which appear in df_2 but not in df_1, and then append the results to the df_1 table in GCP.
After reading both df_1 and df_2, I run the following to get the rows which do not appear in df_1:
df_unique = df_2[~(df_2['Column1'].isin(df_1['Column1']) & df_2['Column2'].isin(df_1['Column2']))]
However, this is returning all rows in df_2.
CodePudding user response:
Instead of using the bitwise & operator use and:
df_unique = df_2[~(df_2['Column1'].isin(df_1['Column1']) and df_2['Column2'].isin(df_1['Column2']))]
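One caveat on the line above: Python's `and` between two pandas Series raises a ValueError, because a Series has no single truth value, so `&` remains the element-wise operator to use. The sketch below uses made-up data (only the column names come from the question) to show that error. It also shows a limitation of checking each column separately with `isin`: a row counts as present whenever each of its values occurs somewhere in df_1, not necessarily in the same row. Comparing the (Column1, Column2) pairs through a MultiIndex is one possible way around that, assuming those two columns together identify a row.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
df_1 = pd.DataFrame({"Column1": [1, 2], "Column2": ["a", "b"]})
df_2 = pd.DataFrame({"Column1": [1, 1, 3], "Column2": ["a", "b", "c"]})

in_col1 = df_2["Column1"].isin(df_1["Column1"])
in_col2 = df_2["Column2"].isin(df_1["Column2"])

# `and` asks for the truth value of a whole Series, which pandas refuses.
try:
    in_col1 and in_col2
    and_error = None
except ValueError as exc:
    and_error = exc

# Element-wise `&` runs, but it checks each column independently:
# the row (1, "b") is treated as present although df_1 has no such row.
per_column = df_2[~(in_col1 & in_col2)]

# Comparing the (Column1, Column2) pairs keeps that row as new.
keys_1 = pd.MultiIndex.from_frame(df_1[["Column1", "Column2"]])
keys_2 = pd.MultiIndex.from_frame(df_2[["Column1", "Column2"]])
per_row = df_2[~keys_2.isin(keys_1)]
```

Here `per_column` holds only the (3, "c") row, while `per_row` also keeps (1, "b").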
CodePudding user response:
You just want this (assuming pandas is imported as pd):
df_unique = df_2.loc[~df_2['Column1'].isin(df_1['Column1']), :]
df_1 = pd.concat([df_1, df_unique])  # DataFrame.append was removed in pandas 2.0
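If a single column is not enough to identify a row, an anti-join with `DataFrame.merge` and `indicator=True` is another option. This is a minimal sketch on made-up data, not a definitive implementation: the `Value` column, the sample values, and the choice of `key_cols` are assumptions for illustration, and writing the result back to GCP is left out.

```python
import pandas as pd

# Hypothetical toy data standing in for the GCP table and the .xlsx file.
df_1 = pd.DataFrame({"Column1": [1, 2], "Column2": ["a", "b"], "Value": [10, 20]})
df_2 = pd.DataFrame({"Column1": [1, 1, 3], "Column2": ["a", "b", "c"], "Value": [10, 30, 40]})

key_cols = ["Column1", "Column2"]  # assumed to identify a row

# Left-join df_2 against the distinct keys of df_1; `_merge` records
# whether each df_2 row found a matching key ("both") or not ("left_only").
flagged = df_2.merge(
    df_1[key_cols].drop_duplicates(), on=key_cols, how="left", indicator=True
)
df_unique = flagged.loc[flagged["_merge"] == "left_only", df_2.columns]

# pd.concat replaces DataFrame.append, which was removed in pandas 2.0.
df_1 = pd.concat([df_1, df_unique], ignore_index=True)
```

If every row of df_2 still comes back as new, it may be worth checking that the key columns have the same dtype and formatting in both frames (for example numbers read as strings from one source), although the question does not show enough to confirm that this is the cause.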