Drop rows in a data frame that exist in another data frame

I have two data frames: data frame 1 (df1) is my dataset, and data frame 2 (df2) holds rows that I need to drop from df1. All of those rows currently exist in df1 as well.

I am using the code trades = trades[~trades_out3].reset_index(drop=True), but it fails with the error TypeError: bad operand type for unary ~: 'DatetimeArray'. I am unsure how to proceed; any advice or help would be appreciated.

trades = the original data frame (df1), containing all the rows/data
trades_out3 = the rows (df2) that I want to drop from trades
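The error comes from applying ~ to a whole DataFrame: ~ is element-wise logical NOT, so pandas tries to invert each column's values, and datetime values cannot be inverted (hence DatetimeArray in the message). Indexing with ~ needs a boolean mask with one entry per row of trades. A minimal sketch of building that mask, assuming trades_out3 has the same columns as trades; the three columns and values below are hypothetical stand-ins, since the real ones are not shown. Each row is treated as a tuple via pd.MultiIndex.from_frame, and membership is tested with MultiIndex.isin:

```python
import pandas as pd

# Hypothetical stand-in data; the real columns of 'trades' are not known.
trades = pd.DataFrame({
    "time": pd.to_datetime(["2021-01-04 09:30", "2021-01-04 09:31", "2021-01-04 09:32"]),
    "symbol": ["AAPL", "MSFT", "AAPL"],
    "qty": [100, 50, 200],
})
trades_out3 = trades.iloc[[1]]  # rows to drop (a subset of trades)

# Turn every row into one hashable tuple key, then test membership.
# The result is a NumPy boolean array, which '~' can negate.
mask = pd.MultiIndex.from_frame(trades).isin(pd.MultiIndex.from_frame(trades_out3))
trades = trades[~mask].reset_index(drop=True)
print(trades)
```

This compares whole rows, so a row is dropped only if every column matches a row in trades_out3.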

Answer:

Here:

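merged = trades.merge(trades_out3.drop_duplicates(), on=list(trades.columns), how='left', indicator=True)
trades = trades[(merged["_merge"] == 'left_only').to_numpy()].reset_index(drop=True)

The logic: left-merge trades_out3 onto trades on every column with indicator=True, then keep only the rows flagged 'left_only', i.e. the rows of trades that have no match in trades_out3. Deduplicating trades_out3 keeps the merge one-to-one, so merged has exactly one row per row of trades. Because merged gets a fresh 0..n-1 index, the mask is converted with .to_numpy() so it is applied by position instead of being aligned on trades' own index.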
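To see what the indicator column contains, and why the mask is applied by position, here is a small sketch on hypothetical sample data where trades has a non-default index (10 to 13). Indexing trades directly with the boolean Series from merged would try to align it on labels 0 to 3 and fail, so the mask is turned into a plain array first:

```python
import pandas as pd

# Hypothetical sample data; trades deliberately has a non-default index.
trades = pd.DataFrame(
    {"symbol": ["AAPL", "MSFT", "AAPL", "TSLA"], "qty": [100, 50, 200, 10]},
    index=[10, 11, 12, 13],
)
trades_out3 = pd.DataFrame({"symbol": ["MSFT", "TSLA"], "qty": [50, 10]})

# Left merge on every column. indicator=True adds '_merge', which is 'both'
# when the row also appears in trades_out3 and 'left_only' otherwise.
# drop_duplicates() keeps the merge one-to-one with the rows of trades.
merged = trades.merge(
    trades_out3.drop_duplicates(), on=list(trades.columns), how="left", indicator=True
)
print(merged["_merge"].tolist())

# merged has a fresh 0..n-1 index, so apply the mask positionally.
keep = (merged["_merge"] == "left_only").to_numpy()
result = trades[keep].reset_index(drop=True)
print(result)
```

The left merge preserves the row order of trades, which is what makes the positional mask line up.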

Answer:

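trades = pd.concat([trades, trades_out3]).drop_duplicates(keep=False, ignore_index=True)

Stack the two dataframes vertically (axis=0, the default; axis=1 would place them side by side, which is not what you want here) and then drop the duplicated rows. With keep=False every copy of a duplicated row is removed, so each row that appears in both frames disappears entirely. Two caveats: this assumes every row of trades_out3 also exists in trades (otherwise those extra rows end up in the result), and it also removes rows that are duplicated within trades itself.

ignore_index=True does the same as .reset_index(drop=True).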
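A short sketch of the concat approach on hypothetical sample data, chosen to show the side effect of keep=False: trades contains the AAPL/100 row twice, even though that row is not in trades_out3:

```python
import pandas as pd

# Hypothetical sample data: AAPL/100 appears twice in trades but is NOT in trades_out3.
trades = pd.DataFrame(
    {"symbol": ["AAPL", "MSFT", "AAPL", "AAPL"], "qty": [100, 50, 200, 100]}
)
trades_out3 = pd.DataFrame({"symbol": ["MSFT"], "qty": [50]})

# Stack rows (axis=0 is the default), then drop every copy of any row
# that occurs more than once in the stacked frame.
result = pd.concat([trades, trades_out3]).drop_duplicates(keep=False, ignore_index=True)
print(result)
```

Only AAPL/200 survives: MSFT/50 is removed as intended, but both AAPL/100 rows are removed too. If trades can legitimately contain duplicate rows, a merge-based filter on the '_merge' indicator keeps them.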

Answer:

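You can do a left join with trades as your left dataframe and trades_out3 as your right dataframe, passing indicator=True so that every row is flagged 'both' or 'left_only'. Then keep only the 'left_only' rows and drop the helper _merge column. Something like:

import pandas as pd
final_df = pd.merge(trades, trades_out3, how="left", indicator=True)  # joins on all shared columns
final_df = final_df[final_df['_merge'] == 'left_only'].drop(columns='_merge').reset_index(drop=True)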
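Either a left join (how="left") or a full outer join (how="outer") works for this filter. A minimal sketch comparing the two, on hypothetical sample data where trades_out3 also contains one row (TSLA/10) that is not in trades; drop_rows is a helper name made up for this example:

```python
import pandas as pd

# Hypothetical sample data; TSLA/10 exists only in trades_out3.
trades = pd.DataFrame({"symbol": ["AAPL", "MSFT", "AAPL"], "qty": [100, 50, 200]})
trades_out3 = pd.DataFrame({"symbol": ["MSFT", "TSLA"], "qty": [50, 10]})


def drop_rows(left, right, how):
    # Hypothetical helper. With no 'on' argument, pd.merge joins on all shared columns.
    merged = pd.merge(left, right, how=how, indicator=True)
    print(how, merged["_merge"].tolist())
    kept = merged[merged["_merge"] == "left_only"]
    return kept.drop(columns="_merge").reset_index(drop=True)


left_result = drop_rows(trades, trades_out3, "left")
outer_result = drop_rows(trades, trades_out3, "outer")
```

The outer join additionally produces a 'right_only' row for TSLA/10, but the 'left_only' filter discards it, so both calls return the same two AAPL rows; how="left" simply avoids creating that row in the first place.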