Remove columns from one DataFrame that also appear in another DataFrame

Time:03-13

I want to remove the columns from df1 which are also found in df2.

df1 has additional columns compared to df2, which I have not included here to simplify the question.

import pandas as pd

# build two single-column frames of daily dates
df1 = pd.date_range(start='1/1/2022', end='1/04/2022', freq='D')
df2 = pd.date_range(start='1/1/2022', end='1/08/2022', freq='D')
df1 = pd.DataFrame(df1, columns=['date'])
df2 = pd.DataFrame(df2, columns=['date'])

# this line does not remove the duplicates
df3 = df2.drop_duplicates(df1.columns[0:])

CodePudding user response:

df1.drop(columns=df2.columns, errors='ignore', inplace=True)

or

df1 = df1.drop(columns=df2.columns, errors='ignore')
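
As a minimal sketch of the drop call above: the extra columns `value` and `other` are hypothetical, added only so that something remains after the shared `date` column is dropped (the question's frames share their only column, so the result there would have no columns at all).

```python
import pandas as pd

# Hypothetical frames: 'value' and 'other' are illustrative extras,
# not columns from the original question.
df1 = pd.DataFrame({
    "date": pd.date_range(start="1/1/2022", end="1/04/2022", freq="D"),
    "value": [10, 20, 30, 40],
})
df2 = pd.DataFrame({
    "date": pd.date_range(start="1/1/2022", end="1/08/2022", freq="D"),
    "other": list(range(8)),
})

# errors='ignore' skips labels in df2.columns that df1 does not have
# ('other' here) instead of raising a KeyError.
result = df1.drop(columns=df2.columns, errors="ignore")

print(list(result.columns))
```

Without `errors='ignore'`, the same call would raise a `KeyError` because `other` is not a column of `df1`.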

CodePudding user response:

drop_duplicates only works on rows. To remove the columns of one df that also appear in another df, build a boolean mask with ~df2.columns.isin(df1.columns). The mask is False for columns of df2 that are in df1 (False = should not be kept) and True for columns that are not in df1 (True = should be kept). Then pass the mask to .loc in the second (column) position.

This line removes the columns from df2 that are in df1 (swap the two names to trim df1 instead, as the question asks):

df2 = df2.loc[:, ~df2.columns.isin(df1.columns)]
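
To make the mask visible, here is a small sketch under the same assumptions as the question, plus two hypothetical extra columns (`a` and `b`) so that the result is not empty:

```python
import pandas as pd

# Hypothetical frames: 'a' and 'b' are illustrative extras.
df1 = pd.DataFrame({
    "date": pd.date_range("2022-01-01", periods=4, freq="D"),
    "a": [1, 2, 3, 4],
})
df2 = pd.DataFrame({
    "date": pd.date_range("2022-01-01", periods=8, freq="D"),
    "b": list(range(8)),
})

# One boolean per column of df2: True means "not in df1, keep it".
mask = ~df2.columns.isin(df1.columns)
print(mask)

# Select all rows and only the columns where the mask is True.
trimmed = df2.loc[:, mask]
print(list(trimmed.columns))
```

The mask has one entry per column of `df2`, in column order, so it lines up with the second indexer of `.loc`.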