I want to remove the columns from df1 that are also found in df2.
df1 has additional columns compared to df2, which I have not included here to simplify the question.
import pandas as pd

df1 = pd.date_range(start='1/1/2022', end='1/04/2022', freq='D')
df2 = pd.date_range(start='1/1/2022', end='1/08/2022', freq='D')
df1 = pd.DataFrame(df1, columns=['date'])
df2 = pd.DataFrame(df2, columns=['date'])
# this line does not remove the duplicates
df3 = df2.drop_duplicates(df1.columns[0:])
CodePudding user response:
df1.drop(columns=df2.columns, errors='ignore', inplace=True)
or
df1 = df1.drop(columns=df2.columns, errors='ignore')
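To see what this does on frames that actually differ, here is a minimal sketch. The 'value' and 'other' columns are hypothetical stand-ins for the extra columns the question left out; they are not from the original post.

```python
import pandas as pd

# Hypothetical data: 'value' and 'other' are made-up columns for illustration.
df1 = pd.DataFrame({
    "date": pd.date_range(start="1/1/2022", end="1/04/2022", freq="D"),
    "value": [1, 2, 3, 4],
})
df2 = pd.DataFrame({
    "date": pd.date_range(start="1/1/2022", end="1/08/2022", freq="D"),
    "other": range(8),
})

# errors='ignore' skips labels from df2.columns that df1 does not have
# ('other' here); without it, drop would raise a KeyError.
result = df1.drop(columns=df2.columns, errors="ignore")

print(list(result.columns))
```

Only the shared 'date' column is dropped. Because the result is assigned to a new name, df1 itself is left unchanged.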
CodePudding user response:
drop_duplicates only works for rows. If you want to remove columns from one df that are in another df, use ~df2.columns.isin(df1.columns) to return False for columns that are in df1 (False = should not be kept), and True for columns that are not in df1 (True = should be kept). Then pass the resulting column mask to .loc, at the second position.
This line will remove the columns from df2 that are in df1:
df2 = df2.loc[:, ~df2.columns.isin(df1.columns)]
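A small sketch of the mask in action, again using hypothetical 'value' and 'other' columns that are not in the original question. It also shows the same idea applied in the other direction (removing from df1 the columns found in df2), which appears to be what the question asks for.

```python
import pandas as pd

# Hypothetical data: 'value' and 'other' are made-up columns for illustration.
df1 = pd.DataFrame({
    "date": pd.date_range(start="1/1/2022", end="1/04/2022", freq="D"),
    "value": [1, 2, 3, 4],
})
df2 = pd.DataFrame({
    "date": pd.date_range(start="1/1/2022", end="1/08/2022", freq="D"),
    "other": range(8),
})

# Boolean array, one entry per column of df2: True means "keep this column".
mask = ~df2.columns.isin(df1.columns)

# The mask goes in the second (column) position of .loc; ':' keeps all rows.
df2_only = df2.loc[:, mask]

# Same technique in the other direction: drop from df1 what df2 also has.
df1_only = df1.loc[:, ~df1.columns.isin(df2.columns)]

print(list(df2_only.columns), list(df1_only.columns))
```

Both selections return new DataFrames, so df1 and df2 stay intact unless you reassign them.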