I want to drop duplicates from my dataframe. I know I can use subset to list all the columns I want it to consider, but I have 50 columns. Is there a way to include all columns except a few?
For example, include columns B, C, D, E, G, H, I, etc. and exclude A and F.
Something like:
df.drop_duplicates(subset_to_exclude=['A', 'F'])
Thanks.
CodePudding user response:
One approach is to build the subset with a list comprehension that skips the columns you want to exclude:
import pandas as pd

df = pd.DataFrame({
    'A': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'B': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'F': [4, 4, 3.5, 15, 5]
})
# Compare rows on every column except A and F
print(df.drop_duplicates(subset=[val for val in df.columns if val not in ("A", "F")]))
         A     B     F
0  Yum Yum   cup   4.0
3  Indomie  pack  15.0
# In this example only B is left after excluding A and F, so this gives the same result:
print(df.drop_duplicates(subset=["B"]))
         A     B     F
0  Yum Yum   cup   4.0
3  Indomie  pack  15.0
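If you need this in several places, the same idea could be wrapped in a small helper. This is only a sketch: `drop_duplicates_excluding` is a made-up name, not a pandas function, and it assumes `Index.difference` with `sort=False` to get the remaining columns in their original order.

```python
import pandas as pd


def drop_duplicates_excluding(df, exclude, **kwargs):
    """Hypothetical helper (not part of pandas): drop duplicate rows,
    comparing on every column except the ones listed in `exclude`."""
    # All columns minus the excluded ones, keeping the original column order
    subset = df.columns.difference(exclude, sort=False).tolist()
    # Extra keyword arguments (e.g. keep="last") are passed through unchanged
    return df.drop_duplicates(subset=subset, **kwargs)


df = pd.DataFrame({
    'A': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'B': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'F': [4, 4, 3.5, 15, 5]
})

result = drop_duplicates_excluding(df, ['A', 'F'])
print(result)
```

With the sample dataframe this should keep rows 0 and 3, the same as the list comprehension version. The excluded columns are still present in the result; they are just ignored when deciding which rows count as duplicates.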