I'm new to pandas and Python, and I want to remove duplicates, but with a priority. It's hard to explain, so here is an example to make it clear:
ID Phone Email
0001 0234 null
0001 null [email protected]
0001 0234 [email protected]
How can I remove the duplicates in ID and keep the third row, because it has both a phone and an email, instead of having a row removed at random?
CodePudding user response:
First drop the rows that have NaNs, then drop the duplicates:
df2 = df.dropna(subset=['Phone']).dropna(subset=['Email']).drop_duplicates('ID')
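For anyone who wants to try this on the question's data, here is a small runnable sketch. It assumes the "null" cells are real missing values (None/NaN) rather than the string "null", and the email address is a placeholder, since the original is hidden in the post. Both dropna calls are merged into one, which should behave the same way.

```python
import pandas as pd

# Sample data from the question. None stands in for the "null" cells
# (assumption: they are real missing values, not the string "null").
# The email address is a placeholder.
df = pd.DataFrame({
    "ID": ["0001", "0001", "0001"],
    "Phone": ["0234", None, "0234"],
    "Email": [None, "user@example.com", "user@example.com"],
})

# Drop rows missing Phone or Email, then keep one row per ID.
df2 = df.dropna(subset=["Phone", "Email"]).drop_duplicates("ID")
print(df2)
```

If the file really contains the text "null", it would need to be converted to NaN first (for example with df.replace("null", pd.NA)) before dropna can see it.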
CodePudding user response:
You can just drop the NaN values based on Phone and Email.
df.dropna(subset=['Phone', 'Email'], inplace=True)
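Note that dropna removes every incomplete row, so an ID that has no fully filled row disappears entirely, and two complete rows with the same ID would both survive. If that matters, one possible alternative (a sketch, not something from the answers above) is to rank rows by how many fields are filled and keep the best one per ID. The extra ID 0002, the placeholder emails, and the helper column name _filled are made up for illustration.

```python
import pandas as pd

# Question's sample plus a hypothetical ID 0002 that has no complete row.
df = pd.DataFrame({
    "ID": ["0001", "0001", "0001", "0002"],
    "Phone": ["0234", None, "0234", "0555"],
    "Email": [None, "a@example.com", "a@example.com", None],
})

# Count how many of the contact fields are filled in on each row.
filled = df[["Phone", "Email"]].notna().sum(axis=1)

# Sort so the most complete row of each ID comes first, then keep that row.
result = (
    df.assign(_filled=filled)
      .sort_values(["ID", "_filled"], ascending=[True, False])
      .drop_duplicates("ID", keep="first")
      .drop(columns="_filled")
)
print(result)
```

Here ID 0001 keeps the row with both phone and email, and ID 0002 keeps its only row even though the email is missing.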