Find duplicates, keep first and replace rest-CodePudding

I have a dataframe that contains duplicate values in columns, I need to identify the duplicates, keep the first and replace the others, could be zero or NaN

Here is an example of the df

id	Tables	Industry	Logistic	Comercial	Feedback	Return	Finished
1	166	325158.0	NaN	None	2140957.0	NaN	NaN
2	379	161616.0	417296.0	None	1808454.0	NaN	NaN
3	136	1729.0.	417296.0	None	1734326.0	NaN	217
4	1173	174533.0	417296.0	None	1734188.0	NaN	217
5	111	1531.0	406413.0	None	1714706.0	NaN	217

Expected result:

id	Tables	Industry	Logistic	Comercial	Feedback	Return	Finished
1	166	325158.0	NaN	None	2140957.0	NaN	NaN
2	379	161616.0	417296.0	None	1808454.0	NaN	NaN
3	136	1729.0.	None	None	1734326.0	NaN	217
4	1173	174533.0	None	None	1734188.0	NaN	None
5	111	1531.0	406413.0	None	1714706.0	NaN	None

CodePudding user response：

To, for example, replace the duplicates in the Logistic column, first find the duplicates:

df.Logistic.duplicated()

By default, keep = ‘first’ (more info about pd.duplicated)

So when you localize al duplicates using .loc, you can easily replace them:

df.loc[df.Logistic.duplicated(), 'Logistic'] = None

A for-loop can do this trick for all your columns:

for col_name in df.columns:
    df.loc[df[col_name].duplicated(), col_name] = None

If anyone has a more elegant way to do it for all columns, would love to see it :)