Find duplicates and set all but the first occurrence to null (pandas)

I have a dataframe with duplicate values in one column:

I want only the first occurrence of each duplicated value to remain, while the others become NaN.

Desired output is

Thanks

CodePudding user response:

df.loc[df['description'].duplicated(), 'description'] = pd.NA # or float('nan')
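For a version that can be run as is, here is a minimal sketch. The question's input data was not shown, so the frame below is a hypothetical reconstruction; only the column name `description` comes from the answer.

```python
import pandas as pd

# Hypothetical input: the question's original data is not available.
df = pd.DataFrame({
    "A": ["First", "Second", "Third", "Fourth", "Fifth"],
    "description": ["hello", "hello", "hello", "why", "why"],
})

# duplicated() is False for the first occurrence of each value
# and True for every later occurrence, wherever it appears.
mask = df["description"].duplicated()

# Overwrite only the later occurrences with a missing value.
df.loc[mask, "description"] = pd.NA

print(df)
```

With this assumed input, rows 1, 2 and 4 end up missing, and the first "hello" and the first "why" are kept.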

Alternative for successive duplicates:

m = df['description'].eq(df['description'].shift())
df.loc[m, 'description'] = pd.NA

output:

        A description
0   First       hello
1  Second        <NA>
2   Third        <NA>
3  Fourth         why
4   Fifth        <NA>
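
The two approaches only differ when a value comes back after a different one. The sketch below shows that on a made-up series; the helper names `blank_all_repeats` and `blank_consecutive_repeats` are hypothetical, and it uses `Series.mask` (which fills with NaN by default) instead of the `.loc` assignment from the answer so that the input is left untouched.

```python
import pandas as pd


def blank_all_repeats(s: pd.Series) -> pd.Series:
    """Keep the first occurrence of each value anywhere in the series."""
    return s.mask(s.duplicated())


def blank_consecutive_repeats(s: pd.Series) -> pd.Series:
    """Keep the first value of each run of equal neighbours."""
    return s.mask(s.eq(s.shift()))


# Made-up data: "hello" and "why" both reappear after a different value.
s = pd.Series(["hello", "hello", "why", "hello", "why"])

all_repeats = blank_all_repeats(s)
consecutive = blank_consecutive_repeats(s)

print(pd.DataFrame({"input": s, "all": all_repeats, "consecutive": consecutive}))
```

Here `duplicated()` blanks rows 1, 3 and 4, while the `shift()` comparison blanks only row 1, because rows 3 and 4 differ from the row directly above them.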