Find duplicates and set all but the first occurrence to null (pandas)

Time:09-03

I have a DataFrame with duplicate values in one column:

A       description
First   hello
Second  hello
Third   hello
Fourth  why
Fifth   why

I want only the first occurrence of each duplicated value to remain, while the other occurrences become NaN.

The desired output is:

A       description
First   hello
Second  NaN
Third   NaN
Fourth  why
Fifth   NaN

Thanks

Answer:

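Use boolean indexing with Series.duplicated, which returns True for every occurrence after the first, and assign a missing value to those rows: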

df.loc[df['description'].duplicated(), 'description'] = pd.NA # or float('nan')
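A minimal self-contained sketch of the duplicated approach, rebuilding the question's frame by hand. It also shows Series.mask as an equivalent one-liner that writes NaN instead of pd.NA; which missing-value marker to use is a matter of preference, not something the answer prescribes.

```python
import pandas as pd

# Rebuild the frame from the question
df = pd.DataFrame({
    "A": ["First", "Second", "Third", "Fourth", "Fifth"],
    "description": ["hello", "hello", "hello", "why", "why"],
})

# duplicated() uses keep="first" by default: the first occurrence is False,
# every later repeat (adjacent or not) is True
dup = df["description"].duplicated()

# Option 1: overwrite the repeats with pd.NA via boolean indexing
out_na = df.copy()
out_na.loc[dup, "description"] = pd.NA

# Option 2: Series.mask replaces values where the condition is True with NaN
out_nan = df.copy()
out_nan["description"] = df["description"].mask(dup)

print(out_na)
print(out_nan)
```

Working on copies keeps the original df intact so both variants can be compared side by side.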

Alternatively, to blank out only successive duplicates (a value equal to the one in the row directly above it):

m = df['description'].eq(df['description'].shift())
df.loc[m, 'description'] = pd.NA
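The two approaches agree on the question's data only because all repeats there are adjacent. A small sketch with hypothetical data (not from the question) shows where they diverge; it uses Series.mask for brevity, which behaves like the .loc assignment above except that it writes NaN rather than pd.NA.

```python
import pandas as pd

# Hypothetical data: "hello" reappears after "why", so not all repeats are adjacent
s = pd.Series(["hello", "hello", "why", "hello"])

# duplicated(): blanks every repeat after the first occurrence anywhere in the column
any_repeat = s.mask(s.duplicated())

# shift() comparison: blanks a value only when it equals the row directly above it
consecutive = s.mask(s.eq(s.shift()))

print(pd.DataFrame({"original": s, "duplicated": any_repeat, "shift": consecutive}))
```

Here the last "hello" is blanked by the duplicated version but kept by the shift version, because the row above it is "why".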

Output (identical for both approaches on this data):

        A description
0   First       hello
1  Second        <NA>
2   Third        <NA>
3  Fourth         why
4   Fifth        <NA>