I have a dataframe with duplicate values in one column:
A | description |
---|---|
First | hello |
Second | hello |
Third | hello |
Fourth | why |
Fifth | why |
I want only the first occurrence of each duplicated value to remain, while the others become NaN.
Desired output:
A | description |
---|---|
First | hello |
Second | NaN |
Third | NaN |
Fourth | why |
Fifth | NaN |
Thanks
CodePudding user response:
Use boolean indexing with duplicated:
df.loc[df['description'].duplicated(), 'description'] = pd.NA # or float('nan')
Alternative that masks only successive duplicates (a value equal to the one in the row directly above it):
m = df['description'].eq(df['description'].shift())
df.loc[m, 'description'] = pd.NA
Output (identical for both approaches on this data):
        A description
0   First       hello
1  Second        <NA>
2   Third        <NA>
3  Fourth         why
4   Fifth        <NA>
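
The two approaches only differ when a value reappears after a different value. A minimal, self-contained sketch of that difference is below; the extra "Sixth" row is hypothetical (it is not in the question's data) and was added only to make the difference visible, and the variable names `all_dupes` and `consecutive` are illustrative.

```python
import pandas as pd

# Question's data plus one hypothetical extra row ("Sixth"), where "hello"
# reappears after "why", so the two approaches give different results.
df = pd.DataFrame({
    "A": ["First", "Second", "Third", "Fourth", "Fifth", "Sixth"],
    "description": ["hello", "hello", "hello", "why", "why", "hello"],
})

# Approach 1: duplicated() flags every repeat of a value seen anywhere earlier
# in the column (default keep="first"), so the first occurrence is kept.
all_dupes = df.copy()
all_dupes.loc[all_dupes["description"].duplicated(), "description"] = pd.NA

# Approach 2: compare each row with the row directly above it, so only
# repeats that are adjacent to the previous occurrence are flagged.
consecutive = df.copy()
m = consecutive["description"].eq(consecutive["description"].shift())
consecutive.loc[m, "description"] = pd.NA

print(all_dupes)
print(consecutive)
```

With this data, the first approach should blank out the "Sixth" row as well, while the second keeps its "hello" because the row above it is "why". Which one is right depends on whether "duplicate" means "seen anywhere before" or "same as the previous row".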