How to de-duplicate values within a column while grouping by another column in Python?

I am new to Python and don't know how to combine the groupby and duplicated functions to solve my problem. Within each Year group, I need to keep only the first Negative value in the Result column and set the remaining Negative values to NaN (see the tables below).

Here is my DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({"Year": ["2004", "2004", "2004", "2005", "2005", "2005", "2005", "2003", "2003", "2003", "2003"],
                   "Result": [np.nan, "Negative", "Negative", "Negative", np.nan, "Negative", np.nan, "Negative", np.nan, "Negative", np.nan]})

I tried the following code, but it doesn't work:

df['Result'] = df.groupby(['Year'])['Result'].duplicated()
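For comparison, here is a minimal sketch that keeps the groupby idea from this attempt. It assumes missing values are real np.nan rather than "NaN" strings, runs Series.duplicated inside each Year group through GroupBy.transform, and then uses the resulting flags to blank repeated "Negative" values instead of assigning the flags themselves to Result. The name dup_in_year is illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with real missing values (np.nan)
df = pd.DataFrame({
    "Year": ["2004", "2004", "2004", "2005", "2005", "2005", "2005",
             "2003", "2003", "2003", "2003"],
    "Result": [np.nan, "Negative", "Negative", "Negative", np.nan, "Negative",
               np.nan, "Negative", np.nan, "Negative", np.nan],
})

# Run Series.duplicated separately inside each Year group; transform returns
# a boolean Series aligned with the original row index
dup_in_year = df.groupby("Year")["Result"].transform(lambda s: s.duplicated())

# Blank out only repeated "Negative" values; missing values stay missing
df["Result"] = df["Result"].mask(dup_in_year & df["Result"].eq("Negative"))
print(df)
```

transform is used here rather than apply because it returns a result aligned with the original rows, so the flags can be combined directly with df["Result"].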

The original table looks like this:

Year	Result
2004	NaN
2004	Negative
2004	Negative
2005	Negative
2005	NaN
2005	Negative
2005	NaN
2003	Negative
2003	NaN
2003	Negative
2003	NaN

I want to de-duplicate the 'Negative' values in the 'Result' column within each 'Year' group and update the 'Result' column, so that it looks like this:

Year	Result
2004	NaN
2004	Negative
2004	NaN
2005	Negative
2005	NaN
2005	NaN
2005	NaN
2003	Negative
2003	NaN
2003	NaN
2003	NaN

Answer:

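Use DataFrame.duplicated on the Year and Result columns together to flag every row whose (Year, Result) pair already appeared earlier, then Series.mask to replace the flagged values with NaN: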

df['Result'] = df['Result'].mask(df.duplicated(['Year', 'Result']))
print(df)
    Year    Result
0   2004       NaN
1   2004  Negative
2   2004       NaN
3   2005  Negative
4   2005       NaN
5   2005       NaN
6   2005       NaN
7   2003  Negative
8   2003       NaN
9   2003       NaN
10  2003       NaN
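Note that df.duplicated(['Year', 'Result']) flags every repeated (Year, Result) pair, not only repeated 'Negative' values. That is harmless for the question's data, where the only other repeated entries are already missing. A minimal sketch on a small hypothetical frame (not from the question) shows the difference, and how combining the flags with df['Result'].eq('Negative') restricts the mask to 'Negative' only:

```python
import pandas as pd

# Hypothetical data (not from the question): 2004 repeats both "Negative" and "Positive"
df = pd.DataFrame({
    "Year": ["2004", "2004", "2004", "2004", "2005", "2005"],
    "Result": ["Negative", "Negative", "Positive", "Positive", "Negative", "Positive"],
})

# True for every row whose (Year, Result) pair already appeared in an earlier row
dup_rows = df.duplicated(["Year", "Result"])

# Answer as written: every repeated value within a year is replaced with NaN
all_dups = df["Result"].mask(dup_rows)

# Restricted variant: only repeated "Negative" values are replaced
neg_only = df["Result"].mask(dup_rows & df["Result"].eq("Negative"))

print(pd.DataFrame({"Year": df["Year"], "all_dups": all_dups, "neg_only": neg_only}))
```

With the restricted mask, the second 2004 "Positive" survives, while the answer's original mask turns it into NaN.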