How to assign/change values to top N values in dataframe using nlargest?

Time:03-29

Using .nlargest, I can get the top N rows of my DataFrame.

Now if I run the following code:

df.nlargest(25, 'Change')['TopN']='TOP 25'

I expect all affected rows in the TopN column to become 'TOP 25', but the assignment has no effect and those rows remain unchanged. What am I doing wrong?

Answer:

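Your original line has no effect because df.nlargest(25, 'Change') returns a new DataFrame, so ['TopN'] = 'TOP 25' sets the column on that temporary copy, not on df. Assuming you really want the top N rows (exactly N, as nlargest returns), take the index of df.nlargest(25, 'Change') and assign through df.loc: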

df.loc[df.nlargest(25, 'Change').index, 'TopN'] = 'TOP 25'
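A minimal sketch of why the original line fails and the loc version works, using a small made-up DataFrame (the values and the 'TOP 2' label are illustrative only): the column added to the nlargest result never appears in df, while indexing df.loc with that result's index writes into df itself.

```python
import pandas as pd

# Hypothetical toy data, not from the question
df = pd.DataFrame({'Change': [3, 7, 1, 9, 5]})

# nlargest builds a new DataFrame; adding a column to it leaves df untouched
top = df.nlargest(2, 'Change')
top['TopN'] = 'TOP 2'
reached_df = 'TopN' in df.columns
print(reached_df)  # False

# Using the index of the nlargest result with df.loc modifies df in place;
# rows that are not selected get NaN in the new column
df.loc[df.nlargest(2, 'Change').index, 'TopN'] = 'TOP 2'
print(df)
```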

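Note the difference from the other approach, which flags every row whose Change value is among the 25 largest values, so all rows tied with those values are included and more than 25 rows can be flagged: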

df.loc[df['Change'].isin(df['Change'].nlargest(25)), 'TopN'] = 'TOP 25'

Highlighting the difference:

import pandas as pd

df = pd.DataFrame({'Change': [1,2,3,4,5,1,2,3,4,5,1,2,3,4,5]})
df.loc[df.nlargest(4, 'Change').index, 'TOP4 (A)'] = 'X'                # exactly 4 rows
df.loc[df['Change'].isin(df['Change'].nlargest(4)), 'TOP4 (B)'] = 'X'  # every row holding a top-4 value
print(df)

Output:

    Change TOP4 (A) TOP4 (B)
0        1      NaN      NaN
1        2      NaN      NaN
2        3      NaN      NaN
3        4        X        X
4        5        X        X
5        1      NaN      NaN
6        2      NaN      NaN
7        3      NaN      NaN
8        4      NaN        X
9        5        X        X
10       1      NaN      NaN
11       2      NaN      NaN
12       3      NaN      NaN
13       4      NaN        X
14       5        X        X

Answer:

One thing to be aware of: by default (keep='first'), nlargest does not return ties. For example, if 5 rows share the value ranked 25th, nlargest(25, 'Change') returns only 25 rows rather than 29 (the 24 rows above the cutoff plus all 5 tied rows), unless you pass keep='all'.

With this parameter, you can flag the top 25 including every tie at the cutoff:

df.loc[df.nlargest(25, 'Change', keep='all').index, 'TopN'] = 'TOP 25'
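A quick sketch of the tie behaviour, reusing the toy DataFrame from the first answer (assumed data, with n=4 instead of 25): three rows hold 5 and three rows hold 4, so the 4th slot falls on a tie.

```python
import pandas as pd

df = pd.DataFrame({'Change': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5]})

# Default keep='first': exactly n rows; of the tied 4s only the first occurrence is kept
n_first = len(df.nlargest(4, 'Change'))
# keep='all': every row tied with the smallest selected value is kept as well
n_all = len(df.nlargest(4, 'Change', keep='all'))

print(n_first)  # 4
print(n_all)    # 6
```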

Answer:

To flag every row whose Change value is among the 25 largest values of the column (duplicates included), use:

df.loc[df['Change'].isin(df['Change'].nlargest(25)), 'TopN'] = 'TOP 25'
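On the toy data from the first answer (assumed, with n=4), a small check that this isin approach and nlargest with keep='all' flag the same rows; this only demonstrates agreement on this particular single-column example.

```python
import pandas as pd

df = pd.DataFrame({'Change': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5]})

# Rows whose Change value is among the 4 largest values (all duplicates included)
by_value = sorted(df.index[df['Change'].isin(df['Change'].nlargest(4))].tolist())
# Rows returned by nlargest when ties at the cutoff are kept
by_keep_all = sorted(df.nlargest(4, 'Change', keep='all').index.tolist())

print(by_value)     # [3, 4, 8, 9, 13, 14]
print(by_keep_all)  # [3, 4, 8, 9, 13, 14]
```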