Using .nlargest I can get the top N values from my DataFrame.
Now if I run the following code:
df.nlargest(25, 'Change')['TopN']='TOP 25'
I expect all affected values in the TopN column to become TOP 25. But somehow this assignment does not work and those rows remain unaffected. What am I doing wrong?
CodePudding user response:
Assuming you really want the top N rows (limited to exactly N rows, as nlargest would do), use the index from df.nlargest(25, 'Change') with loc:
df.loc[df.nlargest(25, 'Change').index, 'TopN'] = 'TOP 25'
Note the difference from the other approach, which labels every row whose value equals one of the top N values, ties included:
df.loc[df['Change'].isin(df['Change'].nlargest(25)), 'TopN'] = 'TOP 25'
Highlighting the difference:
import pandas as pd
df = pd.DataFrame({'Change': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5]})
df.loc[df.nlargest(4, 'Change').index, 'TOP4 (A)'] = 'X'
df.loc[df['Change'].isin(df['Change'].nlargest(4)), 'TOP4 (B)'] = 'X'
Output:
    Change TOP4 (A) TOP4 (B)
0        1      NaN      NaN
1        2      NaN      NaN
2        3      NaN      NaN
3        4        X        X
4        5        X        X
5        1      NaN      NaN
6        2      NaN      NaN
7        3      NaN      NaN
8        4      NaN        X
9        5        X        X
10       1      NaN      NaN
11       2      NaN      NaN
12       3      NaN      NaN
13       4      NaN        X
14       5        X        X
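As for why the original line has no effect: nlargest returns a new DataFrame, so the assignment lands on a temporary object rather than on df. A minimal sketch of this, using a small made-up frame and N = 3 instead of 25 (both are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical sample data; the real frame in the question is not shown.
df = pd.DataFrame({'Change': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]})
df['TopN'] = ''

# Same pattern as in the question: the result of nlargest is a separate
# DataFrame, so this writes to a temporary object and df stays as it was.
df.nlargest(3, 'Change')['TopN'] = 'TOP 3'
unchanged = (df['TopN'] == '').all()

# Going through .loc with the index of the top rows writes into df itself.
df.loc[df.nlargest(3, 'Change').index, 'TopN'] = 'TOP 3'
labelled = df.index[df['TopN'] == 'TOP 3'].tolist()
```

Here unchanged is True after the first assignment, and labelled holds the index labels of the three rows picked by nlargest.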
CodePudding user response:
One thing to be aware of is that nlargest does not return ties by default. For example, if 5 rows share the 25th-ranked value of Change, nlargest returns only 25 rows rather than 29 unless you set the keep parameter to 'all'.
Using this parameter, the top 25 can be identified as:
df.loc[df.nlargest(25, 'Change', keep='all').index, 'TopN'] = 'Top 25'
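The effect of keep is easier to see on a tiny frame. The sketch below uses made-up data with a three-way tie for second place and asks for the top 2; the values are assumptions chosen only to show the tie handling:

```python
import pandas as pd

# Hypothetical data: one 5, then three rows tied at 4.
df = pd.DataFrame({'Change': [5, 4, 4, 4, 3, 2]})

# Default keep='first': exactly 2 rows, the tie is cut off.
first_only = df.nlargest(2, 'Change')

# keep='all': every row tied with the last selected value is included.
with_ties = df.nlargest(2, 'Change', keep='all')

n_first = len(first_only)
n_ties = len(with_ties)
```

With the default, n_first is 2; with keep='all', n_ties is 4 because all three rows tied at 4 are kept.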
CodePudding user response:
To label every row whose value is among the top 25 values of the column (ties included), use:
df.loc[df['Change'].isin(df['Change'].nlargest(25)), 'TopN'] = 'TOP 25'