I am getting my head around Python and Pandas library and trying out some basics but getting lost in documentation.
I have a Pandas
DataFrame
A B C D
1 2 3 4
3 4 1 7
6 9 0 1
... other 10k rows.
I now want to add a column say 'E' and that should read True
/False
if value of 'D' is in top 10% of the entire column.
One way I tried is to sort descending by column 'D' and then update top 10% rows, in this way I am able to sort but have not yet figured out how to update top 10% rows
also this way alters the original order which isn't desirable.
df = df.sort_values('D',ascending=False)
df.iloc[:0, :(df.shape[0]-1)/10, 5] = value --- this doesn't work.
Just checking if there is a way to achieve this without sorting ? if not, how do I update top 10% rows if they were sorted ?
Thanks
CodePudding user response:
If you don't want to use built-in quantile
. Use sort method :-
top_10_pc = int(len(df.index) * 0.1)
min_val = min(df.sort_values(by=['D'], ascending=False)[:top_10_pc]['D'])
df['E'] = df['D'] >= min_val
CodePudding user response:
If need top10 values without sorting with duplicates use np.argsort
:
np.random.seed(2021)
df = pd.DataFrame(np.random.randint(30, size=(20, 5)), columns=list('ABCDE'))
n = 10
N = int(len(df.index)*(n/100))
print (N)
2
df['mask'] = np.argsort(np.argsort(-df['E'].to_numpy())) < N
print (df)
A B C D E mask
0 20 21 25 0 13 False
1 22 12 27 29 21 True
2 29 24 12 22 6 False
3 6 6 1 5 7 False
4 1 14 1 28 5 False
5 26 2 16 3 17 False
6 16 18 22 27 20 False
7 29 24 5 17 6 False
8 10 14 7 21 6 False
9 9 21 22 25 18 False
10 10 4 13 10 19 False
11 25 18 26 15 8 False
12 10 12 21 11 19 False
13 1 14 17 25 18 False
14 7 21 19 27 12 False
15 23 19 9 4 9 False
16 7 25 7 7 20 False
17 27 29 11 27 19 False
18 18 14 25 27 18 False
19 21 18 26 0 20 True
If need all top2 values is possible compare with Series.nlargest
and Series.isin
:
df['mask'] = df['E'].isin(df['E'].nlargest(2))
print (df)
A B C D E mask
0 20 21 25 0 13 False
1 22 12 27 29 21 True
2 29 24 12 22 6 False
3 6 6 1 5 7 False
4 1 14 1 28 5 False
5 26 2 16 3 17 False
6 16 18 22 27 20 True
7 29 24 5 17 6 False
8 10 14 7 21 6 False
9 9 21 22 25 18 False
10 10 4 13 10 19 False
11 25 18 26 15 8 False
12 10 12 21 11 19 False
13 1 14 17 25 18 False
14 7 21 19 27 12 False
15 23 19 9 4 9 False
16 7 25 7 7 20 True
17 27 29 11 27 19 False
18 18 14 25 27 18 False
19 21 18 26 0 20 True