date balance
2020-03-31 1000
2020-03-31 900
2020-03-31 800
2020-03-31 700
2020-03-31 200
2020-03-31 100
....
2020-03-31 20
2020-03-31 1
2020-03-31 0.3
....
2020-06-30 3420
2020-06-30 3000
2020-06-30 2000
....
2020-06-30 30
2020-06-30 3
....
2020-09-30 10000
2020-09-30 3000
..
2020-09-30 3
I want to group by date and, within each date, sum the balances that fall in the top 1% (at or above the 99th percentile). I used
book2 = book.groupby(['date'])['balance'].agg([lambda x : np.quantile(x, q=0.99), "sum"])
but this is giving me a strange value... Any idea how to solve this? Thanks!
CodePudding user response:
For each date, select the values at or above the 99th percentile, then sum them:
df.groupby('date')['balance'].apply(lambda x: x[x >= np.quantile(x, q=0.99)].sum())
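To see why the original attempt looks strange, here is a small self-contained sketch. The data is made up for illustration (200 balances per date, not the asker's real table), and the frame is named `book` as in the question. The `agg` call returns the 99th-percentile value itself plus the sum of all balances per date, whereas the filtered `apply` sums only the balances at or above that threshold.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 200 balances for each of two dates.
book = pd.DataFrame({
    "date": ["2020-03-31"] * 200 + ["2020-06-30"] * 200,
    "balance": list(range(1, 201)) + list(range(10, 2010, 10)),
})

# Original attempt: one column holds the 99th-percentile value,
# the other holds the sum of ALL balances for the date.
attempt = book.groupby("date")["balance"].agg(
    [lambda x: np.quantile(x, q=0.99), "sum"]
)

# Suggested fix: keep only balances at or above the 99th percentile
# of their own date, then sum what remains.
top_sum = book.groupby("date")["balance"].apply(
    lambda x: x[x >= np.quantile(x, q=0.99)].sum()
)

print(attempt)
print(top_sum)
```

Note that `np.quantile` interpolates linearly by default, so the threshold may not be an actual value in the group; with `>=` the result includes every balance at or above that interpolated cut-off.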