Group by date and sum value of the top 1% percentile?

Time:03-22

date        balance
2020-03-31   1000
2020-03-31    900
2020-03-31    800
2020-03-31    700
2020-03-31    200
2020-03-31    100
....
2020-03-31     20
2020-03-31      1
2020-03-31     0.3
....
2020-06-30   3420
2020-06-30   3000
2020-06-30   2000
....
2020-06-30     30
2020-06-30      3
....
2020-09-30  10000
2020-09-30   3000
..
2020-09-30      3

I want to group by date and, for each date, sum the balances that fall in the top 1%. I tried:

book2  = book.groupby(['date'])['balance'].agg([lambda x : np.quantile(x, q=0.99), "sum"])

but this is giving me a strange value... Any idea how to solve this? Thanks!

CodePudding user response:

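Your agg call returns two separate columns per date: the 99th-percentile threshold and the sum of all balances, not the sum of the balances above that threshold. Instead, select the values at or above the 99th percentile within each date, then sum them: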

df.groupby('date')['balance'].apply(lambda x: x[x >= np.quantile(x, q=0.99)].sum())
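
To check the one-liner above on something small, here is a minimal sketch with made-up data (the frame is named book as in the question; the balances and the helper name top_percentile_sum are hypothetical, chosen only so the result is easy to verify by hand). It assumes NumPy's default linear interpolation for np.quantile.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 200 balances per date (not the asker's real data).
book = pd.DataFrame({
    "date": ["2020-03-31"] * 200 + ["2020-06-30"] * 200,
    "balance": list(range(1, 201)) + list(range(10, 2010, 10)),
})

def top_percentile_sum(x, q=0.99):
    """Sum the values of one group that are at or above its q-th quantile."""
    threshold = np.quantile(x, q)  # default linear interpolation
    return x[x >= threshold].sum()

# One sum per date, computed from that date's own threshold.
top_sums = book.groupby("date")["balance"].apply(top_percentile_sum)
print(top_sums)
```

With 200 rows per date, the 99th percentile falls between the third-largest and second-largest values, so only the two largest balances of each date are summed. Note that x >= threshold also keeps any values tied with the threshold, so a group can contribute slightly more than 1% of its rows.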