Home > Mobile >  pandas groupby, counting unique in a group and summing across groups
pandas groupby, counting unique in a group and summing across groups

Time:03-31

I have a df as follows:

Date          Stock   Category
2020-01-01    AAA        1
2020-01-01    BBB        2
2020-01-01    CCC        1
2020-01-02    AAA        1
2020-01-02    BBB        2

What I am looking to achieve is sum by Date and then cumsum across Date while grouping by Category So the result would look like

Date         Category  CUMSUM
2020-01-01       1       2
2020-01-01       2       1
2020-01-02       1       3
2020-01-02       2       2

Essentially on 2020-01-01 2 Stocks were in Category 1 and on 2020-01-02 1 Stock was in Category 1 resulting in cumsum=3 on 2020-01-02.

I have tried df.groupby(['Date','Category']).agg({'STOCK':'nunique'}).groupby(level=0).cumsum().reset_index() but this resulted in each Category having the same CUMSUM on any given day. Any ideas on how I can fix this? Thanks!

CodePudding user response:

You pretty much had it, but in your second groupby operation you'll want to group on "Category" in to calculate a cumulative sum across your dates.

An implicit assumption is that your dates will be sorted after the first groupby operation. So make sure your keep sort=True (the default) for your first groupby operation. OR ensure your data is sorted by date prior to the groupby opertaion if you want to specify sort=False for any reason.

out = (
    df.groupby(["Date", "Category"])
    ["Stock"].nunique()
    .groupby("Category")
    .cumsum() 
)

print(out)
Date        Category
2020-01-01  1           2
            2           1
2020-01-02  1           3
            2           2
Name: Stock, dtype: int64

  • Related