I have a df
as follows:
Date Stock Category
2020-01-01 AAA 1
2020-01-01 BBB 2
2020-01-01 CCC 1
2020-01-02 AAA 1
2020-01-02 BBB 2
What I am looking to achieve is sum
by Date
and then cumsum
across Date
while grouping by Category
So the result would look like
Date Category CUMSUM
2020-01-01 1 2
2020-01-01 2 1
2020-01-02 1 3
2020-01-02 2 2
Essentially on 2020-01-01
2 Stocks
were in Category 1
and on 2020-01-02 1 Stock
was in Category 1
resulting in cumsum=3
on 2020-01-02
.
I have tried df.groupby(['Date','Category']).agg({'STOCK':'nunique'}).groupby(level=0).cumsum().reset_index()
but this resulted in each Category
having the same CUMSUM
on any given day.
Any ideas on how I can fix this?
Thanks!
CodePudding user response:
You pretty much had it, but in your second groupby
operation you'll want to group on "Category"
in to calculate a cumulative sum across your dates.
An implicit assumption is that your dates will be sorted after the first groupby operation. So make sure your keep sort=True
(the default) for your first groupby
operation. OR ensure your data is sorted by date prior to the groupby opertaion if you want to specify sort=False
for any reason.
out = (
df.groupby(["Date", "Category"])
["Stock"].nunique()
.groupby("Category")
.cumsum()
)
print(out)
Date Category
2020-01-01 1 2
2 1
2020-01-02 1 3
2 2
Name: Stock, dtype: int64