How to count the number of distinct multiline index in pandas, only by one of the indices components

I have a dataframe that looks like this:

[Image: input dataframe]

I want to find the contribution of each category to the Price(USD) column by day. So far I've tried aggregating by Timestamp and Category, with the sum of Price(USD):

df3 = df.groupby(["Timestamp", "Category"]).sum()

This gives the following dataset:

[Image: dataset grouped by Timestamp and Category]

From here, I haven't been able to divide each Price(USD) value by that day's total across all categories and store the result in a new column.

Ideally, a new column "Percentage" would contain:

Percentage

0.3/(0.3 + 0.2 + 0.1)
0.2/(0.3 + 0.2 + 0.1)
0.1/(0.3 + 0.2 + 0.1)

With the same pattern for the rest of the dataframe.

Thank you

CodePudding user response：

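It seems like you need to divide the per-category sums by the per-day sums. Selecting Price(USD) first keeps the non-numeric Category column out of the daily sum:

>>> df.groupby(["Timestamp", "Category"])["Price(USD)"].sum() / df.groupby("Timestamp")["Price(USD)"].sum()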
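For a version you can run end to end, here is a minimal sketch of the same idea. The sample rows are made up for illustration; only the column names come from the question. It computes each day's total from the grouped result itself via `groupby(level="Timestamp")`, so the totals share the grouped frame's index and the division lines up row by row.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "Timestamp": ["2022-01-01"] * 3 + ["2022-01-02"] * 2,
    "Category": ["A", "B", "C", "A", "B"],
    "Price(USD)": [0.3, 0.2, 0.1, 0.5, 0.5],
})

# Sum per (day, category); the result is indexed by Timestamp and Category.
by_cat = df.groupby(["Timestamp", "Category"])[["Price(USD)"]].sum()

# Per-day total, broadcast back to every (day, category) row.
daily_total = by_cat.groupby(level="Timestamp")["Price(USD)"].transform("sum")

# Share of each category within its day,
# e.g. 0.3 / (0.3 + 0.2 + 0.1) for category A on the first day.
by_cat["Percentage"] = by_cat["Price(USD)"] / daily_total
print(by_cat)
```

The Percentage values within each day should add up to 1, which is a quick sanity check on real data.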

CodePudding user response：

Here is another way to go about it:

df.groupby(['Timestamp', 'Category'])['Price(USD)'].transform('sum') / df.groupby(['Timestamp'])['Price(USD)'].transform('sum')
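One thing worth noting about the transform approach: it returns one value per original row rather than one per group, so the result can be assigned straight back to the ungrouped dataframe. A small sketch, assuming made-up sample rows (including two rows for the same day and category) and the column names from the question:

```python
import pandas as pd

# Hypothetical data; category A appears twice on the first day.
df = pd.DataFrame({
    "Timestamp": ["2022-01-01", "2022-01-01", "2022-01-01", "2022-01-02"],
    "Category": ["A", "A", "B", "A"],
    "Price(USD)": [1.0, 2.0, 1.0, 4.0],
})

# transform("sum") keeps the original row count and index.
cat_total = df.groupby(["Timestamp", "Category"])["Price(USD)"].transform("sum")
day_total = df.groupby("Timestamp")["Price(USD)"].transform("sum")

# Every row gets the share of its category in that day's total.
df["Percentage"] = cat_total / day_total
print(df)
```

Rows that share a day and category get the same Percentage, so call `drop_duplicates` on the Timestamp, Category and Percentage columns if you want one row per pair.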