How to count the number of distinct multiline index in pandas, only by one of the indices components

Time:06-16

I have a dataframe that looks like this: [image: input dataframe]

I want to find the contribution of each category to the Price(USD) column by day. So far I've tried aggregating by Timestamp and Category, with the sum of Price(USD):

df3 = df.groupby(["Timestamp", "Category"]).sum()

This gives the following dataset: [image: dataset grouped by Timestamp and Category]

From here, I haven't been able to apply a function to each row that divides its Price(USD) by the total Price(USD) across all categories for that day and stores the result in a new column.

Ideally, a new column "Percentage" would contain:

Percentage

  1. 0.3/(0.3 + 0.2 + 0.1)
  2. 0.2/(0.3 + 0.2 + 0.1)
  3. 0.1/(0.3 + 0.2 + 0.1)

With the same pattern for the rest of the dataframe.

Thank you

CodePudding user response:

Seems like you need

>>> df.groupby(["Timestamp", "Category"]).sum() / df.groupby(["Timestamp"]).sum()
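To make the division concrete, here is a minimal sketch on made-up data. The sample values are my own assumption (only the column names Timestamp, Category and Price(USD) come from the question), and I pass level="Timestamp" to div explicitly so the daily totals are broadcast across the Category level rather than relying on implicit index alignment.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Timestamp": ["2021-06-01"] * 3 + ["2021-06-02"] * 2,
    "Category": ["A", "B", "C", "A", "B"],
    "Price(USD)": [0.3, 0.2, 0.1, 0.5, 0.5],
})

# Total per (day, category) and total per day.
by_day_cat = df.groupby(["Timestamp", "Category"])[["Price(USD)"]].sum()
by_day = df.groupby("Timestamp")["Price(USD)"].sum()

# Divide each (day, category) total by that day's total,
# matching rows on the "Timestamp" level of the MultiIndex.
by_day_cat["Percentage"] = by_day_cat["Price(USD)"].div(by_day, level="Timestamp")

print(by_day_cat)
```

Within each day the Percentage values should add up to 1, which is a quick way to check the result.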

CodePudding user response:

Here is another way to go about it, which keeps the original rows instead of collapsing them:

df.groupby(["Timestamp", "Category"])["Price(USD)"].transform("sum") / df.groupby("Timestamp")["Price(USD)"].transform("sum")
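A small sketch of the transform approach, again on made-up data (the values and the repeated category "A" on the first day are assumptions for illustration). Because transform returns a result with the same index as the original frame, the share can be assigned straight back as a new column without any merge.

```python
import pandas as pd

# Hypothetical sample data with a repeated (day, category) pair.
df = pd.DataFrame({
    "Timestamp": ["2021-06-01"] * 4 + ["2021-06-02"] * 2,
    "Category": ["A", "A", "B", "C", "A", "B"],
    "Price(USD)": [0.1, 0.2, 0.2, 0.1, 0.5, 0.5],
})

# Per-row totals: transform keeps one value per original row.
category_total = df.groupby(["Timestamp", "Category"])["Price(USD)"].transform("sum")
daily_total = df.groupby("Timestamp")["Price(USD)"].transform("sum")

# Share of the day's total contributed by each row's category.
df["Percentage"] = category_total / daily_total

print(df)
```

Rows that belong to the same day and category get the same Percentage, so call drop_duplicates on ["Timestamp", "Category"] afterwards if you want one row per pair.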