Home > OS >  How to find row with the highest value for the day in pandas and gets the categorial percentages?
How to find row with the highest value for the day in pandas and gets the categorial percentages?

Time:12-06

Here is my dataset:

date               CAT_A     CAT_B     CAT_C
2018-01-01 5:00    12        223       155
2018-01-01 6:00    199       68        72
...
2018-12-31 23:00   56        92        237

The data shows every hour for every day of the year. So I want to know in pandas how I can find the highest value row for each day, and then get the categorical percentages at that hour. For example if the highest hour was 5:00 for day 01-01 then CAT_A: 3.07%, CAT_B: 57.2% CAT_C: 29.7%

CodePudding user response:

We sum the three columns:

df["sum_categories"] = df.sum(axis=1)

We groupby on daily basis and obtain the index of the max daily row:

idx = df.resample("D")["sum_categories"].idxmax()

We select the rows with this index and calculate proportion:

df.loc[idx,["CAT_A", "CAT_B", "CAT_C"]].div(df.loc[idx,"sum_categories"].values) 

CodePudding user response:

Use DataFrameGroupBy.idxmax by Series created by sum and divide by DataFrame.div filtered rows by DataFrame.loc, multiple by 100 and round:

#if necessary DatetimeIndex
#df = df.set_index('date')

s = df.sum(axis=1)
idx = s.groupby(pd.Grouper(freq="D")).idxmax()
df = df.loc[idx].div(s.loc[idx], axis=0).mul(100).round(2)
print (df)
                     CAT_A  CAT_B  CAT_C
date                                    
2018-01-01 05:00:00   3.08  57.18  39.74
  • Related