How to find row with the highest value for the day in pandas and gets the categorial percentages?-CodePudding

Here is my dataset:

date               CAT_A     CAT_B     CAT_C
2018-01-01 5:00    12        223       155
2018-01-01 6:00    199       68        72
...
2018-12-31 23:00   56        92        237

The data shows every hour for every day of the year. So I want to know in pandas how I can find the highest value row for each day, and then get the categorical percentages at that hour. For example if the highest hour was 5:00 for day 01-01 then CAT_A: 3.07%, CAT_B: 57.2% CAT_C: 29.7%

CodePudding user response：

We sum the three columns:

df["sum_categories"] = df.sum(axis=1)

We groupby on daily basis and obtain the index of the max daily row:

idx = df.resample("D")["sum_categories"].idxmax()

We select the rows with this index and calculate proportion:

df.loc[idx,["CAT_A", "CAT_B", "CAT_C"]].div(df.loc[idx,"sum_categories"].values)

CodePudding user response：

Use DataFrameGroupBy.idxmax by Series created by sum and divide by DataFrame.div filtered rows by DataFrame.loc, multiple by 100 and round:

#if necessary DatetimeIndex
#df = df.set_index('date')

s = df.sum(axis=1)
idx = s.groupby(pd.Grouper(freq="D")).idxmax()
df = df.loc[idx].div(s.loc[idx], axis=0).mul(100).round(2)
print (df)
                     CAT_A  CAT_B  CAT_C
date                                    
2018-01-01 05:00:00   3.08  57.18  39.74