Here is my dataset:
date CAT_A CAT_B CAT_C
2018-01-01 5:00 12 223 155
2018-01-01 6:00 199 68 72
...
2018-12-31 23:00 56 92 237
The data shows every hour for every day of the year. So I want to know in pandas how I can find the highest value row for each day, and then get the categorical percentages at that hour. For example if the highest hour was 5:00 for day 01-01 then CAT_A: 3.07%, CAT_B: 57.2% CAT_C: 29.7%
CodePudding user response:
We sum the three columns:
df["sum_categories"] = df.sum(axis=1)
We groupby on daily basis and obtain the index of the max daily row:
idx = df.resample("D")["sum_categories"].idxmax()
We select the rows with this index and calculate proportion:
df.loc[idx,["CAT_A", "CAT_B", "CAT_C"]].div(df.loc[idx,"sum_categories"].values)
CodePudding user response:
Use DataFrameGroupBy.idxmax
by Series created by sum
and divide by DataFrame.div
filtered rows by DataFrame.loc
, multiple by 100
and round:
#if necessary DatetimeIndex
#df = df.set_index('date')
s = df.sum(axis=1)
idx = s.groupby(pd.Grouper(freq="D")).idxmax()
df = df.loc[idx].div(s.loc[idx], axis=0).mul(100).round(2)
print (df)
CAT_A CAT_B CAT_C
date
2018-01-01 05:00:00 3.08 57.18 39.74