I have a dataframe which contains orders. Each product has a color. I want to create a (line) plot of monthly data and show the occurrence of colors throughout the month.
A snippet of the current dataframe:
Color
2021-08-25 17:43:30 Blue
2021-08-25 17:26:34 Blue
2021-08-25 17:15:51 Green
2021-09-02 14:23:19 Blue
2021-09-04 18:11:17 Yellow
I thought I needed to create an extra column with the percentage of occurrence throughout the month first. I tried using:
df.groupby(['Color']).Color.agg([('Color_count', 'count')]).reset_index()
Which gave me:
Color Color_count
0 Blue 2
1 Green 1
The desired output should give me columns with all the colors and the percentage of occurrence per month, something like:
Blue Green Yellow
2021-08-31 0.73 0.24 0.00
2021-09-30 0.66 0.29 0.01
With those percentages I can make a plot to show monthly data of the colors.
Thank you in advance.
CodePudding user response:
Use Grouper
with SeriesGroupBy.value_counts
and Series.unstack
:
df1 = (df.groupby(pd.Grouper(freq='M'))['Color']
.value_counts(normalize=True)
.unstack(fill_value=0)
.rename_axis(None, axis=1))
print (df1)
Blue Green Yellow
2021-08-31 0.666667 0.333333 0.0
2021-09-30 0.500000 0.000000 0.5