I have a dataframe:
Region | Month | |
---|---|---|
0 | AB | 1 |
1 | AB | 1 |
2 | AB | 2 |
3 | BC | 3 |
4 | BC | 4 |
5 | BC | 5 |
How do I turn this dataframe to the below where the distinct values of the region are columns and the distinct value of each month are the rows. The values are the sum of count of all stance of the region/month combination. Since I have a the list of Region and Month I can also specify them as a list, as long as it gives me the view below.
I used df.groupby(['Region', 'Month'], as_index=True).agg({"Month": "count"})
however the layout is different where the month is a column instead of the layout below.
AB | BC | |
---|---|---|
JAN | 2 | 0 |
FEB | 1 | 0 |
MAR | 0 | 1 |
APR | 0 | 1 |
MAY | 0 | 1 |
JUN | 0 | 0 |
JUL | 0 | 0 |
AUG | 0 | 0 |
SEP | 0 | 0 |
OCT | 0 | 0 |
NOV | 0 | 0 |
DEC | 0 | 0 |
CodePudding user response:
Update:
df_out = pd.crosstab(df['Month'], df['Region'])
df_out.index = df_out.index.map(dict(enumerate(calendar.month_abbr)))
df_out = df_out.reindex(calendar.month_abbr, fill_value=0)
print(df_out)
Output:
Region AB BC
Month
0 0
Jan 2 0
Feb 1 0
Mar 0 1
Apr 0 1
May 0 1
Jun 0 0
Jul 0 0
Aug 0 0
Sep 0 0
Oct 0 0
Nov 0 0
Dec 0 0
Let's try this:
import calendar
indx = [m.upper() for m in calendar.month_abbr[1:]]
pd.crosstab(df['Month'], df['Region']).reindex(indx, fill_value=0)
Output:
Region AB BC
Month
JAN 2 0
FEB 1 0
MAR 0 1
APR 0 1
MAY 0 1
JUN 0 0
JUL 0 0
AUG 0 0
SEP 0 0
OCT 0 0
NOV 0 0
DEC 0 0
Details:
Use pd.crosstab
to get counts per month by region, and using the calendar library we can get a list of all month abbreviations in order and fill in the missing value of the resultant dataframe from pd.crosstab.