Home > Blockchain >  Aggregate using values within a column
Aggregate using values within a column

Time:10-10

I have a dataframe:

Region Month
0 AB 1
1 AB 1
2 AB 2
3 BC 3
4 BC 4
5 BC 5

How do I turn this dataframe to the below where the distinct values of the region are columns and the distinct value of each month are the rows. The values are the sum of count of all stance of the region/month combination. Since I have a the list of Region and Month I can also specify them as a list, as long as it gives me the view below.

I used df.groupby(['Region', 'Month'], as_index=True).agg({"Month": "count"}) however the layout is different where the month is a column instead of the layout below.

AB BC
JAN 2 0
FEB 1 0
MAR 0 1
APR 0 1
MAY 0 1
JUN 0 0
JUL 0 0
AUG 0 0
SEP 0 0
OCT 0 0
NOV 0 0
DEC 0 0

CodePudding user response:

Update:

df_out = pd.crosstab(df['Month'], df['Region'])
df_out.index = df_out.index.map(dict(enumerate(calendar.month_abbr)))
df_out = df_out.reindex(calendar.month_abbr, fill_value=0)
print(df_out)

Output:

Region  AB  BC
Month         
         0   0
Jan      2   0
Feb      1   0
Mar      0   1
Apr      0   1
May      0   1
Jun      0   0
Jul      0   0
Aug      0   0
Sep      0   0
Oct      0   0
Nov      0   0
Dec      0   0

Let's try this:

import calendar

indx = [m.upper() for m in calendar.month_abbr[1:]]

pd.crosstab(df['Month'], df['Region']).reindex(indx, fill_value=0)

Output:

Region  AB  BC
Month         
JAN      2   0
FEB      1   0
MAR      0   1
APR      0   1
MAY      0   1
JUN      0   0
JUL      0   0
AUG      0   0
SEP      0   0
OCT      0   0
NOV      0   0
DEC      0   0

Details:

Use pd.crosstab to get counts per month by region, and using the calendar library we can get a list of all month abbreviations in order and fill in the missing value of the resultant dataframe from pd.crosstab.

  • Related