I need to group the data by customer_id and get the sum of purchase for each months. My data looks like this:
cust_id months
1 1
1 1
1 2
1 4
2 1
2 1
So I need to see the sum of purchase for each months and each customer. The desired output is:
cust_id mo1 mo2 mo3 mo4
1 2 1 0 1
1 2 0 0 0
CodePudding user response:
Use crosstab
with DataFrame.reindex
for add missing categories:
r = range(df['months'].min(), df['months'].max() 1)
df = (pd.crosstab(df['cust_id'],df['months'])
.reindex(r, axis=1, fill_value=0)
.add_prefix('mo'))
print (df)
months mo1 mo2 mo3 mo4
cust_id
1 2 1 0 1
2 2 0 0 0
If need all months is possible use ordered categoricals:
df['months'] = pd.Categorical(df['months'], ordered=True, categories=range(1, 13))
df = df.groupby(['cust_id','months']).size().unstack(fill_value=0).add_prefix('mo')
print (df)
months mo1 mo2 mo3 mo4 mo5 mo6 mo7 mo8 mo9 mo10 mo11 mo12
cust_id
1 2 1 0 1 0 0 0 0 0 0 0 0
2 2 0 0 0 0 0 0 0 0 0 0 0
Or reindex
by range
for all months:
r = range(1, 13)
df = (pd.crosstab(df['cust_id'],df['months'])
.reindex(r, axis=1, fill_value=0)
.add_prefix('mo'))
print (df)
months mo1 mo2 mo3 mo4 mo5 mo6 mo7 mo8 mo9 mo10 mo11 mo12
cust_id
1 2 1 0 1 0 0 0 0 0 0 0 0
2 2 0 0 0 0 0 0 0 0 0 0 0