Home > Mobile >  build df from sum of column value
build df from sum of column value

Time:02-24

I need to group the data by customer_id and get the sum of purchase for each months. My data looks like this:

cust_id        months
1               1
1               1
1               2
1               4
2               1
2               1

So I need to see the sum of purchase for each months and each customer. The desired output is:

cust_id     mo1     mo2     mo3     mo4
1           2       1       0       1
1           2       0       0       0

CodePudding user response:

Use crosstab with DataFrame.reindex for add missing categories:

r = range(df['months'].min(), df['months'].max()   1)
df = (pd.crosstab(df['cust_id'],df['months'])
        .reindex(r, axis=1, fill_value=0)
        .add_prefix('mo'))
print (df)
months   mo1  mo2  mo3  mo4
cust_id                    
1          2    1    0    1
2          2    0    0    0

If need all months is possible use ordered categoricals:

df['months'] = pd.Categorical(df['months'], ordered=True, categories=range(1, 13))

df = df.groupby(['cust_id','months']).size().unstack(fill_value=0).add_prefix('mo')
print (df)
months   mo1  mo2  mo3  mo4  mo5  mo6  mo7  mo8  mo9  mo10  mo11  mo12
cust_id                                                               
1          2    1    0    1    0    0    0    0    0     0     0     0
2          2    0    0    0    0    0    0    0    0     0     0     0

Or reindex by range for all months:

r = range(1, 13)
df = (pd.crosstab(df['cust_id'],df['months'])
        .reindex(r, axis=1, fill_value=0)
        .add_prefix('mo'))
print (df)
months   mo1  mo2  mo3  mo4  mo5  mo6  mo7  mo8  mo9  mo10  mo11  mo12
cust_id                                                               
1          2    1    0    1    0    0    0    0    0     0     0     0
2          2    0    0    0    0    0    0    0    0     0     0     0
  • Related