Home > database >  Pandas groupby sales item and count sales per month in column
Pandas groupby sales item and count sales per month in column

Time:11-09

Hi I have a pandas dataframe that looks roughly like this:

Date Item Sales
01-01-2022 iphone $20
02-01-2022 iphone $40
01-02-2022 iphone $40
02-02-2022 macbook $20
03-02-2022 macbook $40
04-02-2022 macbook $50

I am trying to get the count per item per month in a format like this:

Item January February
iphone 2 1
macbook 0 3

and in the end I want a column that takes the average of each column. So the final dataframe would look like:

Item January February Average
iphone 2 1 1,5
macbook 0 3 1,5

I have tried many pandas groupby functions but I can't seem to get it done. Can somebody help me?

I tried to use pd.Grouper(freq='M') but that didn't work out so well. The months don't show up as columns but as rows per sales item. Any ideas?

CodePudding user response:

Use a crosstab:

(pd.crosstab(df['Item'], pd.to_datetime(df['Date'], dayfirst=True).dt.strftime('%B'))
   .assign(Average=lambda d: d.mean(axis=1))
)

Output:

Date     February  January  Average
Item                               
iphone          1        2      1.5
macbook         3        0      1.5

Alternative for sorted month names:

from calendar import month_name

(pd.crosstab(df['Item'], pd.to_datetime(df['Date'], dayfirst=True).dt.month)
   .rename(columns=dict(enumerate(month_name)))
   .assign(Average=lambda d: d.mean(axis=1))
)

Output:

Date     January  February  Average
Item                               
iphone         2         1      1.5
macbook        0         3      1.5

CodePudding user response:

example:

            Item    Sales
Date        
2022-01-01  iphone  $20
2022-02-01  iphone  $40
2022-01-02  iphone  $40
2022-02-02  macbook $20
2022-03-02  macbook $40
2022-04-02  macbook $50

when date column is DatetimeIndex, aggregate by month and make df1

df1 = (df
       .groupby([df.index.to_period(freq='M'), 'Item'])['Sales'].count()
       .unstack(level=0)
       .fillna(0))

output(df1):

Date    2022-01 2022-02 2022-03 2022-04
Item                
iphone  2.0     1.0     0.0     0.0
macbook 0.0     1.0     1.0     1.0

and make columns to local name and make Average column

(df1
 .set_axis(df1.columns.strftime('%B'), axis=1)
 .assign(Average=lambda x: x.mean(axis=1)))

output:

Date    January February    March   April   Average
Item                    
iphone  2.0     1.0         0.0     0.0     0.75
macbook 0.0     1.0         1.0     1.0     0.75
  • Related