Home > Blockchain >  How to use df groupby to return counts on specific values in column across each month
How to use df groupby to return counts on specific values in column across each month

Time:04-05

I have a dataframe made up of dummy car purchases across a year which looks like:

df =

purchase_date     brand
2021-02-13        BMW
2021-02-28        BMW
2021-03-10        Audi
2021-03-11        BMW
...

What I'm looking for is to get an aggregated count of each brand of car for each month in 2021, so it would look like this:

df =

              BMW   Audi
(2021-02)     2     0 
(2021-03)     1     1
...

So far I've used this code to group by the year, month but I can't split it to count individual brands:

df = df.groupby([df['purchase_date'].dt.year.rename('year'), df3['purchase_date'].dt.month.rename('month')]).agg({'count'})

This returns:

              ('brand','count')
(2021-02)     2
(2021-03)     2

CodePudding user response:

Use crosstab with month periods:

df1 = pd.crosstab(df['purchase_date'].dt.to_period('m').rename('year'), df['brand'])
print (df1)
brand    Audi  BMW
year              
2021-02     0    2
2021-03     1    1

Your solution is with add column brand, aggregate GroupBy.size and Series.unstack:

df2 = (df.groupby([df['purchase_date'].dt.year.rename('year'), 
                 df['purchase_date'].dt.month.rename('month'), 'brand'])
        .size()
        .unstack(fill_value=0))
print (df2)
brand       Audi  BMW
year month           
2021 2         0    2
     3         1    1

Alternative:

df3 = (df.groupby([pd.Grouper(freq='MS',key='purchase_date'), 'brand'])
        .size()
        .unstack(fill_value=0))
print (df3)
brand          Audi  BMW
purchase_date           
2021-02-01        0    2
2021-03-01        1    1
  • Related