I have a dataframe made up of dummy car purchases across a year which looks like:
df =
purchase_date brand
2021-02-13 BMW
2021-02-28 BMW
2021-03-10 Audi
2021-03-11 BMW
...
What I'm looking for is to get an aggregated count of each brand of car for each month in 2021, so it would look like this:
df =
BMW Audi
(2021-02) 2 0
(2021-03) 1 1
...
So far I've used this code to group by the year, month but I can't split it to count individual brands:
df = df.groupby([df['purchase_date'].dt.year.rename('year'), df3['purchase_date'].dt.month.rename('month')]).agg({'count'})
This returns:
('brand','count')
(2021-02) 2
(2021-03) 2
CodePudding user response:
Use crosstab
with month periods:
df1 = pd.crosstab(df['purchase_date'].dt.to_period('m').rename('year'), df['brand'])
print (df1)
brand Audi BMW
year
2021-02 0 2
2021-03 1 1
Your solution is with add column brand
, aggregate GroupBy.size
and Series.unstack
:
df2 = (df.groupby([df['purchase_date'].dt.year.rename('year'),
df['purchase_date'].dt.month.rename('month'), 'brand'])
.size()
.unstack(fill_value=0))
print (df2)
brand Audi BMW
year month
2021 2 0 2
3 1 1
Alternative:
df3 = (df.groupby([pd.Grouper(freq='MS',key='purchase_date'), 'brand'])
.size()
.unstack(fill_value=0))
print (df3)
brand Audi BMW
purchase_date
2021-02-01 0 2
2021-03-01 1 1