I have a dataframe that looks like this: [1]: https://i.stack.imgur.com/KnHba.png
Essentially, there is a distributor name column, a sales column, and a MM/DD/YYYY Date column.
For each distributor, by month, I want the sum of sales. What I tried: df = df.groupby(df['Distributor Name'],df.Date.dt.month)['Sales'].sum()
This throws an error. "Unhashable type: Series". This works when I remove Distributor Name, but I don't just want the overall monthly sales. I want the monthly sales BY distributor.
Thanks in advance!! Joel
CodePudding user response:
The correct way to group by multiple columns is to put them in a list as a first argument:
result = df.groupby(['Distributor Name', df.date.dt.month])['Sales'].sum()
This creates a multiindex pandas series, with Distributor Name
and Date
as indices. If you wish to create a dataframe with three columns (Distributor Name
, Date
, Sales
) you can reset index of this pandas series.
result = result.reset_index()