Sales per month per product id-CodePudding

I want to have a list of all product sales from 2020-01-01 until 2020-12-01 for sales of each id_main_product (1-100). Would it be better to first create some product groups (e.g. id_main_product from 1-10,11-20 etc.) as I think that the output would be a big table ?

I have already tried out some things with groupby etc. but I can´t find the solution.

This is the dataset:

casesalesdata = pd.read_csv("https://raw.githubusercontent.com/fredzett/Notebooks/master/01_case_sales_data.csv")

CodePudding user response：

First, you should change the data type of date column to datetime and format as %Y-%m to use later in pivot. Then, you can use pd.pivot.

casesalesdata.date=pd.to_datetime(casesalesdata.date)
casesalesdata.date=casesalesdata.date.dt.strftime('%Y-%m')
pd.pivot_table(casesalesdata,index=["date","id_main_product"],
               values=["items"],
               aggfunc='sum')

You may also refer to Group by Year and Month Panda Pivot Table

CodePudding user response：

it's not clear the analysis you wish to complete
first step is filter to rows you want
you can then generate summaries with groupby() and agg()

casesalesdata = pd.read_csv(
    "https://raw.githubusercontent.com/fredzett/Notebooks/master/01_case_sales_data.csv"
)

casesalesdata["date"] = pd.to_datetime(casesalesdata["date"])
# just dates and wanted products
df = casesalesdata.loc[
    casesalesdata["date"].between("2020-01-01", "2020-12-01")
    & casesalesdata["id_main_product"].between(1, 100)
]

df.groupby([df["date"].dt.month, "id_main_product"]).agg(
    {
        "items": "sum",
        "margin_perc": "mean",
        "avg_price_per_item": "mean",
        "rating": "mean",
    }
)