I want to have a list of all product sales from 2020-01-01 until 2020-12-01 for sales of each id_main_product (1-100). Would it be better to first create some product groups (e.g. id_main_product from 1-10,11-20 etc.) as I think that the output would be a big table ?
I have already tried out some things with groupby
etc. but I can´t find the solution.
This is the dataset:
casesalesdata = pd.read_csv("https://raw.githubusercontent.com/fredzett/Notebooks/master/01_case_sales_data.csv")
CodePudding user response:
First, you should change the data type of date
column to datetime
and format as %Y-%m
to use later in pivot. Then, you can use pd.pivot
.
casesalesdata.date=pd.to_datetime(casesalesdata.date)
casesalesdata.date=casesalesdata.date.dt.strftime('%Y-%m')
pd.pivot_table(casesalesdata,index=["date","id_main_product"],
values=["items"],
aggfunc='sum')
You may also refer to Group by Year and Month Panda Pivot Table
CodePudding user response:
- it's not clear the analysis you wish to complete
- first step is filter to rows you want
- you can then generate summaries with
groupby()
andagg()
casesalesdata = pd.read_csv(
"https://raw.githubusercontent.com/fredzett/Notebooks/master/01_case_sales_data.csv"
)
casesalesdata["date"] = pd.to_datetime(casesalesdata["date"])
# just dates and wanted products
df = casesalesdata.loc[
casesalesdata["date"].between("2020-01-01", "2020-12-01")
& casesalesdata["id_main_product"].between(1, 100)
]
df.groupby([df["date"].dt.month, "id_main_product"]).agg(
{
"items": "sum",
"margin_perc": "mean",
"avg_price_per_item": "mean",
"rating": "mean",
}
)