I have two dataframes: one containing replenishment orders for some products, and one containing sales data for the same products by month over multiple years. I have only included the entries for one specific product here. I have already used groupby to calculate the average sales per month per product from the sales data (grouper
):
grouper = sales.groupby(['Country Code', 'Product', 'Product Description', 'Month'])['Sales Quantity [QTY]'].mean()
Returns:
index Country Code Product Product Description Month Sales Quantity [QTY]
1 Belgium BE3194 GEL DOUCHE 500ML 1 3.000000
2 Belgium BE3194 GEL DOUCHE 500ML 2 1.750000
3 Belgium BE3194 GEL DOUCHE 500ML 3 2.333333
4 Belgium BE3194 GEL DOUCHE 500ML 4 2.000000
5 Belgium BE3194 GEL DOUCHE 500ML 5 2.000000
6 Belgium BE3194 GEL DOUCHE 500ML 6 1.500000
7 Belgium BE3194 GEL DOUCHE 500ML 7 5.000000
8 Belgium BE3194 GEL DOUCHE 500ML 8 1.750000
9 Belgium BE3194 GEL DOUCHE 500ML 9 1.500000
10 Belgium BE3194 GEL DOUCHE 500ML 10 1.500000
11 Belgium BE3194 GEL DOUCHE 500ML 11 5.500000
12 Belgium BE3194 GEL DOUCHE 500ML 12 1.500000
Then I have a second dataframe, products
, containing all replenishment orders.
Product Date Quantity
0 BE3194 2020-09-01 600
1 BE3194 2021-06-01 400
I would like to append all monthly sales from grouper
to products
for the months between the first and last date in products
. The output should look something like this:
Product Date Quantity
0 BE3194 2020-09-01 60
1 BE3194 2021-03-01 40
2 BE3194 2020-09-01 -1.5
3 BE3194 2020-10-01 -1.5
4 BE3194 2020-11-01 -5.5
5 BE3194 2020-12-01 -1.5
6 BE3194 2021-01-01 -3
7 BE3194 2021-02-01 -1.75
8 BE3194 2021-03-01 -2.33
My first idea was using a for loop combined with pd.date_range, something like this:
for entry in pd.date_range(product['Date'].min(),product['Date'].max(), freq='MS').tolist():
products.append(grouper[(grouper['Product'] == products['Product']) &
(grouper['Month'] == entry.dt.month)])
But I haven't managed to make it work so far. How can I best achieve this?
CodePudding user response:
Try:
# if products["Date"] isn't already converted, convert it:
products["Date"] = pd.to_datetime(products["Date"])
min_date = products["Date"].min()
max_date = products["Date"].max()
tmp = pd.DataFrame({"Date": pd.date_range(min_date, max_date, freq="MS")})
tmp["Month"] = tmp.Date.dt.month
tmp = tmp[:12] # <--- there are only 12 Months max in grouper df
x = pd.merge(tmp, grouper, on="Month")
x = x[x["Date"] <= max_date]
x["Quantity"] = x["Sales Quantity [QTY]"] * -1
out = pd.concat([products, x[products.columns]])
print(out)
Prints:
Product Date Quantity
0 BE3194 2020-09-01 600.000000
1 BE3194 2021-06-01 400.000000
0 BE3194 2020-09-01 -1.500000
1 BE3194 2020-10-01 -1.500000
2 BE3194 2020-11-01 -5.500000
3 BE3194 2020-12-01 -1.500000
4 BE3194 2021-01-01 -3.000000
5 BE3194 2021-02-01 -1.750000
6 BE3194 2021-03-01 -2.333333
7 BE3194 2021-04-01 -2.000000
8 BE3194 2021-05-01 -2.000000
9 BE3194 2021-06-01 -1.500000
Note: The first two values in your output df should be 600
and 400
respectively. Also, the end date should be 2021-06-01
not 2021-03-01
EDIT: For multiple products:
# if products["Date"] isn't already converted, convert it:
products["Date"] = pd.to_datetime(products["Date"])
def fn(grouper):
p = products[products.Product == grouper.name]
min_date = p["Date"].min()
max_date = p["Date"].max()
tmp = pd.DataFrame({"Date": pd.date_range(min_date, max_date, freq="MS")})
tmp["Month"] = tmp.Date.dt.month
tmp = pd.merge(tmp[:12], grouper, on="Month")
tmp = tmp[tmp["Date"] <= max_date]
tmp["Quantity"] = tmp["Sales Quantity [QTY]"] * -1
return pd.concat([p, tmp[p.columns]])
out = grouper.groupby("Product").apply(fn)
print(out)
Prints:
Product Date Quantity
Product
BE3194 0 BE3194 2020-09-01 600.000000
1 BE3194 2021-06-01 400.000000
0 BE3194 2020-09-01 -1.500000
1 BE3194 2020-10-01 -1.500000
2 BE3194 2020-11-01 -5.500000
3 BE3194 2020-12-01 -1.500000
4 BE3194 2021-01-01 -3.000000
5 BE3194 2021-02-01 -1.750000
6 BE3194 2021-03-01 -2.333333
7 BE3194 2021-04-01 -2.000000
8 BE3194 2021-05-01 -2.000000
9 BE3194 2021-06-01 -1.500000
XXXXXX 2 XXXXXX 2020-05-01 600.000000
3 XXXXXX 2021-02-01 400.000000
0 XXXXXX 2020-05-01 -2.000000
1 XXXXXX 2020-06-01 -1.500000
2 XXXXXX 2020-07-01 -5.000000
3 XXXXXX 2020-08-01 -1.750000
4 XXXXXX 2020-09-01 -1.500000
5 XXXXXX 2020-10-01 -1.500000
6 XXXXXX 2020-11-01 -5.500000
7 XXXXXX 2020-12-01 -1.500000
8 XXXXXX 2021-01-01 -3.000000
9 XXXXXX 2021-02-01 -1.750000
Dataframes used:
Product Date Quantity
0 BE3194 2020-09-01 600
1 BE3194 2021-06-01 400
2 XXXXXX 2020-05-01 600
3 XXXXXX 2021-02-01 400
Country Code Product Product Description Month Sales Quantity [QTY]
0 Belgium BE3194 GEL DOUCHE 500ML 1 3.000000
1 Belgium BE3194 GEL DOUCHE 500ML 2 1.750000
2 Belgium BE3194 GEL DOUCHE 500ML 3 2.333333
3 Belgium BE3194 GEL DOUCHE 500ML 4 2.000000
4 Belgium BE3194 GEL DOUCHE 500ML 5 2.000000
5 Belgium BE3194 GEL DOUCHE 500ML 6 1.500000
6 Belgium BE3194 GEL DOUCHE 500ML 7 5.000000
7 Belgium BE3194 GEL DOUCHE 500ML 8 1.750000
8 Belgium BE3194 GEL DOUCHE 500ML 9 1.500000
9 Belgium BE3194 GEL DOUCHE 500ML 10 1.500000
10 Belgium BE3194 GEL DOUCHE 500ML 11 5.500000
11 Belgium BE3194 GEL DOUCHE 500ML 12 1.500000
12 Belgium XXXXXX GEL DOUCHE 500ML 1 3.000000
13 Belgium XXXXXX GEL DOUCHE 500ML 2 1.750000
14 Belgium XXXXXX GEL DOUCHE 500ML 3 2.333333
15 Belgium XXXXXX GEL DOUCHE 500ML 4 2.000000
16 Belgium XXXXXX GEL DOUCHE 500ML 5 2.000000
17 Belgium XXXXXX GEL DOUCHE 500ML 6 1.500000
18 Belgium XXXXXX GEL DOUCHE 500ML 7 5.000000
19 Belgium XXXXXX GEL DOUCHE 500ML 8 1.750000
20 Belgium XXXXXX GEL DOUCHE 500ML 9 1.500000
21 Belgium XXXXXX GEL DOUCHE 500ML 10 1.500000
22 Belgium XXXXXX GEL DOUCHE 500ML 11 5.500000
23 Belgium XXXXXX GEL DOUCHE 500ML 12 1.500000