Home > OS >  appending monthly data to dataframe between two dates (multiple entries)
appending monthly data to dataframe between two dates (multiple entries)

Time:09-08

I have two dataframes: one containing replenishment orders for some products, and one containing sales data for the same products by month over multiple years. I have only included the entries for one specific product here. I have already used groupby to calculate the average sales per month per product from the sales data (grouper):

grouper = sales.groupby(['Country Code', 'Product', 'Product Description', 'Month'])['Sales Quantity [QTY]'].mean()

Returns:

index   Country Code   Product  Product Description   Month   Sales Quantity [QTY]
1       Belgium        BE3194   GEL DOUCHE 500ML      1       3.000000
2       Belgium        BE3194   GEL DOUCHE 500ML      2       1.750000
3       Belgium        BE3194   GEL DOUCHE 500ML      3       2.333333
4       Belgium        BE3194   GEL DOUCHE 500ML      4       2.000000
5       Belgium        BE3194   GEL DOUCHE 500ML      5       2.000000
6       Belgium        BE3194   GEL DOUCHE 500ML      6       1.500000
7       Belgium        BE3194   GEL DOUCHE 500ML      7       5.000000
8       Belgium        BE3194   GEL DOUCHE 500ML      8       1.750000
9       Belgium        BE3194   GEL DOUCHE 500ML      9       1.500000
10      Belgium        BE3194   GEL DOUCHE 500ML      10      1.500000
11      Belgium        BE3194   GEL DOUCHE 500ML      11      5.500000
12      Belgium        BE3194   GEL DOUCHE 500ML      12      1.500000

Then I have a second dataframe, products, containing all replenishment orders.

    Product   Date          Quantity
0   BE3194    2020-09-01    600
1   BE3194    2021-06-01    400

I would like to append all monthly sales from grouper to products for the months between the first and last date in products. The output should look something like this:

    Product   Date          Quantity
0   BE3194    2020-09-01    60
1   BE3194    2021-03-01    40
2   BE3194    2020-09-01    -1.5
3   BE3194    2020-10-01    -1.5
4   BE3194    2020-11-01    -5.5
5   BE3194    2020-12-01    -1.5
6   BE3194    2021-01-01    -3
7   BE3194    2021-02-01    -1.75
8   BE3194    2021-03-01    -2.33

My first idea was using a for loop combined with pd.date_range, something like this:

for entry in pd.date_range(product['Date'].min(),product['Date'].max(), freq='MS').tolist():
          products.append(grouper[(grouper['Product'] == products['Product']) & 
          (grouper['Month'] == entry.dt.month)])

But I haven't managed to make it work so far. How can I best achieve this?

CodePudding user response:

Try:

# if products["Date"] isn't already converted, convert it:
products["Date"] = pd.to_datetime(products["Date"])

min_date = products["Date"].min()
max_date = products["Date"].max()

tmp = pd.DataFrame({"Date": pd.date_range(min_date, max_date, freq="MS")})
tmp["Month"] = tmp.Date.dt.month
tmp = tmp[:12]  # <--- there are only 12 Months max in grouper df

x = pd.merge(tmp, grouper, on="Month")
x = x[x["Date"] <= max_date]
x["Quantity"] = x["Sales Quantity [QTY]"] * -1

out = pd.concat([products, x[products.columns]])
print(out)

Prints:

  Product       Date    Quantity
0  BE3194 2020-09-01  600.000000
1  BE3194 2021-06-01  400.000000
0  BE3194 2020-09-01   -1.500000
1  BE3194 2020-10-01   -1.500000
2  BE3194 2020-11-01   -5.500000
3  BE3194 2020-12-01   -1.500000
4  BE3194 2021-01-01   -3.000000
5  BE3194 2021-02-01   -1.750000
6  BE3194 2021-03-01   -2.333333
7  BE3194 2021-04-01   -2.000000
8  BE3194 2021-05-01   -2.000000
9  BE3194 2021-06-01   -1.500000

Note: The first two values in your output df should be 600 and 400 respectively. Also, the end date should be 2021-06-01 not 2021-03-01


EDIT: For multiple products:

# if products["Date"] isn't already converted, convert it:
products["Date"] = pd.to_datetime(products["Date"])


def fn(grouper):
    p = products[products.Product == grouper.name]

    min_date = p["Date"].min()
    max_date = p["Date"].max()

    tmp = pd.DataFrame({"Date": pd.date_range(min_date, max_date, freq="MS")})
    tmp["Month"] = tmp.Date.dt.month

    tmp = pd.merge(tmp[:12], grouper, on="Month")
    tmp = tmp[tmp["Date"] <= max_date]
    tmp["Quantity"] = tmp["Sales Quantity [QTY]"] * -1

    return pd.concat([p, tmp[p.columns]])


out = grouper.groupby("Product").apply(fn)
print(out)

Prints:

          Product       Date    Quantity
Product                                 
BE3194  0  BE3194 2020-09-01  600.000000
        1  BE3194 2021-06-01  400.000000
        0  BE3194 2020-09-01   -1.500000
        1  BE3194 2020-10-01   -1.500000
        2  BE3194 2020-11-01   -5.500000
        3  BE3194 2020-12-01   -1.500000
        4  BE3194 2021-01-01   -3.000000
        5  BE3194 2021-02-01   -1.750000
        6  BE3194 2021-03-01   -2.333333
        7  BE3194 2021-04-01   -2.000000
        8  BE3194 2021-05-01   -2.000000
        9  BE3194 2021-06-01   -1.500000
XXXXXX  2  XXXXXX 2020-05-01  600.000000
        3  XXXXXX 2021-02-01  400.000000
        0  XXXXXX 2020-05-01   -2.000000
        1  XXXXXX 2020-06-01   -1.500000
        2  XXXXXX 2020-07-01   -5.000000
        3  XXXXXX 2020-08-01   -1.750000
        4  XXXXXX 2020-09-01   -1.500000
        5  XXXXXX 2020-10-01   -1.500000
        6  XXXXXX 2020-11-01   -5.500000
        7  XXXXXX 2020-12-01   -1.500000
        8  XXXXXX 2021-01-01   -3.000000
        9  XXXXXX 2021-02-01   -1.750000

Dataframes used:

  Product       Date  Quantity
0  BE3194 2020-09-01       600
1  BE3194 2021-06-01       400
2  XXXXXX 2020-05-01       600
3  XXXXXX 2021-02-01       400


   Country Code Product Product Description  Month  Sales Quantity [QTY]
0       Belgium  BE3194    GEL DOUCHE 500ML      1              3.000000
1       Belgium  BE3194    GEL DOUCHE 500ML      2              1.750000
2       Belgium  BE3194    GEL DOUCHE 500ML      3              2.333333
3       Belgium  BE3194    GEL DOUCHE 500ML      4              2.000000
4       Belgium  BE3194    GEL DOUCHE 500ML      5              2.000000
5       Belgium  BE3194    GEL DOUCHE 500ML      6              1.500000
6       Belgium  BE3194    GEL DOUCHE 500ML      7              5.000000
7       Belgium  BE3194    GEL DOUCHE 500ML      8              1.750000
8       Belgium  BE3194    GEL DOUCHE 500ML      9              1.500000
9       Belgium  BE3194    GEL DOUCHE 500ML     10              1.500000
10      Belgium  BE3194    GEL DOUCHE 500ML     11              5.500000
11      Belgium  BE3194    GEL DOUCHE 500ML     12              1.500000
12      Belgium  XXXXXX    GEL DOUCHE 500ML      1              3.000000
13      Belgium  XXXXXX    GEL DOUCHE 500ML      2              1.750000
14      Belgium  XXXXXX    GEL DOUCHE 500ML      3              2.333333
15      Belgium  XXXXXX    GEL DOUCHE 500ML      4              2.000000
16      Belgium  XXXXXX    GEL DOUCHE 500ML      5              2.000000
17      Belgium  XXXXXX    GEL DOUCHE 500ML      6              1.500000
18      Belgium  XXXXXX    GEL DOUCHE 500ML      7              5.000000
19      Belgium  XXXXXX    GEL DOUCHE 500ML      8              1.750000
20      Belgium  XXXXXX    GEL DOUCHE 500ML      9              1.500000
21      Belgium  XXXXXX    GEL DOUCHE 500ML     10              1.500000
22      Belgium  XXXXXX    GEL DOUCHE 500ML     11              5.500000
23      Belgium  XXXXXX    GEL DOUCHE 500ML     12              1.500000
  • Related