Home > Back-end >  Add column in dataframe from another dataframe matching the id and based on condition in date column
Add column in dataframe from another dataframe matching the id and based on condition in date column

Time:10-23

My problem is a very complex and confusing one, I haven't been able to find the answer anywhere. I basically have 2 dataframes, one is price history of certain products and the other is invoice dataframe that contains transaction data.

Sample Data:

Price History:

    product_id    updated  price
id                              
1            1 2022-01-01    5.0
2            2 2022-01-01    5.5
3            3 2022-01-01    5.7
4            1 2022-01-15    6.0
5            2 2022-01-15    6.5
6            3 2022-01-15    6.7
7            1 2022-02-01    7.0
8            2 2022-02-01    7.5
9            3 2022-02-01    7.7

Invoice:

   transaction_date  product_id  quantity
id                                       
1        2022-01-02           1         2
2        2022-01-02           2         3
3        2022-01-02           3         4
4        2022-01-14           1         1
5        2022-01-14           2         4
6        2022-01-14           3         2
7        2022-01-15           1         3
8        2022-01-15           2         6
9        2022-01-15           3         5
10       2022-01-16           1         3
11       2022-01-16           2         2
12       2022-01-16           3         3
13       2022-02-05           1         1
14       2022-02-05           2         4
15       2022-02-05           3         7
16       2022-05-10           1         4
17       2022-05-10           2         2
18       2022-05-10           3         1

What I am looking to achieve is to add the price column in the Invoice dataframe, based on:

  1. The product id
  2. Comparing the Updated and Transaction Date in a way that updated date <= transaction date for that particular record, basically finding the closest date after the price was updated. (The MAX date that is <= transaction date)

I managed to do this:

invoice['price'] = invoice['product_id'].map(price_history.set_index('id')['price'])

but need to incorporate the date condition now.

Expected result for sample data:

Expected Result

Any guidance in the correct direction is appreciated, thanks

CodePudding user response:

merge_asof is what you are looking for:

pd.merge_asof(
    invoice,
    price_history,
    left_on="transaction_date",
    right_on="updated",
    by="product_id",
)[["transaction_date", "product_id", "quantity", "price"]]

CodePudding user response:

merge_asof with arg direction

merged = pd.merge_asof(
    left=invoice,
    right=product,
    left_on="transaction_date",
    right_on="updated",
    by="product_id",
    direction="backward",
    suffixes=("", "_y")
).drop(columns=["id_y", "updated"]).reset_index(drop=True)

print(merged)


    id transaction_date  product_id  quantity  price
0    1       2022-01-02           1         2    5.0
1    2       2022-01-02           2         3    5.5
2    3       2022-01-02           3         4    5.7
3    4       2022-01-14           1         1    5.0
4    5       2022-01-14           2         4    5.5
5    6       2022-01-14           3         2    5.7
6    7       2022-01-15           1         3    6.0
7    8       2022-01-15           2         6    6.5
8    9       2022-01-15           3         5    6.7
9   10       2022-01-16           1         3    6.0
10  11       2022-01-16           2         2    6.5
11  12       2022-01-16           3         3    6.7
12  13       2022-02-05           1         1    7.0
13  14       2022-02-05           2         4    7.5
14  15       2022-02-05           3         7    7.7
15  16       2022-05-10           1         4    7.0
16  17       2022-05-10           2         2    7.5
17  18       2022-05-10           3         1    7.7
  • Related