Pandas setting a value depending on date ranges on another dataframe

I have a table of discount rates that depend on the agent and a time period, and I would like to apply it to another table to get the rate applicable to each agent on the date of sale.

This is the rate table (df_r)

    Agentname   ProductType  OldRate  NewRate  StartDate   EndDate
0   VSFAAL      SPORTS           0.0     10.0  2020-11-05  2021-01-18
1   VSFAAL      APPAREL          0.0     35.0  2020-11-05  2022-05-03
2   VSFAAL      SPORTS          10.0     15.0  2021-01-18  2022-05-03
3   VSFAALJS    SPORTS           0.0     10.0  2020-11-07  2022-05-03
4   VSFAALJS    APPAREL          0.0     15.0  2020-11-07  2021-11-09
5   VSFAALJS    APPAREL         15.0      5.0  2021-11-09  2022-05-03

And this is the transactions table (df)

                  Date      Sales   Agentname   ProductType     
0 2020-12-01 08:00:02        100.0  VSFAAL      SPORTS       
1 2022-03-01 08:00:09         99.0  VSFAAL      APPAREL      
2 2022-03-01 08:00:14         75.0  VSFAAL      SPORTS       
3 2021-05-01 08:00:39         67.0  VSFAALJS    SPORTS       
4 2020-05-01 08:00:51        160.0  VSFAALJS    APPAREL      
5 2021-05-01 08:00:56         65.0  VSFAALJS    APPAREL

I was hoping to have the results like this:

                  Date      Sales   Agentname   ProductType     Agentname_rates
0 2020-12-01 08:00:02        100.0  VSFAAL      SPORTS             10.0
1 2022-03-01 08:00:09         99.0  VSFAAL      APPAREL            35.0
2 2022-03-01 08:00:14         75.0  VSFAAL      SPORTS             15.0
3 2021-05-01 08:00:39         67.0  VSFAALJS    SPORTS             10.0
4 2020-05-01 08:00:51        160.0  VSFAALJS    APPAREL               0
5 2021-05-01 08:00:56         65.0  VSFAALJS    APPAREL            15.0

Currently I loop over the product types, then over the agents, and then over the index of each matching rate row:

col = 'Agentname'
for product in df.ProductType.unique():
    for uname in df[col].unique():
        # rate rows for this agent/product that end on or after the earliest sale
        a = df_r.loc[(df_r['Agentname'] == uname)
                     & (df_r['ProductType'] == product.upper())
                     & (df_r['EndDate'] >= df['Date'].min())]

        for i in a.index:
            mask = ((df['ProductType'].str.upper() == product.upper())
                    & (df[col] == uname)
                    & (df['Date'] >= a.loc[i, 'StartDate'])
                    & (df['Date'] <= a.loc[i, 'EndDate']))
            df.loc[mask, f"{col}_rates"] = a.loc[i, 'NewRate']

Is there a more efficient way of doing this?

CodePudding user response：

Here is one way to do it

Merge the two DataFrames on agent name and product type, then filter the merged rows on the dates:

import pandas as pd

# df is the transactions table and df_r the rate table, as named in the question
df3 = df.merge(df_r[['Agentname', 'ProductType', 'StartDate', 'EndDate', 'NewRate']],
               on=['Agentname', 'ProductType'],
               how='left')

# keep only the rows whose sale day falls inside the rate period (both ends inclusive)
df3[pd.to_datetime(df3['Date']).dt.normalize().between(
        pd.to_datetime(df3['StartDate']),
        pd.to_datetime(df3['EndDate']))
   ]

                 Date  Sales Agentname ProductType  StartDate    EndDate  NewRate
0 2020-12-01 08:00:02  100.0    VSFAAL      SPORTS 2020-11-05 2021-01-18     10.0
2 2022-03-01 08:00:09   99.0    VSFAAL     APPAREL 2020-11-05 2022-05-03     35.0
4 2022-03-01 08:00:14   75.0    VSFAAL      SPORTS 2021-01-18 2022-05-03     15.0
5 2021-05-01 08:00:39   67.0  VSFAALJS      SPORTS 2020-11-07 2022-05-03     10.0
8 2021-05-01 08:00:56   65.0  VSFAALJS     APPAREL 2020-11-07 2021-11-09     15.0
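
Note that the filtered result drops the sale with no applicable rate (row 4 of the transactions table) and returns merged rows rather than a new column on the original table. The following is a minimal sketch of one way to carry the rate back onto `df`, rebuilt on the sample data from the question. The helper name `rates_by_merge`, the temporary `_row` column and the `default` argument are illustrative and not part of the answer above. It also assumes that when two periods share a boundary day (such as 2021-01-18), the later-starting rate should win, which the question does not state.

```python
import pandas as pd

# Sample data copied from the question.
df_r = pd.DataFrame({
    "Agentname": ["VSFAAL", "VSFAAL", "VSFAAL", "VSFAALJS", "VSFAALJS", "VSFAALJS"],
    "ProductType": ["SPORTS", "APPAREL", "SPORTS", "SPORTS", "APPAREL", "APPAREL"],
    "OldRate": [0.0, 0.0, 10.0, 0.0, 0.0, 15.0],
    "NewRate": [10.0, 35.0, 15.0, 10.0, 15.0, 5.0],
    "StartDate": pd.to_datetime(["2020-11-05", "2020-11-05", "2021-01-18",
                                 "2020-11-07", "2020-11-07", "2021-11-09"]),
    "EndDate": pd.to_datetime(["2021-01-18", "2022-05-03", "2022-05-03",
                               "2022-05-03", "2021-11-09", "2022-05-03"]),
})
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-12-01 08:00:02", "2022-03-01 08:00:09",
                            "2022-03-01 08:00:14", "2021-05-01 08:00:39",
                            "2020-05-01 08:00:51", "2021-05-01 08:00:56"]),
    "Sales": [100.0, 99.0, 75.0, 67.0, 160.0, 65.0],
    "Agentname": ["VSFAAL", "VSFAAL", "VSFAAL", "VSFAALJS", "VSFAALJS", "VSFAALJS"],
    "ProductType": ["SPORTS", "APPAREL", "SPORTS", "SPORTS", "APPAREL", "APPAREL"],
})


def rates_by_merge(df, df_r, default=0.0):
    """Return a copy of df with an Agentname_rates column (hypothetical helper)."""
    rate_cols = ["Agentname", "ProductType", "StartDate", "EndDate", "NewRate"]
    # Tag each sale with its position so matches can be mapped back later.
    merged = df.assign(_row=range(len(df))).merge(
        df_r[rate_cols], on=["Agentname", "ProductType"], how="left")

    # Keep the candidate rates whose period contains the sale day.
    day = merged["Date"].dt.normalize()
    hit = merged[day.between(merged["StartDate"], merged["EndDate"])]

    # Assumption: on a shared boundary day the later-starting period wins.
    hit = hit.sort_values("StartDate").drop_duplicates("_row", keep="last")
    rates = hit.set_index("_row")["NewRate"]

    out = df.copy()
    # Sales without any matching period get the default rate.
    out["Agentname_rates"] = rates.reindex(range(len(df))).fillna(default).to_numpy()
    return out


print(rates_by_merge(df, df_r))
```

Because the merge pairs every sale with every rate row for the same agent and product, the intermediate frame can grow large when an agent has many rate periods.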

CodePudding user response：

You can write a separate function that looks up the rate, and put the conditions inside that function:

import numpy as np

def check_rates(Date, Agentname, ProductType):
    # rates for this agent/product whose period strictly contains the sale date
    val = df_r['NewRate'].loc[(df_r['ProductType'] == ProductType)
                              & (df_r['Agentname'] == Agentname)
                              & (df_r['StartDate'] < Date)
                              & (df_r['EndDate'] > Date)]
    try:
        return list(val)[0]
    except IndexError:
        return np.nan    # not found

df['Agentname_rates'] = df.apply(lambda x: check_rates(x['Date'], x['Agentname'], x['ProductType']), axis=1)
print(df)

Output:

                 Date  Sales Agentname ProductType Agentname_rates
0 2020-12-01 08:00:02  100.0    VSFAAL      SPORTS            10.0
1 2022-03-01 08:00:09   99.0    VSFAAL     APPAREL            35.0
2 2022-03-01 08:00:14   75.0    VSFAAL      SPORTS            15.0
3 2021-05-01 08:00:39   67.0  VSFAALJS      SPORTS            10.0
4 2020-05-01 08:00:51  160.0  VSFAALJS     APPAREL             NaN
5 2021-05-01 08:00:56   65.0  VSFAALJS     APPAREL            15.0
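
Since `df.apply(..., axis=1)` still scans the rate table once per sale, a vectorized alternative may be worth trying on larger data. The sketch below swaps in `pd.merge_asof`, which neither answer uses: it picks, per agent and product, the latest rate period that started on or before each sale, and then blanks the rate if that period had already ended. The helper name `rates_by_asof` and the temporary `_row` column are illustrative. It assumes `Date`, `StartDate` and `EndDate` are datetime columns of the same dtype, and that `EndDate` is inclusive for the whole day.

```python
import pandas as pd

# Sample data copied from the question.
df_r = pd.DataFrame({
    "Agentname": ["VSFAAL", "VSFAAL", "VSFAAL", "VSFAALJS", "VSFAALJS", "VSFAALJS"],
    "ProductType": ["SPORTS", "APPAREL", "SPORTS", "SPORTS", "APPAREL", "APPAREL"],
    "OldRate": [0.0, 0.0, 10.0, 0.0, 0.0, 15.0],
    "NewRate": [10.0, 35.0, 15.0, 10.0, 15.0, 5.0],
    "StartDate": pd.to_datetime(["2020-11-05", "2020-11-05", "2021-01-18",
                                 "2020-11-07", "2020-11-07", "2021-11-09"]),
    "EndDate": pd.to_datetime(["2021-01-18", "2022-05-03", "2022-05-03",
                               "2022-05-03", "2021-11-09", "2022-05-03"]),
})
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-12-01 08:00:02", "2022-03-01 08:00:09",
                            "2022-03-01 08:00:14", "2021-05-01 08:00:39",
                            "2020-05-01 08:00:51", "2021-05-01 08:00:56"]),
    "Sales": [100.0, 99.0, 75.0, 67.0, 160.0, 65.0],
    "Agentname": ["VSFAAL", "VSFAAL", "VSFAAL", "VSFAALJS", "VSFAALJS", "VSFAALJS"],
    "ProductType": ["SPORTS", "APPAREL", "SPORTS", "SPORTS", "APPAREL", "APPAREL"],
})


def rates_by_asof(df, df_r):
    """Return a copy of df with an Agentname_rates column (hypothetical helper)."""
    # merge_asof needs both frames sorted by their time key.
    left = df.assign(_row=range(len(df))).sort_values("Date")
    right = df_r[["Agentname", "ProductType", "StartDate", "EndDate", "NewRate"]]
    right = right.sort_values("StartDate")

    # For each sale, take the latest period that started on or before it.
    m = pd.merge_asof(left, right,
                      left_on="Date", right_on="StartDate",
                      by=["Agentname", "ProductType"],
                      direction="backward")

    # That period may already be over; treat EndDate as inclusive for the day.
    expired = m["Date"].dt.normalize() > m["EndDate"]
    m.loc[expired, "NewRate"] = float("nan")

    # Restore the original row order before attaching the column.
    m = m.sort_values("_row")
    out = df.copy()
    out["Agentname_rates"] = m["NewRate"].to_numpy()
    return out


print(rates_by_asof(df, df_r))
```

Sales with no applicable period come back as NaN, as in the answer above; chaining `.fillna(0)` on the new column would give the 0 shown in the question's expected output.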