compare a two date columns of a data frame with another two data frames of second data frame in pyth-CodePudding

I have two dataframes df1 and df2 df1 contains month and two date columns

Df1

Month   Month_Start   Month_End
Month1      2022-03-27    2022-04-30
Month2      2022-05-01    2022-05-28
Month3      2022-05-01    2022-06-25

another data frame df2

start_Month     end_Month      price
2022-03-27      2260-12-31      1
2022-03-27      2260-12-31      2
2022-03-27      2260-12-31      3

if Month_Start and Month_end of df1 is lies in between start_Month and end_Month of df2 assign price column value to Month column of df1

like following result

Month   price
Month1      1
Month2      1
Month3      1

I tried using for loops

for i in range(len(df2)):
    for j in range(len(df1)):
        if df2['start_Month'][i] <= df1['Month_Start'][j]<= df1['Month_End'][j] <= df2['end_Month'][i]:
            new.loc[len(new.index)] = [df1['month'][j], df2['price'][i]]

but taking lot of time for execution for 1000 rows.

ANY IDEAS?

CodePudding user response：

Is there a common column where you can combine these two dataframes? such as id. If there is, it would be much more accurate to apply the conditions after combining these two tables. You can try the code below based on current data and conditions (Dataframes that are not the same size may have a problem.).

import pandas as pd
import numpy as np
df1=pd.DataFrame(data={'Month':['Month1','Month2','Month3'],
                       'Month_Start':['2022-03-27','2022-05-01','2022-05-01'],
                       'Month_End':['2022-04-30','2022-05-28','2022-06-25']})

df2=pd.DataFrame(data={'start_Month':['2022-03-27','2022-03-27','2022-03-27'],
                       'end_Month':['2260-12-31','2260-12-31','2260-12-31'],
                 'price':[1,2,3]})

con=[(df1['Month_Start']>= df2['start_Month']) & (df1['Month_End']<= df2['end_Month'])]
cho=[df2['price']]
df1['price']=np.select(con,cho,default=np.nan)#

CodePudding user response：

Assuming your dataframes are these:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({ 'Month': ['Month1', 'Month2', 'Month3'], 
                     'Month_Start': ['2022-03-27', '2022-05-01', '2022-05-01'], 
                     'Month_End': ['2022-04-30', '2022-05-28', '2022-06-25'] })
df1['Month_Start'] = pd.to_datetime(df1['Month_Start'])
df1['Month_End'] = pd.to_datetime(df1['Month_End'])
print(df1)
    Month Month_Start  Month_End
0  Month1  2022-03-27 2022-04-30
1  Month2  2022-05-01 2022-05-28
2  Month3  2022-05-01 2022-06-25

df2 = pd.DataFrame({ 'start_Month': ['2022-03-01', '2022-05-01', '2022-06-01'], 
                     'end_Month': ['2022-04-30', '2022-05-30', '2022-06-30'],
                     'price': [1, 2, 3] })
df2['start_Month'] = pd.to_datetime(df2['start_Month'])
df2['end_Month'] = pd.to_datetime(df2['end_Month'])
print(df2)    #note validity periods do not overlap, so only 1 price is valid!
  start_Month  end_Month  price
0  2022-03-01 2022-04-30      1
1  2022-05-01 2022-05-30      2
2  2022-06-01 2022-06-30      3

I would define an external function to check the validity period, then return the corresponding price. Note that if there are 2 corresponding periods, the first one will be returned. If no corresponding period is found, a null value is returned.

def check_validity(row):
    try:
        return (df2['price'][(df2['start_Month']<=row['Month_Start']) & (row['Month_End']<=df2['end_Month'])].values[0])
    except:
        return np.nan
    
df1['price'] = df1.apply(lambda x: check_validity(x), axis=1)
print(df1)

Output:

    Month Month_Start  Month_End  price
0  Month1  2022-03-27 2022-04-30    1.0
1  Month2  2022-05-01 2022-05-28    2.0
2  Month3  2022-05-01 2022-06-25    NaN