I have two dataframes df1 and df2 df1 contains month and two date columns
Df1
Month Month_Start Month_End
Month1 2022-03-27 2022-04-30
Month2 2022-05-01 2022-05-28
Month3 2022-05-01 2022-06-25
another data frame df2
start_Month end_Month price
2022-03-27 2260-12-31 1
2022-03-27 2260-12-31 2
2022-03-27 2260-12-31 3
if Month_Start
and Month_end
of df1 is lies in between start_Month
and end_Month
of df2 assign price
column value to Month
column of df1
like following result
Month price
Month1 1
Month2 1
Month3 1
I tried using for loops
for i in range(len(df2)):
for j in range(len(df1)):
if df2['start_Month'][i] <= df1['Month_Start'][j]<= df1['Month_End'][j] <= df2['end_Month'][i]:
new.loc[len(new.index)] = [df1['month'][j], df2['price'][i]]
but taking lot of time for execution for 1000 rows.
ANY IDEAS?
CodePudding user response:
Is there a common column where you can combine these two dataframes? such as id. If there is, it would be much more accurate to apply the conditions after combining these two tables. You can try the code below based on current data and conditions (Dataframes that are not the same size may have a problem.).
import pandas as pd
import numpy as np
df1=pd.DataFrame(data={'Month':['Month1','Month2','Month3'],
'Month_Start':['2022-03-27','2022-05-01','2022-05-01'],
'Month_End':['2022-04-30','2022-05-28','2022-06-25']})
df2=pd.DataFrame(data={'start_Month':['2022-03-27','2022-03-27','2022-03-27'],
'end_Month':['2260-12-31','2260-12-31','2260-12-31'],
'price':[1,2,3]})
con=[(df1['Month_Start']>= df2['start_Month']) & (df1['Month_End']<= df2['end_Month'])]
cho=[df2['price']]
df1['price']=np.select(con,cho,default=np.nan)#
CodePudding user response:
Assuming your dataframes are these:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({ 'Month': ['Month1', 'Month2', 'Month3'],
'Month_Start': ['2022-03-27', '2022-05-01', '2022-05-01'],
'Month_End': ['2022-04-30', '2022-05-28', '2022-06-25'] })
df1['Month_Start'] = pd.to_datetime(df1['Month_Start'])
df1['Month_End'] = pd.to_datetime(df1['Month_End'])
print(df1)
Month Month_Start Month_End
0 Month1 2022-03-27 2022-04-30
1 Month2 2022-05-01 2022-05-28
2 Month3 2022-05-01 2022-06-25
df2 = pd.DataFrame({ 'start_Month': ['2022-03-01', '2022-05-01', '2022-06-01'],
'end_Month': ['2022-04-30', '2022-05-30', '2022-06-30'],
'price': [1, 2, 3] })
df2['start_Month'] = pd.to_datetime(df2['start_Month'])
df2['end_Month'] = pd.to_datetime(df2['end_Month'])
print(df2) #note validity periods do not overlap, so only 1 price is valid!
start_Month end_Month price
0 2022-03-01 2022-04-30 1
1 2022-05-01 2022-05-30 2
2 2022-06-01 2022-06-30 3
I would define an external function to check the validity period, then return the corresponding price. Note that if there are 2 corresponding periods, the first one will be returned. If no corresponding period is found, a null value is returned.
def check_validity(row):
try:
return (df2['price'][(df2['start_Month']<=row['Month_Start']) & (row['Month_End']<=df2['end_Month'])].values[0])
except:
return np.nan
df1['price'] = df1.apply(lambda x: check_validity(x), axis=1)
print(df1)
Output:
Month Month_Start Month_End price
0 Month1 2022-03-27 2022-04-30 1.0
1 Month2 2022-05-01 2022-05-28 2.0
2 Month3 2022-05-01 2022-06-25 NaN