Merging dataframes (with different date time --monthly vs daily) at the same time applying lagged va

Time:06-30

I have 2 dataframes I wish to merge:

df1 looks like this:

Date          Col1  Col 2  Col 3    Col 4   
    
2016-03     27.57   0.93    28.7    1.57    
2016-04     25.83   0.23    28.34   0.84    
2016-05     24.55   0.27    27.11   0.03    

df2 looks like this:

Date          ColA            

2016-03-21  7.640769230769231   
2016-03-22  7.739720279720279   
2016-03-23  7.577311827956988   
2016-03-24  7.745416666666666   

As you can see, df1 holds monthly data and df2 holds daily data. I want to merge them at daily frequency (following df2), but with df1 lagged (lag = -30), so that each day picks up the df1 row of the following month.

This is my desired output:

Output:

Date          ColA              Col1    Col 2   Col 3   Col 4

2016-03-21  7.640769230769231   25.83   0.23    28.34   0.84
2016-03-22  7.739720279720279   25.83   0.23    28.34   0.84
2016-03-23  7.577311827956988   25.83   0.23    28.34   0.84
2016-03-24  7.745416666666666   25.83   0.23    28.34   0.84
....
2016-04-01  xxxxxxxx            24.55   0.27    27.11   0.03

I tried this, but the frames were simply merged and the lag was not applied:

out = df2.merge(df1.shift(-30), on='Date')
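
A likely reason the lag has no effect: `DataFrame.shift` moves rows by position, not by calendar days, and it moves every column together. If `Date` is an ordinary column, each date stays attached to its own values after the shift, so the merge pairs the same rows as before. A minimal sketch, assuming `Date` is a regular column and using a cut-down copy of df1:

```python
import pandas as pd

# Cut-down df1 from the question (only Col1 kept for brevity).
df1 = pd.DataFrame({
    "Date": ["2016-03", "2016-04", "2016-05"],
    "Col1": [27.57, 25.83, 24.55],
})

# shift(-1) moves every column up one row, the Date column included,
# so each date is still paired with its own value.
shifted = df1.shift(-1)
print(shifted)

# shift(-30) on a three-row frame pushes every row out, leaving only missing values.
emptied = df1.shift(-30)
print(emptied)
```

Shifting the key itself by one month, rather than shifting rows, is what changes which monthly row a given day matches.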

CodePudding user response:

IIUC, you can use:

df1['Date'] = pd.to_datetime(df1['Date'])
df2['Date'] = pd.to_datetime(df2['Date'])

out = (df2.merge(df1.assign(Date=df1['Date'].sub(pd.DateOffset(months=1))
                                            .dt.to_period('M')),
                 left_on=df2['Date'].dt.to_period('M'), right_on='Date',
                 how='left', suffixes=(None, '_df1'))
      ) 

or:

out = (df2.merge(df1.assign(Date=df1['Date'].dt.to_period('M').sub(1)),
                 left_on=df2['Date'].dt.to_period('M'), right_on='Date',
                 how='left', suffixes=(None, '_df1'))
      )

output:

        Date      ColA Date_df1   Col1  Col 2  Col 3  Col 4
0 2016-03-21  7.640769  2016-03  25.83   0.23  28.34   0.84
1 2016-03-22  7.739720  2016-03  25.83   0.23  28.34   0.84
2 2016-03-23  7.577312  2016-03  25.83   0.23  28.34   0.84
3 2016-03-24  7.745417  2016-03  25.83   0.23  28.34   0.84
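
For anyone who wants to run this end to end, here is a self-contained sketch of the same idea with the sample values from the question. The helper column name `month` is an assumption introduced for illustration; it is dropped again after the merge so that only the original columns remain.

```python
import pandas as pd

# Sample frames rebuilt from the question.
df1 = pd.DataFrame({
    "Date": ["2016-03", "2016-04", "2016-05"],
    "Col1": [27.57, 25.83, 24.55],
    "Col 2": [0.93, 0.23, 0.27],
    "Col 3": [28.7, 28.34, 27.11],
    "Col 4": [1.57, 0.84, 0.03],
})
df2 = pd.DataFrame({
    "Date": pd.to_datetime(["2016-03-21", "2016-03-22", "2016-03-23", "2016-03-24"]),
    "ColA": [7.640769230769231, 7.739720279720279, 7.577311827956988, 7.745416666666666],
})

# Monthly key for df1, moved back one month so the 2016-04 row is keyed as 2016-03.
# "month" is a hypothetical helper column, not part of the original data.
monthly = (
    df1.assign(month=pd.to_datetime(df1["Date"]).dt.to_period("M").sub(1))
       .drop(columns="Date")
)

# Monthly key for the daily frame, then a plain left merge on that key.
out = (
    df2.assign(month=df2["Date"].dt.to_period("M"))
       .merge(monthly, on="month", how="left")
       .drop(columns="month")
)
print(out)
```

With a left merge, daily rows whose month has no shifted counterpart in df1 are kept and filled with NaN rather than dropped.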

CodePudding user response:

If lag 30 means the previous month, you can convert df1's dates to monthly periods, subtract 1, and then merge with merge_asof:

df1['Date'] = pd.to_datetime(df1['Date']).dt.to_period('M').sub(1).dt.to_timestamp()
df2['Date'] = pd.to_datetime(df2['Date'])


df = pd.merge_asof(df2, df1, on='Date')
print(df)
        Date      ColA   Col1  Col 2  Col 3  Col 4
0 2016-03-21  7.640769  25.83   0.23  28.34   0.84
1 2016-03-22  7.739720  25.83   0.23  28.34   0.84
2 2016-03-23  7.577312  25.83   0.23  28.34   0.84
3 2016-03-24  7.745417  25.83   0.23  28.34   0.84
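
A self-contained sketch of the `merge_asof` route, extended with the 2016-04-01 row from the desired output to show the month rollover. Two assumptions are made here: the ColA value for 2016-04-01 is not given in the question, so it is left as NaN, and the one-month shift uses `pd.DateOffset(months=1)` instead of periods so that both keys are plain datetimes of the same type.

```python
import pandas as pd

# Cut-down df1 from the question (only Col1 kept for brevity).
df1 = pd.DataFrame({
    "Date": ["2016-03", "2016-04", "2016-05"],
    "Col1": [27.57, 25.83, 24.55],
})
df2 = pd.DataFrame({
    "Date": pd.to_datetime(["2016-03-21", "2016-03-24", "2016-04-01"]),
    # ColA for 2016-04-01 is not given in the question, so it is left as NaN.
    "ColA": [7.640769230769231, 7.745416666666666, float("nan")],
})

# Move each monthly stamp back one month: 2016-04 becomes 2016-03-01, and so on.
df1["Date"] = pd.to_datetime(df1["Date"]) - pd.DateOffset(months=1)

# merge_asof expects both frames sorted on the key. Each daily row takes the
# most recent monthly row whose Date is on or before the daily Date.
out = pd.merge_asof(df2.sort_values("Date"), df1.sort_values("Date"), on="Date")
print(out)
```

Because the match is "latest key on or before", daily dates that fall before the earliest shifted month get NaN, and a gap in the monthly data is silently filled from the last available month, which may or may not be what you want.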