I am merging columns from one dataframe into a larger one on a date column, with this code:
df_final = pd.merge(df_final, pmms_df, how='left', on='PredictionDate')
pmms_df looks like this:
PredictionDate U.S. 30 yr FRM U.S. 15 yr FRM
0 2014-12-31 3.87 3.15
1 2015-01-01 3.87 3.15
2 2015-01-02 3.87 3.15
3 2015-01-03 3.87 3.15
4 2015-01-04 3.87 3.15
... ... ... ...
2769 2022-07-31 5.30 4.58
2770 2022-08-01 4.99 4.26
2771 2022-08-02 4.99 4.26
2772 2022-08-03 4.99 4.26
2773 2022-08-04 4.99 4.26
and df_final is a large dataframe with about 20,000 rows and 61 columns, so I am only including the relevant output columns here, post-merge:
PredictionDate U.S. 30 yr FRM U.S. 15 yr FRM
0 2022-03-09 3.85 3.09
1 2022-04-11 5.00 4.17
2 2022-05-10 5.30 4.48
3 2022-06-09 5.23 4.38
4 2021-04-09 3.13 2.42
... ... ... ...
20528 2022-01-11 3.45 2.62
20529 2022-02-09 3.69 2.93
20530 2022-03-09 3.85 3.09
20531 2022-04-11 5.00 4.17
20532 2022-05-10 5.30 4.48
df_final has only one date per month, so the merge finds that date's row in pmms_df and brings the U.S. 30 yr and 15 yr FRM values for that day into new columns in df_final. However, I would also like to add two more columns to df_final, one each for the 30 yr and 15 yr FRM, holding the pmms_df values from 30 days earlier. The desired output would look something like this:
PredictionDate U.S. 30 yr FRM U.S. 15 yr FRM 30yrLag 15yrLag
0 2022-03-09 3.85 3.09 3.72 3.12
1 2022-04-11 5.00 4.17 5.05 4.15
2 2022-05-10 5.30 4.48 5.32 4.58
3 2022-06-09 5.23 4.38 . .
4 2021-04-09 3.13 2.42 . .
... ... ... ...
20528 2022-01-11 3.45 2.62 . .
20529 2022-02-09 3.69 2.93 . .
20530 2022-03-09 3.85 3.09 . .
20531 2022-04-11 5.00 4.17 . .
20532 2022-05-10 5.30 4.48 . .
So the idea is that those last two columns would contain the 30 yr and 15 yr data from pmms_df for 30 days prior to the date it was merged on. The values I included here for 30yrLag and 15yrLag are supposed to be the values for those columns from 30 days before the date in PredictionDate in the final dataframe.
CodePudding user response:
Solution here.
I needed to do the lag first and then merge, instead of trying to do both at the same time.
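A minimal sketch of the "lag first, then merge" order, in case it helps. The small pmms_df and df_final built here are made-up stand-ins (the rates are not real data), and the intermediate names lagged and pmms_with_lag are only illustrative. Instead of relying on row position, it shifts the dates forward by 30 days and joins on them, so it should still work if pmms_df has gaps:

```python
import pandas as pd

# Stand-in for pmms_df: one row per calendar day, with made-up rates.
dates = pd.date_range("2022-01-01", periods=90, freq="D")
pmms_df = pd.DataFrame({
    "PredictionDate": dates,
    "U.S. 30 yr FRM": [3.0 + 0.01 * i for i in range(90)],
    "U.S. 15 yr FRM": [2.0 + 0.01 * i for i in range(90)],
})

# Stand-in for df_final: only a few dates, roughly one per month.
df_final = pd.DataFrame({
    "PredictionDate": pd.to_datetime(["2022-02-09", "2022-03-09", "2022-01-11"]),
})

# Step 1: build the lag on the daily table.
# Moving each date forward by 30 days means the row now labelled D
# carries the rates that were observed on D - 30 days.
lagged = pmms_df.rename(columns={
    "U.S. 30 yr FRM": "30yrLag",
    "U.S. 15 yr FRM": "15yrLag",
})
lagged["PredictionDate"] = lagged["PredictionDate"] + pd.Timedelta(days=30)

# Attach the lag columns to the daily table (NaN where no earlier row exists).
pmms_with_lag = pmms_df.merge(lagged, how="left", on="PredictionDate")

# Step 2: the same left merge as in the question, now bringing the lag columns along.
df_final = df_final.merge(pmms_with_lag, how="left", on="PredictionDate")
print(df_final)
```

Dates in df_final that have no row 30 days earlier in pmms_df (here 2022-01-11) end up with NaN in 30yrLag and 15yrLag. If pmms_df really has exactly one row per day with no gaps, pmms_df["U.S. 30 yr FRM"].shift(30) on the date-sorted frame should give the same lag with less code.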