Date dependent calculation from 2 dataframes - average 6-month return-CodePudding

I am working with the following dataframe, I have data for multiple companies, each row associated with a specific datadate, so I have many rows related to many companies - with ipo date from 2009 to 2022.

index ID price  daily_return  datadate   daily_market_return    mean_daily_market_return     ipodate
0     1   27.50   0.008         01-09-2010       0.0023                   0.03345           01-12-2009
1     2   33.75   0.0745        05-02-2017       0.00458                  0.0895            06-12-2012 
2     3   29,20   0.00006       08-06-2020       0.0582                   0.0045            01-05-2013
3     4   20.54   0.00486       09-06-2018       0.0009                   0.0006            27-11-2013
4     1   21.50   0.009         02-09-2021       0.0846                   0.04345           04-05-2009
5     4   22.75   0.00539       06-12-2019       0.0003                   0.0006            21-09-2012
...
26074 rows

I also have a dataframe containing the Market yield on US Treasury securities at 10-year constant maturity - measured daily. Each row represents the return associated with a specific day, each day from 2009 to 2022.

     date          dgs10
1   2009-01-02      2.46
2   2009-01-05      2.49
3   2009-01-06      2.51
4   2009-01-07      2.52
5   2009-01-08      2.47
6   2009-01-09      2.43
7   2009-01-12      2.34
8   2009-01-13      2.33
...
      date         dgs10
3570    2022-09-08  3.29
3571    2022-09-09  3.33
3572    2022-09-12  3.37
3573    2022-09-13  3.42
3574    2022-09-14  3.41

My goal is to calculate, for each ipodate (from dataframe 1), the average of the previous 6-month return of the the Market yield on US Treasury securities at 10-year constant maturity (from dataframe 2). The result should either be in a new dataframe or in an additionnal column in dataframe 1. Both dataframes are not the same length. I tried using rolling(), but it doesn't seem to be working. Anyone knows how to fix this?

CodePudding user response：

# Make sure that all date columns are of type Timestamp. They are a lot easier
# to work with
df1["ipodate"] = pd.to_datetime(df1["ipodate"], dayfirst=True)
df2["date"] = pd.to_datetime(df2["date"])

# Calculate the mean market yield of the previous 6 months. Six month is not a
# fixed length of time so I replaced it with 180 days.
tmp = df2.rolling("180D", on="date").mean()

# The values of the first 180 days are invalid, because we have insufficient
# data to calculate the rolling mean. You may consider extending df2 further
# back to 2008. (You may come up with other rules for this period.)
is_invalid = (tmp["date"] - tmp["date"].min()) / pd.Timedelta(1, "D") < 180
tmp.loc[is_invalid, "dgs10"] = np.nan

# Result
df1.merge(tmp, left_on="ipodate", right_on="date", how="left")