Pandas: Annualized Returns-CodePudding

I have a dataframe with quarterly returns of financial entities and I want to calculate 1, 3, 5 10-year annualized returns. The formula for calculating annualized returns is:

R = product(1 r)^(4/N) -1

r are the quarterly return of an entity, N is the number of quarters

for example 3-year annualized return is:

R_3yr = product(1 r)^(4/12) -1 = ((1 r1)*(1 r2)*(1 r3)*...*(1 r12))^(1/3) -1

r1, r2, r3 ... r12 are the quarterly returns of the previous 11 quarters plus current quarter.

I created a code which provides the right results but it is very slow because it is looping through each row of the dataframe. The code below is an extract of my code for 1-year and 3-year annualized retruns (I applied the same concept for 5, 7, 10, 15 and 20-year returns). r_qrt is the field with the quarterly returns

import pandas as pd
import numpy as np

#create dataframe where I append the results
df_final = pd.DataFrame()
columns=['Date','Entity','r_qrt','R_1yr','R_3yr']

#loop thorugh the dataframe
for row in df.itertuples():
    
    R_1yr=np.nan #1-year annualized return
    R_3yr=np.nan #3-year annualized return
    
    #Calculate 1 YR Annualized Return
    date_previous_period=row.Date  pd.DateOffset(years=-1)
    temp_table=df.loc[(df['Date']>date_previous_period) & 
                              (df['Date']<=row.Date) & 
                              (df['Entity']==row.Entity)]
    if temp_table['r_qrt'].count()>=4:
        b=(1 (temp_table.r_qrt))[-4:].product()
        R_1yr=(b-1)
    
    #Calculate 3 YR Annualized Return
    date_previous_period=row.Date  pd.DateOffset(years=-3)
    temp_table=df.loc[(df['Date']>date_previous_period) & 
                              (df['Date']<=row.Date) &
                              (df['Entity']==row.Entity)]
        
    if temp_table['r_qrt'].count()>=12:
        b=(1 (temp_table.r_qrt))[-12:].product()
        R_3yr=((b**(1/3))-1)
        
    d=[row.Date,row.Entity,row.r_qrt,R_1yr,R_3yr]
    df_final = df_final.append(pd.Series(d, index=columns), ignore_index=True)

df_final looks as below (only reporting 1-year return results for space limitations)

Date	Entity	r_qrt	R_1yr
2015-03-31	A	0.035719	NaN
2015-06-30	A	0.031417	NaN
2015-09-30	A	0.030872	NaN
2015-12-31	A	0.029147	0.133335
2016-03-31	A	0.022100	0.118432
2016-06-30	A	0.020329	0.106408
2016-09-30	A	0.017676	0.092245
2016-12-31	A	0.017304	0.079676
2015-03-31	B	0.034705	NaN
2015-06-30	B	0.037772	NaN
2015-09-30	B	0.036726	NaN
2015-12-31	B	0.031889	0.148724
2016-03-31	B	0.029567	0.143020
2016-06-30	B	0.028958	0.133312
2016-09-30	B	0.028890	0.124746
2016-12-31	B	0.030389	0.123110

I am sure there is a more efficient way to run the same calculations but I have not been able to find it. My code is not efficient and takes more than 2 hours for large dataframes with long time series and many entities.

Thanks

CodePudding user response：

see (https://www.investopedia.com/terms/a/annualized-total-return.asp) for the definition of annualized return

data=[ 3, 7, 5, 12, 1]
def annualize_rate(data):
    retVal=0
    accum=1
    for item in data:
        print(1 (item/100))
        accum*=1 (item/100)
        retVal=pow(accum,1/len(data))-1
    return retVal
    
print(annualize_rate(data))

output

0.05533402290765199

2015 (a and b)

data=[0.133335,0.148724]

print(annualize_rate(data))

output:

0.001410292043902306

2016 (a&b)

data=[0.079676,0.123110]
print(annualize_rate(data))

output

0.0010139064424810051

you can store each year annualized value then use pct_chg to get a 3 year result

data=[0.05,0.06,0.07]
df=pd.DataFrame({'Annualized':data})
df['Percent_Change']=df['Annualized'].pct_change().fillna(0)

amount=1

returns_plus_one=df['Percent_Change'] 1
cumulative_return = returns_plus_one.cumprod()

df['Cumulative']=cumulative_return.mul(amount)

df['2item']=df['Cumulative'].rolling(window=2).mean().plot()

print(df)

CodePudding user response：

For future reference of other users, this is the new version of the code that I implemented following Golden Lion suggestion:

def compoundfunct(arr):
    return np.product(1 arr)**(4/len(arr)) - 1

# 1-yr annulized return
df["R_1Yr"]=df.groupby('Entity')['r_qrt'].rolling(4).apply(compoundfunct).groupby('Entity').shift(0).reset_index().set_index('level_1').drop('Entity',axis=1)

# 3-yr annulized return
df["R_3Yr"]=df.groupby('Entity')['r_qrt'].rolling(12).apply(compoundfunct).groupby('Entity').shift(0).reset_index().set_index('level_1').drop('Entity',axis=1)

The performance of the previous code was 36.4 sec for a dataframe of 5,640 rows. The new code is more than 10x faster, it took 2.8 sec

One of the issues with this new code is that one has to make sure that rows are sorted by group (Entity in my case) and date before running the calculations, otherwise results could be wrong.

Thanks, S.