Suppose that I have a Pandas dataframe as follows:
+------------+--------+
| Date       | Price  |
+------------+--------+
| 2021-07-30 | 438.51 |
| 2021-08-02 | 437.59 |
| 2021-08-03 | 441.15 |
| 2021-08-04 | 438.98 |
+------------+--------+
The above data frame can be generated using the code below:
import numpy as np
import pandas as pd

data = {'Date': ['2021-07-30', '2021-08-02', '2021-08-03', '2021-08-04'],
        'Price': [438.51, 437.59, 441.15, 438.98]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])

normalisation_days = 365.25
compounding_days = 365.25
For the given time series I want to compute rate_of_return, which is time dependent; the problem is to identify the time periods between which the best or worst values of rate_of_return are achieved. One could simply calculate rate_of_return for all possible combinations, create a data frame containing period_start, period_end and rate_of_return, sort it in descending (best) or ascending (worst) order, and then exclude any periods that overlap.
rate_of_return = ((period_end_price / period_start_price) ^ (compounding_days / days_in_between) - 1) * (normalisation_days / compounding_days)
On the above data frame I've computed rate_of_return using the code below:
df['rate_of_return_l1'] = (((df.Price / df.Price[0]) **
                            (compounding_days /
                             (df.Date - df.Date[0]).dt.days) - 1) *
                           (normalisation_days / compounding_days))
df.loc[:0, 'rate_of_return_l1'] = np.nan

df['rate_of_return_l2'] = (((df.Price / df.Price[1]) **
                            (compounding_days /
                             (df.Date - df.Date[1]).dt.days) - 1) *
                           (normalisation_days / compounding_days))
df.loc[:1, 'rate_of_return_l2'] = np.nan

df['rate_of_return_l3'] = (((df.Price / df.Price[2]) **
                            (compounding_days /
                             (df.Date - df.Date[2]).dt.days) - 1) *
                           (normalisation_days / compounding_days))
df.loc[:2, 'rate_of_return_l3'] = np.nan
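The three near-identical blocks above differ only in the reference row, so they could also be generated in a loop. A minimal sketch (same data and column names as above):

```python
import numpy as np
import pandas as pd

data = {'Date': ['2021-07-30', '2021-08-02', '2021-08-03', '2021-08-04'],
        'Price': [438.51, 437.59, 441.15, 438.98]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])

normalisation_days = 365.25
compounding_days = 365.25

# One column per starting row: rate_of_return_l1 uses row 0 as the start, etc.
for i in range(3):
    days = (df['Date'] - df['Date'][i]).dt.days
    col = ((df['Price'] / df['Price'][i]) ** (compounding_days / days) - 1) * \
          (normalisation_days / compounding_days)
    col.iloc[:i + 1] = np.nan  # no return for the start row itself or earlier rows
    df[f'rate_of_return_l{i + 1}'] = col
```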
Based on the results, the best/worst case periods are as below:
+--------------+------------+----------------+
| Period Start | Period End | Rate of Return |
+--------------+------------+----------------+
| 2021-08-02   | 2021-08-03 | 18.28751739    |
| 2021-08-02   | 2021-08-04 | 0.784586925    |
| 2021-07-30   | 2021-08-03 | 0.729942907    |
| 2021-07-30   | 2021-08-04 | 0.081397181    |
| 2021-07-30   | 2021-08-02 | -0.225626914   |
| 2021-08-03   | 2021-08-04 | -0.834880227   |
+--------------+------------+----------------+
Expected Output
If I want to see the best of rate_of_return, the resulting dataframe would be:
+--------------+------------+----------------+
| Period Start | Period End | Rate of Return |
+--------------+------------+----------------+
| 2021-08-02   | 2021-08-03 | 18.28751739    |
+--------------+------------+----------------+
If I want to see the worst of rate_of_return, the resulting dataframe would be:
+--------------+------------+----------------+
| Period Start | Period End | Rate of Return |
+--------------+------------+----------------+
| 2021-08-03   | 2021-08-04 | -0.834880227   |
| 2021-07-30   | 2021-08-02 | -0.225626914   |
+--------------+------------+----------------+
- What would be the optimal approach to test out all scenarios to compute rate_of_return?
- How can I achieve the expected output such that periods don't overlap (as seen in the expected output)?
- The best/worst data frame is not sign dependent: the best data frame can contain negative rate_of_return values, provided there is no time-period overlap.
- What would the approach be if the formula changes to (period_end_price / period_start_price) - 1 (not time dependent)?
CodePudding user response:
If I understood correctly, there are two parts to your question -
Part 1: Generating combinations
For generating combinations, you can use itertools, compute the return for each combination, and sort the results.
from itertools import combinations

rors = []
for combination in combinations(zip(df['Date'], df['Price']), 2):
    (start_date, start_price), (end_date, end_price) = combination
    # normalisation_days == compounding_days here, so the
    # (normalisation_days / compounding_days) factor is 1 and is omitted
    ror = (end_price / start_price) ** (compounding_days / (end_date - start_date).days) - 1
    rors.append((start_date, end_date, ror))

sorted_rors = sorted(rors, key=lambda x: x[2], reverse=True)
print(sorted_rors[0])
#(Timestamp('2021-08-02 00:00:00'),
# Timestamp('2021-08-03 00:00:00'),
# 18.28751738702541)
print(sorted_rors[-1])
#(Timestamp('2021-08-03 00:00:00'),
# Timestamp('2021-08-04 00:00:00'),
# -0.8348802270491325)
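If a data frame like the one in the question is preferred over tuples, the sorted list can be wrapped directly. A self-contained sketch, repeating the setup and combination loop from above:

```python
from itertools import combinations

import pandas as pd

data = {'Date': ['2021-07-30', '2021-08-02', '2021-08-03', '2021-08-04'],
        'Price': [438.51, 437.59, 441.15, 438.98]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])

compounding_days = 365.25

# All (start, end) pairs with start before end, as in the answer above.
rors = []
for (start_date, start_price), (end_date, end_price) in combinations(
        zip(df['Date'], df['Price']), 2):
    ror = (end_price / start_price) ** (
        compounding_days / (end_date - start_date).days) - 1
    rors.append((start_date, end_date, ror))

# Wrap the sorted tuples in a data frame matching the question's layout.
result = pd.DataFrame(
    sorted(rors, key=lambda t: t[2], reverse=True),
    columns=['Period Start', 'Period End', 'Rate of Return'])
```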
Part 2: Non Overlapping Time Periods
This part I am not very clear about, but I am guessing you are trying to find the top n returns with time periods that do not overlap. If the number of time periods you are looking at is large, you can think of using a generator function -
def next_non_overlapping(iterable):
    saved = []  # all intervals yielded so far
    for start, end, ror in iterable:
        # yield an interval only if it overlaps none of the previously
        # yielded intervals, not just the most recently yielded one
        if all(start >= s_end or end <= s_start for s_start, s_end in saved):
            saved.append((start, end))
            yield (start, end, ror)
    print("No more items")
nno = next_non_overlapping(sorted_rors)
print(next(nno))
#(Timestamp('2021-08-02 00:00:00'),
# Timestamp('2021-08-03 00:00:00'),
# 18.28751738702541)
print(next(nno))
#(Timestamp('2021-07-30 00:00:00'),
# Timestamp('2021-08-02 00:00:00'),
# -0.22562691374181088)
print(next(nno))
#(Timestamp('2021-08-03 00:00:00'),
# Timestamp('2021-08-04 00:00:00'),
# -0.8348802270491325)
print(next(nno))
# No more items
For getting the n lowest returns, you can simply pass in the reversed list to the function - i.e.
nnor = next_non_overlapping(reversed(sorted_rors))
CodePudding user response:
Define your function; you can pass the data frame and the start and end dates directly:
import numpy as np
import pandas as pd
data = {'Date': ['2021-07-30', '2021-08-02', '2021-08-03', '2021-08-04'],
'Price': [438.51, 437.59, 441.15, 438.98]
}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
normalisation_days = 365.25
compounding_days = 365.25
def rate_ret(df, start_date, end_date):
start = df[df.Date==start_date].iloc[0]
end = df[df.Date==end_date].iloc[0]
period_start_price = start.Price
period_end_price = end.Price
days_in_between = (end.Date - start.Date).days
return ((period_end_price/period_start_price)**(compounding_days/days_in_between)-1) * (normalisation_days/compounding_days)
# Iterate over all possible date intervals, collecting the results in a list;
# the inner loop only includes dates later than the starting date:
array = []
for start_date in df.Date:
for end_date in df.Date[df.Date>start_date]:
array.append([rate_ret(df, start_date, end_date), start_date, end_date])
print(array)
# To extract the best or worst periods with no overlapping, save the best row,
# then iterate over the rest, keeping a row only if it does not collide with
# any of the previously stored intervals:
def extract_non_overlapping(df):
    saved_rows = [df.iloc[0]]
    for _, row in df.iloc[1:].iterrows():
        if all(row['Period End'] < saved['Period Start'] or
               row['Period Start'] > saved['Period End']
               for saved in saved_rows):
            saved_rows.append(row)
    return pd.DataFrame(saved_rows, columns=['Rate of Return', 'Period Start', 'Period End'])

df_higher = pd.DataFrame(array, columns=['Rate of Return', 'Period Start', 'Period End']).reset_index(drop=True).sort_values('Rate of Return', ascending=False)
df_lower = pd.DataFrame(array, columns=['Rate of Return', 'Period Start', 'Period End']).reset_index(drop=True).sort_values('Rate of Return')

extract_non_overlapping(df_higher)
extract_non_overlapping(df_lower)
The result for the best (df_higher):
+--------------+------------+----------------+
| Period Start | Period End | Rate of Return |
+--------------+------------+----------------+
| 2021-08-02   | 2021-08-03 | 18.28751739    |
+--------------+------------+----------------+
And for the worst (df_lower):
+--------------+------------+----------------+
| Period Start | Period End | Rate of Return |
+--------------+------------+----------------+
| 2021-08-03   | 2021-08-04 | -0.834880227   |
| 2021-07-30   | 2021-08-02 | -0.225626914   |
+--------------+------------+----------------+
If the formula doesn't depend on time, just change the formula in the definition of rate_ret; nothing else changes.
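For instance, the time-independent variant from the question could look like this (a sketch reusing the rate_ret signature from above, with the setup repeated so it runs on its own):

```python
import pandas as pd

data = {'Date': ['2021-07-30', '2021-08-02', '2021-08-03', '2021-08-04'],
        'Price': [438.51, 437.59, 441.15, 438.98]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])

# Time-independent variant: simple holding-period return,
# (period_end_price / period_start_price) - 1.
def rate_ret(df, start_date, end_date):
    start = df[df.Date == start_date].iloc[0]
    end = df[df.Date == end_date].iloc[0]
    return end.Price / start.Price - 1
```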
PS: There are some optimizations you could make, but overall the code works.