Suppose that I have a Pandas dataframe as follows:
+------------+--------+
| Date       | Price  |
+------------+--------+
| 2021-07-30 | 438.51 |
| 2021-08-02 | 437.59 |
| 2021-08-03 | 441.15 |
| 2021-08-04 | 438.98 |
+------------+--------+
The above data frame can be generated using the code below:
import numpy as np
import pandas as pd

data = {'Date': ['2021-07-30', '2021-08-02', '2021-08-03', '2021-08-04'],
        'Price': [438.51, 437.59, 441.15, 438.98]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])

normalisation_days = 365.25
compounding_days = 365.25
For the given time series I want to compute rate_of_return, which is time dependent; the problem is to identify the time periods between which the best or worst values of rate_of_return are achieved. One could simply calculate rate_of_return for all possible combinations, create a data frame containing period_start, period_end and rate_of_return, sort it in descending (best) or ascending (worst) order, and then exclude any periods that overlap.
rate_of_return = ((period_end_price / period_start_price) ^ (compounding_days / days_in_between) - 1) * (normalisation_days / compounding_days)
On the above data frame I've computed rate_of_return using the code below:
df['rate_of_return_l1'] = (((df.Price / df.Price[0]) **
                            (compounding_days /
                             (df.Date - df.Date[0]).dt.days) - 1) *
                           (normalisation_days / compounding_days))
df.loc[:0, 'rate_of_return_l1'] = np.nan

df['rate_of_return_l2'] = (((df.Price / df.Price[1]) **
                            (compounding_days /
                             (df.Date - df.Date[1]).dt.days) - 1) *
                           (normalisation_days / compounding_days))
df.loc[:1, 'rate_of_return_l2'] = np.nan

df['rate_of_return_l3'] = (((df.Price / df.Price[2]) **
                            (compounding_days /
                             (df.Date - df.Date[2]).dt.days) - 1) *
                           (normalisation_days / compounding_days))
df.loc[:2, 'rate_of_return_l3'] = np.nan
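The three near-identical blocks above differ only in the reference row, so they could also be generated in a loop. A minimal sketch (same data and column names as above):

```python
import numpy as np
import pandas as pd

data = {'Date': ['2021-07-30', '2021-08-02', '2021-08-03', '2021-08-04'],
        'Price': [438.51, 437.59, 441.15, 438.98]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])

normalisation_days = 365.25
compounding_days = 365.25

# One column per starting row: rate_of_return_l1 uses row 0 as the start, etc.
for i in range(3):
    days = (df['Date'] - df['Date'][i]).dt.days
    col = ((df['Price'] / df['Price'][i]) ** (compounding_days / days) - 1) * \
          (normalisation_days / compounding_days)
    col.iloc[:i + 1] = np.nan  # no return for the start row itself or earlier rows
    df[f'rate_of_return_l{i + 1}'] = col
```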
Based on the results, the best/worst case periods are as below:
+--------------+------------+----------------+
| Period Start | Period End | Rate of Return |
+--------------+------------+----------------+
| 2021-08-02   | 2021-08-03 | 18.28751739    |
| 2021-08-02   | 2021-08-04 | 0.784586925    |
| 2021-07-30   | 2021-08-03 | 0.729942907    |
| 2021-07-30   | 2021-08-04 | 0.081397181    |
| 2021-07-30   | 2021-08-02 | -0.225626914   |
| 2021-08-03   | 2021-08-04 | -0.834880227   |
+--------------+------------+----------------+
Expected Output
If I want to see the best of rate_of_return, the resulting dataframe would be:
+--------------+------------+----------------+
| Period Start | Period End | Rate of Return |
+--------------+------------+----------------+
| 2021-08-02   | 2021-08-03 | 18.28751739    |
+--------------+------------+----------------+
If I want to see the worst of rate_of_return, the resulting dataframe would be:
+--------------+------------+----------------+
| Period Start | Period End | Rate of Return |
+--------------+------------+----------------+
| 2021-08-03   | 2021-08-04 | -0.834880227   |
| 2021-07-30   | 2021-08-02 | -0.225626914   |
+--------------+------------+----------------+
- What would be the optimal approach to test out all scenarios to compute rate_of_return?
- How can I achieve the expected output such that periods don't overlap (as seen in the expected output)?
- The best/worst data frame is not sign dependent: the best data frame can contain negative rate_of_return values, provided there is no time-period overlap.
- What would the approach be if the formula changes to (period_end_price / period_start_price) - 1 (not time dependent)?
CodePudding user response:
If I understood correctly, there are two parts to your question -
Part 1: Generating combinations
For generating combinations, you can use itertools, compute the return for each combination, and sort the results.
from itertools import combinations

rors = []
for combination in combinations(zip(df['Date'], df['Price']), 2):
    (start_date, start_price), (end_date, end_price) = combination
    # normalisation_days == compounding_days here, so the
    # (normalisation_days / compounding_days) factor is 1 and is omitted
    ror = (end_price / start_price) ** (compounding_days / (end_date - start_date).days) - 1
    rors.append((start_date, end_date, ror))

sorted_rors = sorted(rors, key=lambda x: x[2], reverse=True)
print(sorted_rors[0])
#(Timestamp('2021-08-02 00:00:00'),
# Timestamp('2021-08-03 00:00:00'),
# 18.28751738702541)
print(sorted_rors[-1])
#(Timestamp('2021-08-03 00:00:00'),
# Timestamp('2021-08-04 00:00:00'),
# -0.8348802270491325)
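If a data frame like the one in the question is preferred over tuples, the sorted list can be wrapped directly. A self-contained sketch, repeating the setup and combination loop from above:

```python
from itertools import combinations

import pandas as pd

data = {'Date': ['2021-07-30', '2021-08-02', '2021-08-03', '2021-08-04'],
        'Price': [438.51, 437.59, 441.15, 438.98]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])

compounding_days = 365.25

# All (start, end) pairs with start before end, as in the answer above.
rors = []
for (start_date, start_price), (end_date, end_price) in combinations(
        zip(df['Date'], df['Price']), 2):
    ror = (end_price / start_price) ** (
        compounding_days / (end_date - start_date).days) - 1
    rors.append((start_date, end_date, ror))

# Wrap the sorted tuples in a data frame matching the question's layout.
result = pd.DataFrame(
    sorted(rors, key=lambda t: t[2], reverse=True),
    columns=['Period Start', 'Period End', 'Rate of Return'])
```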
Part 2: Non Overlapping Time Periods
This part I am not very clear about, but I am guessing you are trying to find the top n returns with time periods that do not overlap. If the number of time periods you are looking at is large, you can think of using a generator function -
def next_non_overlapping(iterable):
    saved = []  # all intervals yielded so far
    for start, end, ror in iterable:
        # yield an interval only if it overlaps none of the previously
        # yielded intervals, not just the most recently yielded one
        if all(start >= s_end or end <= s_start for s_start, s_end in saved):
            saved.append((start, end))
            yield (start, end, ror)
    print("No more items")
nno = next_non_overlapping(sorted_rors)
print(next(nno))
#(Timestamp('2021-08-02 00:00:00'),
# Timestamp('2021-08-03 00:00:00'),
# 18.28751738702541)
print(next(nno))
#(Timestamp('2021-07-30 00:00:00'),
# Timestamp('2021-08-02 00:00:00'),
# -0.22562691374181088)
print(next(nno))
#(Timestamp('2021-08-03 00:00:00'),
# Timestamp('2021-08-04 00:00:00'),
# -0.8348802270491325)
print(next(nno))
# No more items
For getting the n lowest returns, you can simply pass in the reversed list to the function - i.e.
nnor = next_non_overlapping(reversed(sorted_rors))
CodePudding user response:
Define your function; you can pass the data frame and the start and end dates directly:
import numpy as np
import pandas as pd
data = {'Date': ['2021-07-30', '2021-08-02', '2021-08-03', '2021-08-04'],
'Price': [438.51, 437.59, 441.15, 438.98]
}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
normalisation_days = 365.25
compounding_days = 365.25
def rate_ret(df, start_date, end_date):
start = df[df.Date==start_date].iloc[0]
end = df[df.Date==end_date].iloc[0]
period_start_price = start.Price
period_end_price = end.Price
days_in_between = (end.Date - start.Date).days
return ((period_end_price/period_start_price)**(compounding_days/days_in_between)-1) * (normalisation_days/compounding_days)
# Iterate over all possible date intervals, collecting the results in a list;
# the inner loop only includes dates later than the starting date:
array = []
for start_date in df.Date:
for end_date in df.Date[df.Date>start_date]:
array.append([rate_ret(df, start_date, end_date), start_date, end_date])
print(array)
# To extract the best or worst periods with no overlapping, save the best row,
# then iterate over the rest, keeping a row only if it does not collide with
# any of the previously stored intervals:
def extract_non_overlapping(df):
    saved_rows = [df.iloc[0]]
    for _, row in df.iloc[1:].iterrows():
        if all(row['Period End'] < saved['Period Start'] or
               row['Period Start'] > saved['Period End']
               for saved in saved_rows):
            saved_rows.append(row)
    return pd.DataFrame(saved_rows, columns=['Rate of Return', 'Period Start', 'Period End'])

df_higher = pd.DataFrame(array, columns=['Rate of Return', 'Period Start', 'Period End']).reset_index(drop=True).sort_values('Rate of Return', ascending=False)
df_lower = pd.DataFrame(array, columns=['Rate of Return', 'Period Start', 'Period End']).reset_index(drop=True).sort_values('Rate of Return')

extract_non_overlapping(df_higher)
extract_non_overlapping(df_lower)
The result for the best (df_higher):
+--------------+------------+----------------+
| Period Start | Period End | Rate of Return |
+--------------+------------+----------------+
| 2021-08-02   | 2021-08-03 | 18.28751739    |
+--------------+------------+----------------+
And for the worst (df_lower):
+--------------+------------+----------------+
| Period Start | Period End | Rate of Return |
+--------------+------------+----------------+
| 2021-08-03   | 2021-08-04 | -0.834880227   |
| 2021-07-30   | 2021-08-02 | -0.225626914   |
+--------------+------------+----------------+
If the formula doesn't depend on time, just change the formula in the definition of rate_ret; nothing else changes.
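For instance, the time-independent variant from the question could look like this (a sketch reusing the rate_ret signature from above, with the setup repeated so it runs on its own):

```python
import pandas as pd

data = {'Date': ['2021-07-30', '2021-08-02', '2021-08-03', '2021-08-04'],
        'Price': [438.51, 437.59, 441.15, 438.98]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])

# Time-independent variant: simple holding-period return,
# (period_end_price / period_start_price) - 1.
def rate_ret(df, start_date, end_date):
    start = df[df.Date == start_date].iloc[0]
    end = df[df.Date == end_date].iloc[0]
    return end.Price / start.Price - 1
```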
PS: There are some optimizations you could make, but overall the code works.