I have the following dataframe:
import pandas as pd
import numpy as np
from pandas_datareader import data as pdr
from datetime import date, timedelta
import yfinance as yf
yf.pdr_override()
end = date.today()
start = end - timedelta(days=7300)
# download dataframe
data = pdr.get_data_yahoo('^GSPC', start=start, end=end)
Now that I have the dataframe, I want to write a function that adds the logarithmic return of a given column to the dataframe called 'data'. Without a function, I would use the following code:
data['log_return'] = np.log(data['Adj Close'] / data['Adj Close'].shift(1))
I think the function should look something like this:
def add_log_return(df):
    # add returns in a logarithmic fashion
    added = df.copy()
    added["log_return"] = np.log(df[column] / df[column].shift(1))
    added["log_return"] = added["log_return"].apply(lambda x: x*100)
    return added
How can I pass a specific column as an input to the function, e.g. add_log_return(df['Adj Close']), so that the function adds the logarithmic return to my 'data' dataframe?
data = add_log_return(df['Adj Close'])
CodePudding user response:
Just add a column argument to your function:
def add_log_return(df, column):
    # add returns in a logarithmic fashion
    added = df.copy()
    added["log_return"] = np.log(df[column] / df[column].shift(1)) * 100
    return added
new_df = add_log_return(old_df, 'Adj Close')
Note that I removed the line in your function that used apply with a lambda just to multiply by 100. It's much faster to do this in a vectorized manner, by including the multiplication in the np.log(...) line.
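To illustrate the point about vectorization, here is a minimal sketch that checks the two approaches give the same result. The price values are made up for illustration and are not real S&P 500 data.

```python
import numpy as np
import pandas as pd

# Hypothetical prices, invented for illustration only
prices = pd.Series([100.0, 101.0, 99.5, 102.0], name="Adj Close")

# Element-wise version: the Python lambda is called once per row
slow = np.log(prices / prices.shift(1)).apply(lambda x: x * 100)

# Vectorized version: the multiplication runs over the whole Series at once
fast = np.log(prices / prices.shift(1)) * 100

# Both hold the same values; the first entry is NaN in each
# because the first row has no previous price to compare against
print(np.allclose(slow, fast, equal_nan=True))
```

On a four-row Series the speed difference is negligible, but on a long price history such as the roughly twenty years downloaded above, the vectorized form avoids one Python function call per row.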
However, if I were you, I'd just return the Series object instead of copying the dataframe, modifying the copy, and returning it.
def log_return(col: pd.Series) -> pd.Series:
    # np.log applied to a Series returns a Series, so the index is preserved
    return np.log(col / col.shift(1)) * 100
Now, the caller can do what they want with it:
df['log_ret'] = log_return(df['Adj Close'])
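For anyone who wants to try this without downloading anything, here is a small self-contained sketch of the Series-returning approach. The DataFrame is a stand-in with invented values, and it assumes the price column is named 'Adj Close' as in the question.

```python
import numpy as np
import pandas as pd


def log_return(col: pd.Series) -> pd.Series:
    # Percentage log return between consecutive rows
    return np.log(col / col.shift(1)) * 100


# Stand-in for the downloaded data; the prices are invented
df = pd.DataFrame({"Adj Close": [100.0, 110.0, 121.0]})

# The caller decides where the result goes and what the column is called
df["log_ret"] = log_return(df["Adj Close"])
print(df)
```

The first row of log_ret is NaN because there is no earlier price to compare against. The function never touches the dataframe itself, so it works on any numeric column and leaves the caller's data unchanged until the result is assigned.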