Function to add a column based on the input from a specific column

I have the following dataframe:

import pandas as pd
import numpy as np
import yfinance as yf
from pandas_datareader import data as pdr
from datetime import date, timedelta
yf.pdr_override()

end = date.today()
start = end - timedelta(days=7300)

# download dataframe
data = pdr.get_data_yahoo('^GSPC', start=start, end=end)

Now that I have the dataframe, I want to write a function that adds the logarithmic return of a given column to the dataframe called 'data'. Without a function, the code is:

data['log_return'] = np.log(data['Adj Close'] / data['Adj Close'].shift(1))

I think the function should look something like this:

def add_log_return(df):
    
    # add returns in a logarithmic fashion
    added = df.copy()
    added["log_return"] = np.log(df[column] / df[column].shift(1))
    added["log_return"] = added["log_return"].apply(lambda x: x*100)
    return added

How can I pass a specific column to the function, e.g. add_log_return(df['Adj Close']), so that it adds the logarithmic return to my 'data' dataframe?

data = add_log_return(df['Adj Close'])

CodePudding user response：

Just add a column argument to your function:

def add_log_return(df, column): 
    # add returns in a logarithmic fashion
    added = df.copy()
    added["log_return"] = np.log(df[column] / df[column].shift(1)) * 100
    return added

new_df = add_log_return(old_df, 'Adj Close')

Note: I removed the line in your function that applied a lambda just to multiply by 100. It's much faster to do this in a vectorized manner, by multiplying the result of np.log(...) directly.
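To see that the two forms give the same numbers, here is a minimal sketch on a made-up price series (the values and the name prices are assumptions for illustration, not data from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical prices, for illustration only
prices = pd.Series([100.0, 101.0, 99.5, 102.0], name="Adj Close")

raw = np.log(prices / prices.shift(1))

# Element-wise: calls a Python function once per row
via_apply = raw.apply(lambda x: x * 100)

# Vectorized: a single multiplication over the whole Series
vectorized = raw * 100

# Same values either way; the first entry is NaN because
# there is no previous price to compare against
pd.testing.assert_series_equal(via_apply, vectorized)
```

On a Series this short the speed difference is negligible. It should only become noticeable on long series, where the per-row Python call in apply adds up.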

However, if I were you, I'd just return the Series instead of copying the dataframe, modifying the copy, and returning it.

def log_return(col: pd.Series) -> pd.Series:
    return np.log(col / col.shift(1)) * 100

Now, the caller can do what they want with it:

df['log_ret'] = log_return(df['Adj Close'])
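If you want to try this without downloading anything, here is a small sketch with a made-up frame standing in for the Yahoo data (the prices are assumptions for illustration):

```python
import numpy as np
import pandas as pd

def log_return(col: pd.Series) -> pd.Series:
    """Percentage log return of each value relative to the previous one."""
    return np.log(col / col.shift(1)) * 100

# Hypothetical data standing in for the downloaded frame
df = pd.DataFrame({"Adj Close": [100.0, 110.0, 99.0]})

df["log_ret"] = log_return(df["Adj Close"])

# The first row has no previous price, so its return is NaN
print(df)
```

Because the function only takes a Series, the caller decides the column name and whether to store the result at all, and the same function works for 'Close', 'Open', or any other price column.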