Home > Back-end >  Create multiple dynamically named pandas columns and apply a function on it
Create multiple dynamically named pandas columns and apply a function on it

Time:08-02

I have a dataframe with OHLC values for an asset from which I can calculate technical indicators, for example, EMA or SMA for one ndays = 10 as follows:

import yfinance as yf
import pandas as pd

# Get symbol OHLC data
tsla = yf.Ticker("TSLA")
df = tsla.history(period='ytd', interval='1h')
df = df.drop(['Dividends','Stock Splits'], axis=1)

def sma(df, ndays): 
    df['TP'] = (df['High']   df['Low']   df['Close']) / 3 
    df['SMA_{}'.format(ndays)] = df['TP'].rolling(ndays).mean()
    return df

df = sma(df,10)
df

df result for code above

My Problem: This would generate one column named SMA_10. However, I have a range of values for ndays from 2-15 for which I would like to generate a specific named column for each, for example SMA_2, SMA_3, SMA_4,SMA_5....SMA_15. In the actual data I have to generate over 200 such columns.

My attempt:

for i in range(2,16):
    if not "SMA_{i}" in df.columns:
        df = df.apply(lambda row: pd.Series(sma(df,i)), axis=1)
        break

However, this does not work as it returns the following error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Is there a way for me to dynamically create such named columns and apply a function on it? Thanks in advance!

CodePudding user response:

I found 2 mistakes: First in your function sma() you have used iinstead of ndays. Second, you have build your sma() function that it takes a dataframe, however, you have called it with a row. (The variable row was not used either) By directly calling sma() you will get the result:

for i in range(2,16):
    if not "SMA_{i}" in df.columns:
        df = sma(df,i)
        # break # Add for debugging purpose 
  • Related