I wrote some code to analyze a large set of stock prices. It uses two technical indicators (MACD and EMA) and creates a technical-analysis flag for each.
The code works, but it takes too long to execute, most likely because of the row-by-row iteration with iloc. Do you have any suggestions to improve the speed? An example is below:
import pandas as pd
import numpy as np
import time
df = pd.DataFrame(np.random.uniform(low=2, high=5.5, size=(10000,)), columns=['Close'])
close = df['Close'].astype(float)
def MACD(first, second, signal):
    # Fast and slow exponential moving averages of the close price
    df['EMA' + str(first)] = close.ewm(span=first).mean()
    df['EMA' + str(second)] = close.ewm(span=second).mean()
    df['MACD'] = df['EMA' + str(first)] - df['EMA' + str(second)]
    df['signal'] = df.MACD.ewm(span=signal).mean()
    df['MACD_ind'] = 0
    # Flag rows where the MACD line crosses the signal line
    for i in range(second + signal, len(df)):
        if df.MACD.iloc[i] > df.signal.iloc[i] and df.MACD.iloc[i-1] < df.signal.iloc[i-1]:
            df.loc[i, 'MACD_ind'] = 1
        if df.MACD.iloc[i] < df.signal.iloc[i] and df.MACD.iloc[i-1] > df.signal.iloc[i-1]:
            df.loc[i, 'MACD_ind'] = -1
def EMA(first, second):
    # Note: rolling(...).mean() is a simple moving average, despite the column names
    df['EMA' + str(first)] = close.rolling(window=first).mean()
    df['EMA' + str(second)] = close.rolling(window=second).mean()
    df['EMAdif'] = df['EMA' + str(first)] - df['EMA' + str(second)]
    df['EMA_ind'] = 0
    # Flag rows where the difference changes sign
    for i in range(second, len(df)):
        if df.EMAdif.iloc[i] > 0 and df.EMAdif.iloc[i-1] < 0:
            df.loc[i, 'EMA_ind'] = 1
        if df.EMAdif.iloc[i] < 0 and df.EMAdif.iloc[i-1] > 0:
            df.loc[i, 'EMA_ind'] = -1
split_time = time.time()
TA_ind=list()
MACD(12, 26, 9)
TA_ind.append('MACD_ind')
print("MACD--- %s seconds ---" % (time.time() - split_time))
split_time = time.time()
EMA(20,50)
TA_ind.append('EMA_ind')
print("EMA--- %s seconds ---" % (time.time() - split_time))
split_time = time.time()
CodePudding user response:
I found a DataFrame method called shift that helped quite a bit. It moves a column down by one row, so each row can be compared with the previous one without a loop.
def MACD(first, second, signal):
    df['EMA' + str(first)] = close.ewm(span=first).mean()
    df['EMA' + str(second)] = close.ewm(span=second).mean()
    df['MACD'] = df['EMA' + str(first)] - df['EMA' + str(second)]
    df['signal'] = df.MACD.ewm(span=signal).mean()
    # Difference between MACD and signal, and the same value from the previous row
    df['dif'] = df['MACD'] - df['signal']
    df['dif_shift'] = df.dif.shift(1)
    df['MACD_ind'] = 0
    # 1 on an upward crossing, -1 on a downward crossing, otherwise unchanged
    df['MACD_ind'] = np.where((df['dif'] > 0) & (df['dif_shift'] < 0), 1, df['MACD_ind'])
    df['MACD_ind'] = np.where((df['dif'] < 0) & (df['dif_shift'] > 0), -1, df['MACD_ind'])
def EMA(first, second):
    df['EMA' + str(first)] = close.rolling(window=first).mean()
    df['EMA' + str(second)] = close.rolling(window=second).mean()
    df['EMAdif'] = df['EMA' + str(first)] - df['EMA' + str(second)]
    # Previous row's difference, used to detect a sign change
    df['EMAdif_shift'] = df.EMAdif.shift(1)
    df['EMA_ind'] = 0
    df['EMA_ind'] = np.where((df['EMAdif'] > 0) & (df['EMAdif_shift'] < 0), 1, df['EMA_ind'])
    df['EMA_ind'] = np.where((df['EMAdif'] < 0) & (df['EMAdif_shift'] > 0), -1, df['EMA_ind'])
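Both functions above apply the same crossover test, so it could be factored into one helper. The following is only a minimal sketch: the name crossover_flags and its start parameter are hypothetical and do not come from the original code. It uses np.select instead of two np.where passes, and start zeroes the first rows to mimic the original loop, which began at row second + signal (or second for the EMA version), whereas the shift-based version checks every row.

```python
import numpy as np
import pandas as pd

def crossover_flags(diff: pd.Series, start: int = 0) -> pd.Series:
    """Hypothetical helper: 1 where diff turns positive, -1 where it turns negative, else 0."""
    prev = diff.shift(1)  # previous row's value; NaN on the first row
    flags = np.select(
        [(diff > 0) & (prev < 0), (diff < 0) & (prev > 0)],
        [1, -1],
        default=0,
    )
    out = pd.Series(flags, index=diff.index)
    out.iloc[:start] = 0  # ignore crossings before row `start`, like the original loop
    return out

# Small hand-checkable example
demo = pd.Series([-1.0, 0.5, 0.7, -0.2, -0.3, 0.1])
print(crossover_flags(demo).tolist())  # [0, 1, 0, -1, 0, 1]
```

Inside MACD this might be used as df['MACD_ind'] = crossover_flags(df['MACD'] - df['signal'], start=second + signal). Comparisons against NaN are False, so rows without a valid previous value are left at 0.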
CodePudding user response:
Looping through a dictionary is much faster than looping through a DataFrame, so you can convert your DataFrame to a dictionary. The catch is that pandas built-in methods such as ewm are no longer available, so you would have to write calculations like the MACD yourself. The result is a much faster program. As an example, I did this with your DataFrame: first I loop through the DataFrame itself, and then I convert it to a dictionary and loop through that.
df = pd.DataFrame(np.random.uniform(low=2, high=5.5, size=(10000,2)), columns=['Close', 'Open'])
st = time.time()
for i in range(len(df)):
    a = df.iloc[i]['Close'] - df.iloc[i]['Open']
print(time.time() - st)
Execution time for this one was 2.005403757095337 seconds.
df = pd.DataFrame(np.random.uniform(low=2, high=5.5, size=(10000,2)), columns=['Close', 'Open'])
df_dict = df.to_dict()
st = time.time()
for i in range(len(df)):
    a = df_dict['Close'][i] - df_dict['Open'][i]
print(time.time() - st)
Execution time for this one was 0.0029413700103759766 seconds, which means the second method is roughly 700 times faster!
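For comparison, the same subtraction can also be written as a single column operation, which keeps the data in pandas and avoids the Python loop altogether. This is a minimal sketch rather than a benchmark: the names prices, loop_result and vec_result are illustrative, the seed is arbitrary, and timings will vary by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
prices = pd.DataFrame(rng.uniform(2, 5.5, size=(10000, 2)), columns=['Close', 'Open'])

# Dictionary loop, as in the example above
prices_dict = prices.to_dict()
st = time.perf_counter()
loop_result = [prices_dict['Close'][i] - prices_dict['Open'][i] for i in range(len(prices))]
loop_seconds = time.perf_counter() - st

# Vectorized: one subtraction over whole columns, no Python-level loop
st = time.perf_counter()
vec_result = prices['Close'] - prices['Open']
vec_seconds = time.perf_counter() - st

print("dict loop:  %s seconds" % loop_seconds)
print("vectorized: %s seconds" % vec_seconds)
```

Both approaches produce the same values. The vectorized form also leaves methods such as ewm and shift available, so the indicators would not need to be reimplemented by hand.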