Home > Net >  Vectorization with python
Vectorization with python

Time:12-29

My current code is extremely slow with the nested for loop setup. I would like to speed up the process, my assumption would be that the solution is the vectorization with Pandas or NumPy. I do not know how to transfer my current code into the new format.

I have created an example code below.

import pandas as pd
import numpy as np

balance = 10000

raw_data = [[1,2,4,1,3],[2,3,7,2,4],[3,4,5,3,4],[4,4,9,1,5],[5,5,6,4,5]]
raw_df = pd.DataFrame(raw_data, columns=['D','O','H','L','C'])

history_data = [[1,1,5,np.nan,4],[0,1,3,np.nan,4],[1,0,4,2,3],[1,0,1,6,0],[0,1,7,np.nan,8]]
history_df = pd.DataFrame(history_data, columns=['TY','ST','OP','CL','SL'])

for n in raw_df.index:
    for p in history_df.index:
        if history_df['ST'][p] == 1 and history_df['TY'][p] == 1 and history_df['SL'][p] >= raw_df['L'][n]:
           history_df['CL'][p] = raw_df['L'][n]
           history_df['ST'][p] = 0
           balance = balance   20
    if raw_df['C'][n] > 4:
        history_df = history_df.append({'TY':0,'ST':1,'OP':5,'CL':np.nan,'SL':9,},ignore_index = True)

CodePudding user response:

Check out this example, see if it helps :

import numpy as np

# Use NumPy's where function to perform the check for each row of history_df and raw_df simultaneously
mask = np.where((history_df['ST'] == 1) & (history_df['TY'] == 1) & (history_df['SL'] >= raw_df['L']))
history_df.loc[mask, 'CL'] = raw_df.loc[mask, 'L']
history_df.loc[mask, 'ST'] = 0

# Calculate the balance change
balance_change = 20 * len(mask[0])
balance  = balance_change

# Append rows to history_df where raw_df['C'] > 4
new_rows = raw_df[raw_df['C'] > 4]
new_rows['TY'] = 0
new_rows['ST'] = 1
new_rows['OP'] = 5
new_rows['CL'] = np.nan
new_rows['SL'] = 9
history_df = history_df.append(new_rows, ignore_index=True)

  • Related