Python, improving speed of multi values stresstest-CodePudding

I would like to improve the speed of the following code. The data set is a list of trades that I would like to stresstest, by simulating various parameters and have all the results stored in a table.

The way I'm performing this, it's by designing the range of parameters and then I iterate over their values, I initiate a copy of the dataset, I assign the value of the parameters to new columns and I concat everything in a huge dataframe.

I would like to know if someone has a good idea to avoid the three for loops to build the dataframe ?

'''

# Defining the range of parameters to simulate
volchange = range(-1,2)
spreadchange = range(-10,11)
flatchange = range(-10,11)

# the df where I store all the results
final_result = pd.DataFrame()    



# Iterating over the range of parameters 
for vol in volchange:
    for spread in spreadchange:
        for flat in flatchange:
            
            # Creating a copy of the initial dataset, assigning the simulated values to three 
            # new columns and concat it with the rest, resulting in a dataframe which is 
            # several time the initial dataset with all the possible triplet of parameters

            inter_pos = pos.copy()
            
            inter_pos['vol_change[pts]'] = vol
            inter_pos['spread_change[%]'] = spread                  
            inter_pos['spot_change[%]'] = flat

            
            final_result = pd.concat([final_result,inter_pos], axis = 0)



# Performing computation at dataframe level            
final_result['sim_vol'] = final_result['vol_change[pts]']   final_result['ImpliedVolatility']
final_result['spread'].multiply(final_result['spread_change[%]'])/100 
final_result['sim_spread'] = final_result['spread']   final_result['spread_change']
final_result['spot_change'] = final_result['spot'] * final_result['spot_change[%]']/100
final_result['sim_spot'] = final_result['spot']   final_result['spot_change']
final_result['sim_price'] = final_result['sim_spot'] - final_result['sim_spread']

'''

Thanks a lot for your help !

Have a nice week ahead !

CodePudding user response：

Concatenating pandas dataframes onto one another takes a long time. It's better to create a list of dataframes and then use pd.concat to concatenate them all at once.
You can test this yourself like this:

import pandas as pd
import numpy as np
from time import time

dfs = []
columns = [f"{i:02d}" for i in range(100)]

time_start = time()
for i in range(100):
    data = np.random.random((10000, 100))
    df = pd.DataFrame(columns=columns, data=data)
    dfs.append(df)

new_df = pd.concat(dfs)
time_end = time()
print(f"Time elapsed: {time_end-time_start}")
# Time elapsed: 1.851675271987915

new_df = pd.DataFrame(columns=columns)
time_start = time()
for i in range(100):
    data = np.random.random((10000, 100))
    df = pd.DataFrame(columns=columns, data=data)
    new_df = pd.concat([new_df, df])
time_end = time()
print(f"Time elapsed: {time_end-time_start}")
# Time elapsed: 12.258363008499146

You can also use itertools.product to get rid of your nested for loops.

Also as suggested by @Ahmed AEK:

you can pass data=itertools.product(volchange, spreadchange ,flatchange ) to pd.DataFrame directly, and avoid creating the list altogether, which is a more memory efficient and faster approach