How to optimize loop calculation with multiple appends python?

Time:07-28

I am struggling to run my loop calculation across 300,000 users. Each step of the calculation appends a value derived from the previous step. Is there a faster way to keep appending within a loop, or another way to speed up the calculation?

Please find a working example below.

Sample data:

import numpy as np
import pandas as pd

data1 = pd.DataFrame({"cust_id": ['c1', 'c2'],
                      "state": ['B', 'E'],
                      "amount": [1000, 2000],
                      "year": [3, 4],
                      "group": [10, 25],
                      "loan_rate": [0.15, 0.12]})

state_list = ['A','B','C','D','E']

data1['state'] = pd.Categorical(data1['state'], 
                                        categories=state_list, 
                                        ordered=True).codes

data1 is just a sample of customer data.

multi_arr = np.arange(125).reshape(5,5,5) #3d array consisting of data for years (1-5) and states (A-E)

Additional calculation data & preprocessing:

l1 = pd.DataFrame({'year': [1, 2, 3, 4, 5],
                       'lim %': [0.1, 0.1, 0.1, 0.1, 0.1]})
l2 = pd.concat([pd.DataFrame({'group':g, 'lookup_val': 0.2, 'year':range(1, 6)} 
                                  for g in data1['group'].unique())]).explode('year')

list1 = [l1, l2]
l1=l1.set_index(['year'])
l2=l2.set_index(['year','group'])

loop calculation:

results = {}
for customer, state, amount, start, group, loan_rate in data1.itertuples(name=None, index=False):
    for year in range(start, len(multi_arr) + 1):
        if year == start:
            results[customer] = [[amount * multi_arr[year-1, state, :]]]
        else:
            results[customer][-1].append(l1.loc[year].iat[0] * np.array(results[customer][-1][-1]))
            results[customer][-1].append(l2.loc[(year, group)].iat[0] * results[customer][-1][-1])
            results[customer][-1].append(results[customer][-1][-1] * loan_rate)
            results[customer][-1].append(results[customer][-1][-1] - 60)
            results[customer][-1].append([results[customer][-1][-1] @ multi_arr[year-1]])

The code above works for my sample data, but it is very slow. I have about 20 more similar append calculations to add in between the matrix calculations.

Any help, advice, or suggestions would be highly appreciated. Thank you :)

User response:

Your code will need a lot of work to get sorted; I estimate five phases. I show two here. The revised code produces the same values (stored in a flat list per customer rather than a nested one) but is simpler.

results = {}
for customer, state, amount, start, group, loan_rate in data1.itertuples(name=None, index=False):
    # Seed the list with the first-year value, then build on it year by year.
    res = [amount * multi_arr[start-1, state, :]]
    for year in range(start + 1, len(multi_arr) + 1):
        res.append(l1.loc[year].iat[0] * np.array(res[-1]))
        res.append(l2.loc[(year, group)].iat[0] * res[-1])
        res.append(res[-1] * loan_rate)
        res.append(res[-1] - 60)
        res.append([res[-1] @ multi_arr[year-1]])
    results[customer] = res

It is not clear why you are constructing such a complex data structure for later processing, but so be it.
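One possible further phase, sketched here as an assumption rather than as part of the answer above, is to move the pandas lookups out of the inner loop. Each `.loc[...]` call carries noticeable overhead, so converting `l1` and `l2` to plain dictionaries once may help when looping over many customers. The function name `run_with_dict_lookups` is hypothetical, and `l2` is built with a list comprehension instead of `explode` purely for brevity. Any speed-up should be measured on real data.

```python
import numpy as np
import pandas as pd

# Sample inputs copied from the question.
data1 = pd.DataFrame({"cust_id": ["c1", "c2"],
                      "state": ["B", "E"],
                      "amount": [1000, 2000],
                      "year": [3, 4],
                      "group": [10, 25],
                      "loan_rate": [0.15, 0.12]})
state_list = ["A", "B", "C", "D", "E"]
data1["state"] = pd.Categorical(data1["state"], categories=state_list,
                                ordered=True).codes
multi_arr = np.arange(125).reshape(5, 5, 5)

l1 = pd.DataFrame({"year": [1, 2, 3, 4, 5],
                   "lim %": [0.1] * 5}).set_index("year")
l2 = pd.DataFrame([{"year": y, "group": g, "lookup_val": 0.2}
                   for g in data1["group"].unique()
                   for y in range(1, 6)]).set_index(["year", "group"])


def run_with_dict_lookups(data, l1, l2, multi_arr):
    """Same steps as the revised loop, with lookups served from dicts."""
    lim = l1["lim %"].to_dict()          # {year: limit}
    lookup = l2["lookup_val"].to_dict()  # {(year, group): value}
    n_years = len(multi_arr)
    results = {}
    for customer, state, amount, start, group, loan_rate in data.itertuples(
            name=None, index=False):
        res = [amount * multi_arr[start - 1, state, :]]
        for year in range(start + 1, n_years + 1):
            res.append(lim[year] * np.array(res[-1]))
            res.append(lookup[(year, group)] * res[-1])
            res.append(res[-1] * loan_rate)
            res.append(res[-1] - 60)
            res.append([res[-1] @ multi_arr[year - 1]])
        results[customer] = res
    return results


results = run_with_dict_lookups(data1, l1, l2, multi_arr)
```

The dictionaries are built once per call, so their cost is spread across all customers. The list of intermediate values per customer is kept as in the revised loop, so downstream code that reads `results` should not need to change.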
