I am struggling to run my loop calculation across 300,000 users. The calculation repeatedly appends results derived from the previous step, and I would like to know whether there is a faster way to keep appending within a loop, or some other way to speed up the calculation.
Please find a working example below.
Sample data:
import numpy as np
import pandas as pd

data1 = pd.DataFrame({"cust_id": ['c1', 'c2'],
                      "state": ['B', 'E'],
                      "amount": [1000, 2000],
                      "year": [3, 4],
                      "group": [10, 25],
                      "loan_rate": [0.15, 0.12]})
state_list = ['A', 'B', 'C', 'D', 'E']
# Replace the state letters with their integer codes (A=0 ... E=4)
data1['state'] = pd.Categorical(data1['state'],
                                categories=state_list,
                                ordered=True).codes
data1 is just a sample of customer data.
multi_arr = np.arange(125).reshape(5,5,5)
#3d array consisting of data for years (1-5) and states (A-E)
Additional calculation data & preprocessing:
l1 = pd.DataFrame({'year': [1, 2, 3, 4, 5],
                   'lim %': [0.1, 0.1, 0.1, 0.1, 0.1]})
l2 = pd.concat([pd.DataFrame({'group': g, 'lookup_val': 0.2, 'year': range(1, 6)}
                             for g in data1['group'].unique())]).explode('year')
list1 = [l1, l2]
l1 = l1.set_index(['year'])
l2 = l2.set_index(['year', 'group'])
Loop calculation:
results = {}
for customer, state, amount, start, group, loan_rate in data1.itertuples(name=None, index=False):
    for year in range(start, len(multi_arr) + 1):
        if year == start:
            results[customer] = [[amount * multi_arr[year-1, state, :]]]
        else:
            results[customer][-1].append(l1.loc[year].iat[0] * np.array(results[customer][-1][-1]))
            results[customer][-1].append(l2.loc[(year, group)].iat[0] * results[customer][-1][-1])
            results[customer][-1].append(results[customer][-1][-1] * loan_rate)
            results[customer][-1].append(results[customer][-1][-1] - 60)
            results[customer][-1].append([results[customer][-1][-1] @ multi_arr[year-1]])
The code above works for my sample data, but it is very slow. I have about 20 more similar append calculations to add in between the matrix calculations.
Any help, advice or suggestions would be highly appreciated. Thank you :)
User response:
Your code will need a lot of work to get sorted; I estimate five phases. I show two here: the revised code generates the same result but is simpler.
results = {}
for customer, state, amount, start, group, loan_rate in data1.itertuples(name=None, index=False):
    # The first entry is the starting year, so the loop begins one year later
    res = [amount * multi_arr[start-1, state, :]]
    for year in range(start + 1, len(multi_arr) + 1):
        res.append(l1.loc[year].iat[0] * np.array(res[-1]))
        res.append(l2.loc[(year, group)].iat[0] * res[-1])
        res.append(res[-1] * loan_rate)
        res.append(res[-1] - 60)
        res.append([res[-1] @ multi_arr[year-1]])
    results[customer] = res
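A possible next phase, offered as a sketch rather than a measured fix: the `.loc[...]` lookups run on every iteration for every customer, and pandas indexing tends to be much slower than a plain dictionary lookup. Assuming the lookup tables are small and do not change during the loop, they could be converted to dicts once up front. The names `lim_by_year` and `lookup_by_year_group` are made up for illustration, and the tables are rebuilt here only so the snippet runs on its own.

```python
import pandas as pd

# Same lookup tables as in the question, rebuilt so the snippet is self-contained.
l1 = pd.DataFrame({"year": [1, 2, 3, 4, 5],
                   "lim %": [0.1, 0.1, 0.1, 0.1, 0.1]}).set_index("year")
l2 = pd.DataFrame(
    [{"year": y, "group": g, "lookup_val": 0.2}
     for g in (10, 25) for y in range(1, 6)]
).set_index(["year", "group"])

# Hypothetical helper dicts: plain Python scalars keyed by year and by (year, group).
lim_by_year = {int(y): float(v) for y, v in l1["lim %"].items()}
lookup_by_year_group = {(int(y), int(g)): float(v)
                        for (y, g), v in l2["lookup_val"].items()}

# Inside the loop, l1.loc[year].iat[0] would become lim_by_year[year], and
# l2.loc[(year, group)].iat[0] would become lookup_by_year_group[(year, group)].
```

Whether this helps noticeably depends on how much of the runtime the lookups account for, so it is worth profiling before and after.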
It is not clear why you are constructing such a complex data structure for later processing, but so be it.
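If it turns out that only the last vector per customer is needed (an assumption; the question keeps every intermediate), the three scalar multiplications can be folded into one factor and nothing has to be appended at all. The sketch below uses a hypothetical function `final_vector`, with the dicts `lim` and `lookup` standing in for `l1` and `l2`.

```python
import numpy as np

multi_arr = np.arange(125).reshape(5, 5, 5)  # same sample array as in the question
lim = {year: 0.1 for year in range(1, 6)}    # stand-in for l1
lookup = {(year, g): 0.2                     # stand-in for l2
          for year in range(1, 6) for g in (10, 25)}

def final_vector(state, amount, start, group, loan_rate):
    """Return only the last vector of the chain for one customer."""
    x = amount * multi_arr[start - 1, state, :].astype(float)
    for year in range(start + 1, len(multi_arr) + 1):
        # Fold the three scalar multiplications into a single factor.
        k = lim[year] * lookup[(year, group)] * loan_rate
        # Subtract the constant, then apply that year's matrix.
        x = (k * x - 60) @ multi_arr[year - 1]
    return x

# Customer c1 from the sample data: state 'B' has code 1.
print(final_vector(state=1, amount=1000, start=3, group=10, loan_rate=0.15))
```

This does not carry over directly once the 20 additional steps are added, unless they can be folded in the same way, so treat it as one direction to explore.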