How to speed up looping over arrays as inputs for a pandas calculation?

Time:04-21

I have two arrays, x and y. The goal is to iterate over them and use each pair of values as input to a pandas calculation.

Here's an example. Iterating over each value of x and y and appending the result of the calculation to the res list is slow.

The calculation computes exp(-data/a) for each value in the column, sums the results, and multiplies the sum by b. This is only an example; it could be replaced by any other calculation.

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randint(0,5,size=(5, 1)),columns=['data'])

x = np.linspace(1, 24, 4)
y = np.linspace(10, 1500, 5)

res = []

for a in x:
    for b in y:
        res.append(np.exp(-df/a).sum().values[0]*b)

res = np.array(res).reshape(4, 5)

expected output:

array([[  11.67676844,  446.63639283,  881.59601721, 1316.5556416 ,
        1751.51526599],
       [  37.52524129, 1435.34047927, 2833.15571725, 4230.97095523,
        5628.78619321],
       [  42.79406912, 1636.87314392, 3230.95221871, 4825.0312935 ,
        6419.1103683 ],
       [  44.93972433, 1718.94445549, 3392.94918665, 5066.95391781,
        6740.95864897]])

CodePudding user response:

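You can use numpy broadcasting. For comparison, the loop result from the question is shown first, followed by the broadcast version, which gives the same values: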

res = np.array(res).reshape(4, 5)

print (res)
[[  11.67676844  446.63639283  881.59601721 1316.5556416  1751.51526599]
 [  37.52524129 1435.34047927 2833.15571725 4230.97095523 5628.78619321]
 [  42.79406912 1636.87314392 3230.95221871 4825.0312935  6419.1103683 ]
 [  44.93972433 1718.94445549 3392.94918665 5066.95391781 6740.95864897]]

res = np.exp(-df.to_numpy()/x).sum(axis=0)[:, None] * y
  
print (res)
[[  11.67676844  446.63639283  881.59601721 1316.5556416  1751.51526599]
 [  37.52524129 1435.34047927 2833.15571725 4230.97095523 5628.78619321]
 [  42.79406912 1636.87314392 3230.95221871 4825.0312935  6419.1103683 ]
 [  44.93972433 1718.94445549 3392.94918665 5066.95391781 6740.95864897]]
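
To see why the broadcast expression matches the nested loop, it can help to look at the intermediate shapes. The following is a minimal sketch that reuses the question's setup; the names `ratios`, `col_sums`, and `loop_res` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randint(0, 5, size=(5, 1)), columns=['data'])
x = np.linspace(1, 24, 4)
y = np.linspace(10, 1500, 5)

# (5, 1) divided by (4,) broadcasts to (5, 4): one column per value of a
ratios = -df.to_numpy() / x
# summing over the rows of df leaves shape (4,): one total per value of a
col_sums = np.exp(ratios).sum(axis=0)
# (4, 1) times (5,) broadcasts to (4, 5): one row per a, one column per b
res = col_sums[:, None] * y

# reference result built the same way as the question's nested loop
loop_res = np.array([[np.exp(-df / a).sum().values[0] * b for b in y] for a in x])

print(ratios.shape, col_sums.shape, res.shape)
print(np.allclose(res, loop_res))
```

The `[:, None]` step is what turns the per-a totals into a column, so that multiplying by `y` fills in every (a, b) combination at once.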

CodePudding user response:

I think what you want is:

z = -df['data'].to_numpy()
res = np.exp(z/x[:, None]).sum(axis=1)[:, None]*y

output:

array([[  11.67676844,  446.63639283,  881.59601721, 1316.5556416 ,
        1751.51526599],
       [  37.52524129, 1435.34047927, 2833.15571725, 4230.97095523,
        5628.78619321],
       [  42.79406912, 1636.87314392, 3230.95221871, 4825.0312935 ,
        6419.1103683 ],
       [  44.93972433, 1718.94445549, 3392.94918665, 5066.95391781,
        6740.95864897]])