Home > Net >  Multiplicate Dataframe row with a matrix
Multiplicate Dataframe row with a matrix

Time:07-07

I am trying to multiplicate a dataframe with with a matrix consisting of items from the dataframe.

I am able to solve the problem with a for-loop, but with a large dataframe it takes very long.

df = pd.DataFrame({"A": [1, 2, 3, 4],
               "B": [5, 6, 7, 8],
               "C": [9, 10, 11, 12],
               "D": [1, 1, 1, 1]})
l = []
for index, row in df.iterrows():
           l.append(df.loc[index].dot(np.array([[np.sin(df["A"].loc[index]), 0, 0, np.sin(df["A"].loc[index])],
                                     [0, np.sign(df["B"].loc[index]), 0, np.abs(df["C"].loc[index])],
                                     [np.sign(df["C"].loc[index]), 0, np.sign(df["C"].loc[index]), 0],
                                     [1, 2, 0, np.tan(df["C"].loc[index])]])))
df[["U", "V", "W", "X"]] = l
print(df)

Thanks for your help.

CodePudding user response:

It may be easier to work with arrays, rather than a dataframe. Indexing will be lot simpler

The frame's numpy values:

In [46]: df.values
Out[46]: 
array([[ 1,  5,  9,  1],
       [ 2,  6, 10,  1],
       [ 3,  7, 11,  1],
       [ 4,  8, 12,  1]], dtype=int64)

And for one "row", the 2d array is:

In [47]: index = 0    
In [48]: np.array([[np.sin(df["A"].loc[index]), 0, 0, np.sin(df["A"].loc[index])],
    ...:                                      [0, np.sign(df["B"].loc[index]), 0, np.abs(df["C"].loc[index])],
    ...:                                      [np.sign(df["C"].loc[index]), 0, np.sign(df["C"].loc[index]), 0],
    ...:                                      [1, 2, 0, np.tan(df["C"].loc[index])]])                                  
Out[48]: 
array([[ 0.84147098,  0.        ,  0.        ,  0.84147098],
       [ 0.        ,  1.        ,  0.        ,  9.        ],
       [ 1.        ,  0.        ,  1.        ,  0.        ],
       [ 1.        ,  2.        ,  0.        , -0.45231566]])
In [52]: Out[46][0].dot(Out[48])
Out[52]: array([10.84147098,  7.        ,  9.        , 45.38915533])

compare that with your apply

In [51]: l
Out[51]: 
[array([10.84147098,  7.        ,  9.        , 45.38915533]),
 array([12.81859485,  8.        , 10.        , 62.46695568]),
 array([  12.42336002,    9.        ,   11.        , -148.52748643]),
 array([ 9.97279002, 10.        , 12.        , 92.33693009])]

In array terms, the 2d array is:

In [53]: x = df.values
In [56]: index=0
In [57]: np.array([[np.sin(x[index,0]), 0, 0, np.sin(x[index,0])],
    ...:                                      [0, np.sign(x[index,1]), 0, np.abs(x[index,2])],
    ...:                                      [np.sign(x[index,2]), 0, np.sign(x[index,2]), 0],
    ...:                                      [1, 2, 0, np.tan(x[index,2])]])
Out[57]: 
array([[ 0.84147098,  0.        ,  0.        ,  0.84147098],
       [ 0.        ,  1.        ,  0.        ,  9.        ],
       [ 1.        ,  0.        ,  1.        ,  0.        ],
       [ 1.        ,  2.        ,  0.        , -0.45231566]])

To do this faster we need to construct such an array for all rows of x at once.

In einsum matrix multiplication terms, the row operation is:

 np.einsum('j,jk->k',x,A)

generalized, we need a 3d array such that

 np.einsum('ij,ijk->ik',x,A)

We could iterate on index to produce the 3d A. We can't simply replace the scalar index with a slice or arange.

By defining a couple of variables, we can construct the 3d A with:

In [64]: Z = np.zeros(4); index=np.arange(4)
In [65]: A=np.array([[np.sin(x[index,0]), Z, Z, np.sin(x[index,0])],
    ...:                                      [Z, np.sign(x[index,1]), Z, np.abs(x[index,2])],
    ...:                                      [np.sign(x[index,2]), Z, np.sign(x[index,2]), Z],
    ...:                                      [Z 1, Z 2, Z, np.tan(x[index,2])]])

In [66]: A.shape
Out[66]: (4, 4, 4)

This has placed the index dimension last.

In [67]: A[:,:,0]
Out[67]: 
array([[ 0.84147098,  0.        ,  0.        ,  0.84147098],
       [ 0.        ,  1.        ,  0.        ,  9.        ],
       [ 1.        ,  0.        ,  1.        ,  0.        ],
       [ 1.        ,  2.        ,  0.        , -0.45231566]])

So the einsum needs to be:

In [68]: res=np.einsum('ij,jki->ik',x,A)

In [69]: res
Out[69]: 
array([[  10.84147098,    7.        ,    9.        ,   45.38915533],
       [  12.81859485,    8.        ,   10.        ,   62.46695568],
       [  12.42336002,    9.        ,   11.        , -148.52748643],
       [   9.97279002,   10.        ,   12.        , 92.33693009]])

This matches your l values.

The 3d A could be constructed other ways, but I chose this as requiring a minimum of editing.

  • Related