I am trying to multiplicate a dataframe with with a matrix consisting of items from the dataframe.
I am able to solve the problem with a for-loop, but with a large dataframe it takes very long.
df = pd.DataFrame({"A": [1, 2, 3, 4],
"B": [5, 6, 7, 8],
"C": [9, 10, 11, 12],
"D": [1, 1, 1, 1]})
l = []
for index, row in df.iterrows():
l.append(df.loc[index].dot(np.array([[np.sin(df["A"].loc[index]), 0, 0, np.sin(df["A"].loc[index])],
[0, np.sign(df["B"].loc[index]), 0, np.abs(df["C"].loc[index])],
[np.sign(df["C"].loc[index]), 0, np.sign(df["C"].loc[index]), 0],
[1, 2, 0, np.tan(df["C"].loc[index])]])))
df[["U", "V", "W", "X"]] = l
print(df)
Thanks for your help.
CodePudding user response:
It may be easier to work with arrays, rather than a dataframe. Indexing will be lot simpler
The frame's numpy values:
In [46]: df.values
Out[46]:
array([[ 1, 5, 9, 1],
[ 2, 6, 10, 1],
[ 3, 7, 11, 1],
[ 4, 8, 12, 1]], dtype=int64)
And for one "row", the 2d array is:
In [47]: index = 0
In [48]: np.array([[np.sin(df["A"].loc[index]), 0, 0, np.sin(df["A"].loc[index])],
...: [0, np.sign(df["B"].loc[index]), 0, np.abs(df["C"].loc[index])],
...: [np.sign(df["C"].loc[index]), 0, np.sign(df["C"].loc[index]), 0],
...: [1, 2, 0, np.tan(df["C"].loc[index])]])
Out[48]:
array([[ 0.84147098, 0. , 0. , 0.84147098],
[ 0. , 1. , 0. , 9. ],
[ 1. , 0. , 1. , 0. ],
[ 1. , 2. , 0. , -0.45231566]])
In [52]: Out[46][0].dot(Out[48])
Out[52]: array([10.84147098, 7. , 9. , 45.38915533])
compare that with your apply
In [51]: l
Out[51]:
[array([10.84147098, 7. , 9. , 45.38915533]),
array([12.81859485, 8. , 10. , 62.46695568]),
array([ 12.42336002, 9. , 11. , -148.52748643]),
array([ 9.97279002, 10. , 12. , 92.33693009])]
In array terms, the 2d array is:
In [53]: x = df.values
In [56]: index=0
In [57]: np.array([[np.sin(x[index,0]), 0, 0, np.sin(x[index,0])],
...: [0, np.sign(x[index,1]), 0, np.abs(x[index,2])],
...: [np.sign(x[index,2]), 0, np.sign(x[index,2]), 0],
...: [1, 2, 0, np.tan(x[index,2])]])
Out[57]:
array([[ 0.84147098, 0. , 0. , 0.84147098],
[ 0. , 1. , 0. , 9. ],
[ 1. , 0. , 1. , 0. ],
[ 1. , 2. , 0. , -0.45231566]])
To do this faster we need to construct such an array for all rows of x
at once.
In einsum
matrix multiplication terms, the row operation is:
np.einsum('j,jk->k',x,A)
generalized, we need a 3d array such that
np.einsum('ij,ijk->ik',x,A)
We could iterate on index
to produce the 3d A
. We can't simply replace the scalar index
with a slice or arange
.
By defining a couple of variables, we can construct the 3d A
with:
In [64]: Z = np.zeros(4); index=np.arange(4)
In [65]: A=np.array([[np.sin(x[index,0]), Z, Z, np.sin(x[index,0])],
...: [Z, np.sign(x[index,1]), Z, np.abs(x[index,2])],
...: [np.sign(x[index,2]), Z, np.sign(x[index,2]), Z],
...: [Z 1, Z 2, Z, np.tan(x[index,2])]])
In [66]: A.shape
Out[66]: (4, 4, 4)
This has placed the index
dimension last.
In [67]: A[:,:,0]
Out[67]:
array([[ 0.84147098, 0. , 0. , 0.84147098],
[ 0. , 1. , 0. , 9. ],
[ 1. , 0. , 1. , 0. ],
[ 1. , 2. , 0. , -0.45231566]])
So the einsum
needs to be:
In [68]: res=np.einsum('ij,jki->ik',x,A)
In [69]: res
Out[69]:
array([[ 10.84147098, 7. , 9. , 45.38915533],
[ 12.81859485, 8. , 10. , 62.46695568],
[ 12.42336002, 9. , 11. , -148.52748643],
[ 9.97279002, 10. , 12. , 92.33693009]])
This matches your l
values.
The 3d A
could be constructed other ways, but I chose this as requiring a minimum of editing.