I have 3 different numpy arrays. For example:
arr1 = array([[1, 1, 1, 1, 1, 1, 0, 1, 1, 1],
[1, 0, 1, 1, 1, 0, 1, 1, 1, 1],
[1, 0, 0, 0, 1, 0, 1, 1, 1, 1],
[0, 1, 1, 1, 1, 0, 1, 1, 0, 1],
[1, 1, 1, 1, 0, 1, 1, 1, 1, 1],
[1, 1, 1, 0, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
[1, 1, 1, 1, 1, 0, 1, 0, 1, 1],
[1, 1, 1, 1, 0, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 0, 0, 0]])
arr2 = array([[1. , 0. , 0. , 0.77519575, 0. ,
0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 1. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[0.77519575, 0. , 0. , 1. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ,
0. , 1. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 1. , 0. ],
[0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ]])
arr3 = array([[0.08333333, 0.06666667, 0.13333333, 0.21428571, 0.08571429,
0.17241379, 0. , 0.14285714, 0.06896552, 0.04166667],
[0.16666667, 0. , 0.33333333, 0.21428571, 0.08571429,
0. , 0.14285714, 0.14285714, 0.17241379, 0.08333333],
[0.125 , 0. , 0. , 0. , 0.02857143,
0. , 0.03571429, 0.10714286, 0.13793103, 0.125 ],
[0. , 0.06666667, 0.13333333, 0.14285714, 0.14285714,
0. , 0.03571429, 0.03571429, 0. , 0.04166667],
[0.16666667, 0.13333333, 0.26666667, 0.35714286, 0. ,
0.13793103, 0.07142857, 0.14285714, 0.13793103, 0.16666667],
[0.20833333, 0.06666667, 0.2 , 0. , 0.02857143,
0.10344828, 0.17857143, 0.14285714, 0.03448276, 0.20833333],
[0.20833333, 0.1 , 0.26666667, 0.07142857, 0.08571429,
0.17241379, 0.07142857, 0.14285714, 0. , 0.04166667],
[0.125 , 0.1 , 0.26666667, 0.21428571, 0.08571429,
0. , 0.17857143, 0. , 0.13793103, 0.125 ],
[0.125 , 0.16666667, 0.2 , 0.07142857, 0. ,
0.17241379, 0.17857143, 0.07142857, 0.06896552, 0.125 ],
[0.08333333, 0.16666667, 0.26666667, 0.28571429, 0.02857143,
0.17241379, 0.10714286, 0. , 0. , 0. ]])
What I need to do is loop through the rows, take each row of arr1 and arr3, multiply them with arr2, sum the elements, and store the result in a dictionary. To make things clearer, this is what I am doing:
results_dict = {}
for i, idx in tqdm(enumerate(my_index)):
    my_list = arr1[i]
    my_weights = arr3[i]
    results_dict[idx] = dict(enumerate(np.sum(my_list * arr2 * my_weights, axis=0).flatten(), 1))
This works for me. However, these arrays can get quite large, so I am looking for a more efficient way to do this, perhaps using built-in numpy functions to eliminate the loop. Is there a way to do this?
CodePudding user response:
You can use np.einsum to do this.
CodePudding user response:
You could have made things clearer by showing some, if not all, of the results. Anyway, after getting rid of the undefined tqdm bit, here's what I get:
In [133]: results_dict
Out[133]:
{0: {1: 0.1479329732493475,
2: 0.0,
3: 0.13333333,
4: 0.3803990816777325,
5: 0.08571429,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.06896552,
10: 0.0},
1: {1: 0.2958659642506525,
...
So it's a dict of dicts, with individual numeric values. You cannot get rid of the loops with numpy if you need that structure.
For one i value the calculation is:
In [135]: i = 0
...: np.sum(arr1[i] * arr2 * arr3[i], axis=0)
Out[135]:
array([0.14793297, 0. , 0.13333333, 0.38039908, 0.08571429,
0. , 0. , 0. , 0.06896552, 0. ])
And yes, as the other answer suggests, we can calculate these values for all "rows" with one expression, using einsum, dot, or matmul. But there's still the question of creating the dicts, and I suspect that's the big time consumer here.
Changing the dicts to a list:
In [136]: alist = []
     ...: for i in range(arr1.shape[0]):
     ...:     my_list = arr1[i]
     ...:     my_weights = arr3[i]
     ...:     alist.append(np.sum(my_list * arr2 * my_weights, axis=0))
...:
In [137]: alist
Out[137]:
[array([0.14793297, 0. , 0.13333333, 0.38039908, 0.08571429,
0. , 0. , 0. , 0.06896552, 0. ]),
...
0. , 0.10714286, 0. , 0. , 0. ])]
which can be turned into a (10,10) array with:
In [138]: np.array(alist)
Out[138]:
array([[0.14793297, 0. , 0.13333333, 0.38039908, 0.08571429,
0. , 0. , 0. , 0.06896552, 0. ],
...
[0.14793297, 0. , 0.26666667, 0.50719879, 0.02857143,
0. , 0.10714286, 0. , 0. , 0. ]])
That np.sum(arr1[i] * arr2 * arr3[i], axis=0) can be written with einsum as:
In [143]: np.einsum("j,kj,j->j", arr1[i], arr2, arr3[i])
Out[143]:
array([0.14793297, 0. , 0.13333333, 0.38039908, 0.08571429,
0. , 0. , 0. , 0.06896552, 0. ])
and generalized to all rows (i):
In [144]: np.einsum("ij,kj,ij->ij", arr1, arr2, arr3)
and since we are only summing on k, that can be written just as simply as
arr1 * arr2.sum(axis=0) * arr3
In other words, just reduce arr2 to a (10,) array, and multiply. There's nothing fancy here.
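A quick way to convince yourself that the three forms agree is to compare them on small random arrays. This is only a sketch: the shapes m, k, n and the random data are made up for the check and are not the arrays from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, n = 4, 5, 6  # arbitrary small shapes, chosen only for this check
arr1 = rng.integers(0, 2, size=(m, n))   # 0/1 mask, like arr1 in the question
arr2 = rng.random((k, n))
arr3 = rng.random((m, n))

# Loop version from the question, collected into an (m, n) array.
looped = np.array([np.sum(arr1[i] * arr2 * arr3[i], axis=0) for i in range(m)])

# einsum version: k is summed away, i and j are kept.
via_einsum = np.einsum("ij,kj,ij->ij", arr1, arr2, arr3)

# Plain broadcasting version: collapse arr2 over its rows first.
via_sum = arr1 * arr2.sum(axis=0) * arr3

print(np.allclose(looped, via_einsum), np.allclose(looped, via_sum))
```

If both comparisons print True, the loop over i is doing no work that a single elementwise multiply cannot do.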
It may help to note that while arr1 is (m,n), arr1[i] is (n,). Same for arr3. With arr2 of shape (k,n), these broadcast to (1,n), with the result (k,n). Sum on the k, and you are left with (n,) shape. You aren't summing on the rows or columns of arr1 or arr3, just on the rows of arr2.
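The shape bookkeeping can be checked directly. The sizes and fill values below are hypothetical, picked so that the shapes are easy to tell apart:

```python
import numpy as np

m, k, n = 3, 4, 5  # hypothetical sizes, all different on purpose
arr1 = np.ones((m, n))
arr2 = np.arange(k * n, dtype=float).reshape(k, n)
arr3 = np.full((m, n), 0.5)

row = arr1[0]                    # one row of arr1: shape (n,)
product = row * arr2 * arr3[0]   # (n,) * (k, n) * (n,) broadcasts to (k, n)
summed = product.sum(axis=0)     # summing over k leaves shape (n,)

print(row.shape, product.shape, summed.shape)  # (5,) (4, 5) (5,)
```

The only axis that disappears is the k axis of arr2, which is why arr2 can be summed once up front.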
My iterative list version of your dicts can be written as:
In [150]: alist = []
     ...: arr2sum = arr2.sum(axis=0)
     ...: for i in range(arr1.shape[0]):
     ...:     alist.append(arr1[i] * arr2sum * arr3[i])
     ...: x = np.array(alist)
and your dict code becomes (no np.sum is needed any more, since arr2sum is already reduced):
results_dict[idx] = dict(enumerate(my_list * arr2sum * my_weights, 1))
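If the dict of dicts is really required, one option is to compute the whole array first and only then loop to build the dicts. The sketch below uses made-up random data and a hypothetical my_index of string labels. Converting with tolist() first is my own choice, so that the values are plain Python floats; whether it is faster for your sizes is something to measure, not assume.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, n = 3, 4, 5  # hypothetical sizes
arr1 = rng.integers(0, 2, size=(m, n))
arr2 = rng.random((k, n))
arr3 = rng.random((m, n))
my_index = ["a", "b", "c"]  # hypothetical labels, one per row of arr1

# All rows at once; no per-row numpy work is left inside the loop.
x = arr1 * arr2.sum(axis=0) * arr3

# tolist() turns the (m, n) array into nested lists of Python floats.
results_dict = {
    idx: dict(enumerate(row, 1))   # inner keys start at 1, as in the question
    for idx, row in zip(my_index, x.tolist())
}

print(results_dict["a"])
```

The remaining loop is pure Python dict construction, which numpy cannot vectorize.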