I have a large 3D array (a list of 2D matrices), for example:
matrices = np.random.rand(15, 10, 10)
Each of the 15 matrices is 10x10 and covers the states A-J. The matrices are in order and represent time in one-year increments, starting from matrices[0], which contains the matrix values for year 1, up to matrices[14] for year 15.
The table below shows an example of my customer data. I have 12,000 customers.
customer | current_state | year | amount
ax111    | A             | 3    | 300
ax112    | D             | 4    | 4890
ax113    | G             | 9    | 624
I basically need to match each customer's year to the correct matrix and place their amount at their current_state, creating a vector for each customer. Example:
ax111 = np.array([300,0,0,0,0,0,0,0,0,0])
(amount 300 placed at state A, 1st element)
ax112 = np.array([0,0,0,4890,0,0,0,0,0,0])
(amount 4890 placed at state D, 4th element)
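Building these vectors one by one for 12,000 customers is not necessary. A minimal sketch of building all of them at once with NumPy fancy indexing, assuming the states are exactly the letters A-J in that order and the table columns are available as arrays (`states`, `amounts`, `state_index` are hypothetical names):

```python
import numpy as np

# Hypothetical customer data copied from the example table
states = np.array(["A", "D", "G"])
amounts = np.array([300, 4890, 624])

# Map state letters A-J to column indices 0-9 (assumes exactly these 10 states)
state_index = {s: i for i, s in enumerate("ABCDEFGHIJ")}
cols = np.array([state_index[s] for s in states])

# One row per customer, with the amount placed in the column of its current state
customers = np.zeros((len(states), 10), dtype=amounts.dtype)
customers[np.arange(len(states)), cols] = amounts
```

Each row of `customers` is then the vector for one customer, e.g. row 0 equals ax111 above.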
I then need to multiply each customer's array by the matrices, starting from the matrix for that customer's year, and keep multiplying the product by the next matrix until year 15 (matrices[14]) is reached for each customer.
The code below works for one customer. How can I run it for all 12,000 customers?
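import numpy as np
matrices = np.random.rand(15, 10, 10)
ax111 = np.array([300,0,0,0,0,0,0,0,0,0])
output = ax111
results = []
for arr in matrices[3:14]:  # note: 3:14 stops at matrices[13]; matrices[3:] would include matrices[14]
    output = output @ arr
    results.append(output)  # keep every intermediate vector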
The code above produces 11 intermediate arrays of length 10 for this customer (an (11, 10) array when stacked). How can I efficiently apply this to all 12,000 customers?
User response:
Since for each customer you repeatedly take the dot product of the customer's array with all matrices from i = year to i = 14, you can precompute these accumulated matrices. That is, instead of
output = (((ax @ matrices[year]) @ matrices[year+1]) @ ...)
you can do
output = ax @ (matrices[year] @ matrices[year+1] @ ...)
and precompute the product in parentheses once for every possible starting year.
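To illustrate why the precomputation is valid, here is a minimal sketch that builds the accumulated ("suffix") products with an explicit backward loop instead of `itertools.accumulate`, and checks that one multiplication by the precomputed matrix matches the step-by-step chain. The name `suffix` and the random data are illustrative; the indexing follows the `matrices[year:]` convention used in the code below.

```python
import numpy as np

rng = np.random.default_rng(0)
# Entries 0-2 keep every product here exact and within int64 range
matrices = rng.integers(0, 3, size=(15, 10, 10))
ax = np.zeros(10, dtype=np.int64)
ax[0] = 300
year = 3

# Step by step: ((ax @ m[year]) @ m[year+1]) @ ... @ m[14]
step = ax
for m in matrices[year:]:
    step = step @ m

# Precompute suffix products: suffix[i] = m[i] @ m[i+1] @ ... @ m[14]
suffix = np.empty_like(matrices)
suffix[-1] = matrices[-1]
for i in range(len(matrices) - 2, -1, -1):
    suffix[i] = matrices[i] @ suffix[i + 1]

# Associativity: a single vector-matrix product gives the same result
assert np.array_equal(ax @ suffix[year], step)
```

The suffix products cost 14 matrix-matrix multiplications in total and are shared by all customers, so each customer then needs only one vector-matrix product.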
Then you can perform a "pairwise matrix multiplication" (pairing each customer with the corresponding accumulated matrix) as an elementwise multiplication followed by a sum:
import itertools as it
import numpy as np

# --- Example data ---
rng = np.random.default_rng()
# using integers for exact results (these long products overflow int64 and wrap around, identically in both approaches)
matrices = rng.integers(0, 100, size=(15,10,10))
customers = np.array([
    [300,0,0,0,0,0,0,0,0,0],
    [0,0,0,4890,0,0,0,0,0,0],
    [0,0,0,0,0,0,624,0,0,0],
])
years = [3, 4, 9]

# --- Reference computation ---
results = []
for c, y in zip(customers, years):
    for m in matrices[y:]:
        c = c @ m
    results.append(c)  # one final vector per customer
results = np.stack(results)

# --- Vectorized approach ---
# accumulated[i] = matrices[i] @ matrices[i+1] @ ... @ matrices[14]
matrices = np.stack([*it.accumulate(matrices[::-1], lambda x,y: y@x)][::-1])
# pair each customer with the accumulated matrix for its year, multiply, sum over the shared axis
new = (customers[:,:,None] * matrices[years]).sum(axis=1)
assert np.array_equal(new, results)
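The multiply-and-sum step is one way to write a batched vector-matrix product. A short sketch with random data (the names `acc` and `v1`-`v3` are hypothetical; `acc` plays the role of `matrices[years]`) showing that `np.einsum` and a batched `@` give the same result. The `@` form does not build the intermediate (N, 10, 10) array of elementwise products, which may matter for 12,000 customers.

```python
import numpy as np

rng = np.random.default_rng(1)
customers = rng.integers(0, 5, size=(4, 10))    # one row vector per customer
acc = rng.integers(0, 5, size=(4, 10, 10))      # one accumulated matrix per customer

# Broadcast multiply and sum over the shared axis (as in the answer)
v1 = (customers[:, :, None] * acc).sum(axis=1)

# Same contraction with einsum: out[n, k] = sum_j c[n, j] * acc[n, j, k]
v2 = np.einsum("nj,njk->nk", customers, acc)

# Batched matmul: treat each customer as a (1, 10) matrix, then drop that axis
v3 = (customers[:, None, :] @ acc)[:, 0, :]

assert np.array_equal(v1, v2) and np.array_equal(v1, v3)
```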
User response:
(Not an answer.)
@a_guest I'm only getting a single array for each customer when I run your code: the result has shape (3, 10). When I run the for loop with the same matrices data for ax111, I get all the arrays for ax111, with shape (11, 10).
import itertools as it
import numpy as np
rng = np.random.default_rng()
matrices = rng.integers(0, 10, size=(15,10,10)) # using integers for exact results
ax111 = np.array([300,0,0,0,0,0,0,0,0,0])
newoutput = ax111
newresults = []
for arr in matrices[3:14]:
    newoutput = newoutput@arr
    newresults.append(newoutput)
When I run the above code I get all the arrays for ax111, with shape (11, 10).
customers = np.array([
    [300,0,0,0,0,0,0,0,0,0],
    [0,0,0,4890,0,0,0,0,0,0],
    [0,0,0,0,0,0,624,0,0,0],
])
years = [3, 4, 9]
# --- Reference computation ---
results = []
for c, y in zip(customers, years):
    for m in matrices[y:]:
        c = c @ m
    results.append(c)
results = np.stack(results)
# --- Vectorized approach ---
matrices = np.stack([*it.accumulate(matrices[::-1], lambda x,y: y@x)][::-1])
new = (customers[:,:,None] * matrices[years]).sum(axis=1)
assert np.array_equal(new, results)
When I run the above code I get an array of shape (3, 10). When I compare results with newresults, I see that the answer's code only gives a single array per customer. Let me know if I'm doing something wrong.
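One way to keep every intermediate array for all customers at once, as the single-customer loop does, is to step through the years and multiply only the customers whose chain has already started. A minimal sketch, assuming each customer starts at `matrices[year]` (the answer's `matrices[y:]` convention) and runs through `matrices[14]`; `history`, `state`, and `active` are hypothetical names:

```python
import numpy as np

rng = np.random.default_rng(2)
matrices = rng.random((15, 10, 10))
customers = np.array([
    [300, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 4890, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 624, 0, 0, 0],
], dtype=float)
years = np.array([3, 4, 9])  # start index into matrices, as in matrices[y:]

n_steps = len(matrices)
# history[t, n] = customer n's vector after multiplying by matrices[t]
# (left as zeros for t < years[n], i.e. before that customer starts)
history = np.zeros((n_steps, len(customers), customers.shape[1]))
state = customers.copy()
for t in range(n_steps):
    active = years <= t  # customers whose chain has started
    state[active] = state[active] @ matrices[t]
    history[t, active] = state[active]

# Check against the per-customer loop for the first customer
c = customers[0]
expected = []
for m in matrices[years[0]:]:
    c = c @ m
    expected.append(c)
assert np.allclose(history[years[0]:, 0], np.stack(expected))
```

The Python loop runs 15 times regardless of the number of customers, and each step is a single (N, 10) @ (10, 10) product; `history[years[n]:, n]` then holds customer n's intermediate arrays.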