I have 16 2d-arrays, each in a shape of [16000, 16000], which means one array has 256000000 cells. I want to have a std_array that is the standard deviation of each cell in the 16 arrays. I tried something but failed, and my questions are in bold.
Here's my attempt. For example (simplified 3*3 arrays):
a = np.array([[1,2,3],[1,2,3],[1,2,3]])
b = np.array([[2,3,4],[2,3,4],[2,3,4]])
c = np.array([[3,4,5],[3,4,5],[3,4,5]])
stack = np.vstack((a,b,c))
var = np.std(stack, axis = 0)
However, the np.std function only returns 3 values, but I want 9. What should I do?
[0.81649658 0.81649658 0.81649658]
In addition, when I apply std on the stacked-arrays, I get this error. Does it simply mean that my arrays are too large to operate?
MemoryError: Unable to allocate array with shape (256000, 16000) and data type float32
CodePudding user response:
In your example, np.vstack((a,b,c))
just stack all lines of each array resulting in this one:
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[2, 3, 4],
[2, 3, 4],
[2, 3, 4],
[3, 4, 5],
[3, 4, 5],
[3, 4, 5]])
Computing the standard deviation along the axis 0 or 1 does not meet your requirements.
Instead, you can add a new dimension to each array so to stack them in a new dimension:
stack = np.vstack([a[None], b[None], c[None]])
stack.std(axis=2)
In this case stack
is:
array([[[1, 2, 3], <-- array `a`
[1, 2, 3],
[1, 2, 3]],
[[2, 3, 4], <-- array `b`
[2, 3, 4],
[2, 3, 4]],
[[3, 4, 5], <-- array `c`
[3, 4, 5],
[3, 4, 5]]])
The result is a 2D array of shape (3,3)
where the standard deviation is computed based on the 3 values coming from respectively each of the 3 arrays.
The thing is building a huge array so to reduce it later is not memory efficient. You can instead iterate over the lines so to build smaller arrays:
result = np.empty(a.shape, dtype=np.float64)
for i in range(a.shape[0]):
stacked_line = np.vstack([a[i, None], b[i, None], c[i, None]])
result[i,:] = stacked_line.std(axis=0)
For higher performance, you can use Numba so to avoid the creation of many big arrays (mandatory with Numpy) that are expensive to build and fill.