I have an array (dataframe) with shape (9800, 9800). I need to index it by position (not by labels) like this:
x.shape == (9800, 9800)
a = x[0:7000,0:7000] (plus) x[7201:9800, 0:7000] (plus) x[0:7000, 7201:9800] (plus) x[7201:9800, 7201:9800]
b = x[7000:7200, 7000:7200]
c = x[7000:7200, 0:7000] (plus) x[7000:7200, 7201:9800]
d = x[0:7000, 7000:7200] (plus) x[7201:9800, 7000:7200]
By "plus" I don't mean addition but concatenation: putting the resulting blocks together next to each other. See attached image.
Is there an "easy" way to do this? I need to repeat this for 10,000 dataframes and add the results up one at a time to save memory.
Answer:
You can use np.r_, which builds an index array from slice notation. For example:
np.r_[:3,4:6]
gives
array([0, 1, 2, 4, 5])
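So in your case (7000:7200 stops at 7199, so the outer part resumes at 7200 and runs to 9800):
import numpy as np
a_idx = np.r_[0:7000, 7200:9800]
a = x[np.ix_(a_idx, a_idx)]  # plain x[a_idx, a_idx] would pair the indices and return only the 1-D diagonal x[i, i]
b = x[7000:7200, 7000:7200]
c = x[7000:7200, a_idx]
d = x[a_idx, 7000:7200]
If x is a DataFrame, either work on x.to_numpy() or use x.iloc[a_idx, a_idx] and so on; iloc combines row and column positions as a cross product, like np.ix_.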
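Because the index array depends only on the split points, it can be reused for every input. A minimal sketch, assuming each of the 10,000 inputs is a square NumPy array (for a DataFrame, .to_numpy() gives one) and that the goal is a running sum of each of the blocks a, b, c, d; the helper name accumulate_blocks, the float64 accumulator and the 10×10 demo sizes are hypothetical:

```python
import numpy as np


def accumulate_blocks(arrays, lo, hi, dtype=np.float64):
    """Hypothetical helper: running sums of the a, b, c, d blocks.

    Rows/columns lo:hi form the middle band; 0:lo and hi:n form the
    outer part. Returns None if `arrays` is empty.
    """
    sums = None
    for x in arrays:
        n = x.shape[0]
        outer = np.r_[0:lo, hi:n]   # positions outside the middle band
        mid = slice(lo, hi)         # the middle band itself
        blocks = {
            "a": x[np.ix_(outer, outer)],  # the four corners joined
            "b": x[mid, mid],              # middle square
            "c": x[mid, outer],            # middle rows, outer columns
            "d": x[outer, mid],            # outer rows, middle columns
        }
        if sums is None:
            # astype copies, so the input is never modified; a wide dtype
            # keeps small integer inputs (e.g. int8) from overflowing.
            sums = {k: v.astype(dtype) for k, v in blocks.items()}
        else:
            for k, v in blocks.items():
                sums[k] += v
    return sums


# Demo: three 10x10 arrays with the middle band at 7:8
# (stand-ins for 9800x9800 with the band at 7000:7200).
rng = np.random.default_rng(0)
arrays = [rng.integers(0, 5, size=(10, 10)) for _ in range(3)]
sums = accumulate_blocks(arrays, 7, 8)
print({k: v.shape for k, v in sums.items()})
# {'a': (9, 9), 'b': (1, 1), 'c': (1, 9), 'd': (9, 1)}
```

Passing a generator (for example one that loads each array from disk) keeps only one input in memory at a time, plus the four running sums.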
Answer:
In [167]: x=np.zeros((9800,9800),'int8')
The first list of slices:
In [168]: a = [x[0:7000,0:7000], x[7201:9800, 0:7000],x[0:7000, 7201:9800], x[7201:9800, 7201:9800]]
and their shapes:
In [169]: [i.shape for i in a]
Out[169]: [(7000, 7000), (2599, 7000), (7000, 2599), (2599, 2599)]
Since the shapes differ, you can't simply concatenate them all:
In [170]: np.concatenate(a, axis=0)
Traceback (most recent call last):
File "<ipython-input-170-c111dc665509>", line 1, in <module>
np.concatenate(a, axis=0)
File "<__array_function__ internals>", line 5, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 7000 and the array at index 2 has size 2599
In [171]: np.concatenate(a, axis=1)
Traceback (most recent call last):
File "<ipython-input-171-227af3749524>", line 1, in <module>
np.concatenate(a, axis=1)
File "<__array_function__ internals>", line 5, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 7000 and the array at index 1 has size 2599
You can concatenate subsets:
In [172]: np.concatenate(a[:2], axis=0)
Out[172]:
array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int8)
In [173]: _.shape
Out[173]: (9599, 7000)
I won't take the time to construct the other lists (c and d from the question, built the same way as a), but it looks like you could build the first column with
np.concatenate([a[0], c[0], a[1]], axis=0)
Do the same for the other columns, then concatenate the columns with axis=1. Or join each row of blocks first and then stack the rows:
np.block([[a[0],d[0],a[2]],[....]])
with an appropriate mix of list elements should do the same (just a difference in notation, same concatenation work).
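For a on its own, the four corner slices form a 2×2 grid: top-left next to top-right, stacked over bottom-left next to bottom-right. A minimal sketch, scaled down to a 10×10 array with hypothetical split points 7 and 8 (standing in for 7000 and 7200), checking that np.block, nested np.concatenate, and the np.r_/np.ix_ indexing all give the same array:

```python
import numpy as np

n, lo, hi = 10, 7, 8          # stand-ins for 9800, 7000, 7200
x = np.arange(n * n).reshape(n, n)

# Same ordering as the list `a` in the session above.
a = [x[0:lo, 0:lo], x[hi:n, 0:lo], x[0:lo, hi:n], x[hi:n, hi:n]]

# Rows of blocks: [top-left, top-right], [bottom-left, bottom-right].
a_joined = np.block([[a[0], a[2]],
                     [a[1], a[3]]])

# Column-first alternative: stack each column of blocks, then join them.
left = np.concatenate([a[0], a[1]], axis=0)
right = np.concatenate([a[2], a[3]], axis=0)
a_cols = np.concatenate([left, right], axis=1)

# Single indexing step with a reusable index array.
idx = np.r_[0:lo, hi:n]
a_idx = x[np.ix_(idx, idx)]

print(a_joined.shape)  # (9, 9)
```

np.block only changes the notation; all three build a new array of the same size, so the indexing version is mainly attractive because idx can be computed once and reused.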