Complex index numpy array or indexing dataframe

Time:10-13

I have an array (a DataFrame) with shape (9800, 9800). I need to index it by position (without labels) like this:

x.shape == (9800, 9800)

a = x[0:7000,0:7000] (plus) x[7201:9800, 0:7000] (plus) x[0:7000, 7201:9800] (plus) x[7201:9800, 7201:9800]
b = x[7000:7200, 7000:7200]
c = x[7000:7200, 0:7000] (plus) x[7000:7200, 7201:9800]
d = x[0:7000, 7000:7200] (plus) x[7201:9800, 7000:7200]

By "plus" I don't mean element-wise addition but concatenation: putting the resulting blocks together next to each other. See the attached image.

Is there an "easy" way of doing this? I need to repeat this for 10,000 dataframes and add them up individually to save memory.

Answer:

You can use np.r_, which builds an index array from slice notation. For example:

np.r_[:3,4:6]

gives

array([0, 1, 2, 4, 5])
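Applied to the question's layout, a minimal sketch (assuming the middle band is rows/columns 7000:7200 of a 9800-wide array) builds the outer and middle index sets like this:

```python
import numpy as np

# Outer positions: everything except the middle band 7000:7200.
outer = np.r_[0:7000, 7200:9800]
# The middle band itself.
middle = np.r_[7000:7200]

# np.r_ with slices is shorthand for concatenating np.arange ranges.
same = np.concatenate([np.arange(0, 7000), np.arange(7200, 9800)])
assert np.array_equal(outer, same)

print(len(outer), len(middle))  # 9600 200
```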

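In your case, assuming x is a NumPy array (for a DataFrame, call x.to_numpy() first):

import numpy as np

a_idx = np.r_[0:7000, 7200:9800]   # everything outside the 7000:7200 band

a = x[np.ix_(a_idx, a_idx)]   # shape (9600, 9600)
b = x[7000:7200, 7000:7200]   # shape (200, 200)
c = x[7000:7200, a_idx]       # shape (200, 9600)
d = x[a_idx, 7000:7200]       # shape (9600, 200)

Plain x[a_idx, a_idx] would pair the two index arrays elementwise and return a 1-D array, which is why np.ix_ is needed for a. Also, 7000:7200 already stops at index 7199, so the outer range starts at 7200; starting it at 7201, as in the question, would drop row and column 7200.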
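Fancy indexing with two 1-D index arrays pairs them elementwise rather than forming a grid. A small-scale check shows the difference between x[idx, idx] and x[np.ix_(idx, idx)]. It uses a 6x6 array standing in for the real one, with the band 2:4 playing the role of 7000:7200; both sizes are assumptions for illustration.

```python
import numpy as np

# Small stand-in for the 9800x9800 array: a 6x6 grid of distinct values.
x = np.arange(36).reshape(6, 6)
idx = np.r_[0:2, 4:6]  # everything outside the band 2:4

paired = x[idx, idx]       # two index arrays are paired elementwise -> 1-D
a = x[np.ix_(idx, idx)]    # open mesh of row/column indices -> 2-D block
c = x[2:4, idx]            # a slice mixed with one array needs no np.ix_

print(paired.tolist())     # [0, 7, 28, 35]
print(a.shape, c.shape)    # (4, 4) (2, 4)
```

Keep in mind that fancy indexing, including np.ix_, always returns a copy, whereas basic slices such as x[7000:7200, 7000:7200] return views. For the full-size a that copy holds 9600 x 9600 elements (about 737 MB at float64).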

Answer:

In [167]: x=np.zeros((9800,9800),'int8')

The first list of slices:

In [168]: a = [x[0:7000,0:7000], x[7201:9800, 0:7000],x[0:7000, 7201:9800], x[7201:9800, 7201:9800]]

and their shapes:

In [169]: [i.shape for i in a]
Out[169]: [(7000, 7000), (2599, 7000), (7000, 2599), (2599, 2599)]

Since the shapes vary, you can't simply concatenate them all:

In [170]: np.concatenate(a, axis=0)
Traceback (most recent call last):
  File "<ipython-input-170-c111dc665509>", line 1, in <module>
    np.concatenate(a, axis=0)
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 7000 and the array at index 2 has size 2599

In [171]: np.concatenate(a, axis=1)
Traceback (most recent call last):
  File "<ipython-input-171-227af3749524>", line 1, in <module>
    np.concatenate(a, axis=1)
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 7000 and the array at index 1 has size 2599

You can concatenate subsets:

In [172]: np.concatenate(a[:2], axis=0)
Out[172]: 
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int8)
In [173]: _.shape
Out[173]: (9599, 7000)

I won't take the time to construct the other lists (for c and d), but it looks like you could build the first column with:

np.concatenate([a[0], c[0], a[1]], axis=0)

Do the same for the other columns, then concatenate the columns along axis=1. Alternatively, join the blocks into rows first.

np.block([[a[0], d[0], a[2]], [....]]) with an appropriate mix of the list elements should do the same thing; it is just different notation for the same concatenation work.
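Putting the question's four definitions together, here is a minimal sketch that builds a, b, c and d for a band lo:hi with np.block and np.concatenate, then keeps running sums over many frames. The helper name split_blocks, the band bounds and the small random frames are hypothetical stand-ins. The outer ranges start at hi rather than hi + 1, so no row or column is skipped. For a DataFrame, pass df.to_numpy().

```python
import numpy as np


def split_blocks(x, lo, hi):
    """Hypothetical helper: split a square 2-D array around the band lo:hi.

    Returns (a, b, c, d) following the question's definitions, except that
    the outer ranges are 0:lo and hi:n, so no row or column is dropped.
    """
    top, mid, bot = slice(0, lo), slice(lo, hi), slice(hi, None)
    # a: the four corner blocks glued into one array
    a = np.block([[x[top, top], x[top, bot]],
                  [x[bot, top], x[bot, bot]]])
    # b: the central band-by-band block (a plain slice, i.e. a view)
    b = x[mid, mid]
    # c: band rows outside the band columns, placed side by side
    c = np.concatenate([x[mid, top], x[mid, bot]], axis=1)
    # d: band columns outside the band rows, stacked vertically
    d = np.concatenate([x[top, mid], x[bot, mid]], axis=0)
    return a, b, c, d


# Small stand-ins for the 10,000 large frames (sizes are assumptions).
rng = np.random.default_rng(0)
frames = [rng.integers(0, 10, size=(10, 10)) for _ in range(5)]
lo, hi = 6, 8

# Running sums of each piece, one frame at a time.
totals = [np.zeros_like(p) for p in split_blocks(frames[0], lo, hi)]
for f in frames:
    for t, p in zip(totals, split_blocks(f, lo, hi)):
        t += p  # in-place add: one accumulator array per piece

# Splitting is linear, so splitting the summed frame gives the same result.
summed = split_blocks(sum(frames), lo, hi)
for t, s in zip(totals, summed):
    assert np.array_equal(t, s)

print([t.shape for t in totals])  # [(8, 8), (2, 2), (2, 8), (8, 2)]
```

Every piece is only a selection of entries, so the blocks of a sum equal the sum of the blocks. If only the totals are needed, it may be cheaper to accumulate the full frames and split once, as the final check in the sketch illustrates.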
