Accessing the sequential rows of a numpy array with a given set of indices

Time:12-23

I have a numpy array x whose last column holds each row's index number.

import numpy as np
import random 
np.random.seed(0)
x = np.random.random([5,3])
x = np.append(x, np.arange(x.shape[0]).reshape(-1,1), axis=1) 
x=
array([[0.5488135 , 0.71518937, 0.60276338, 0.        ],
       [0.54488318, 0.4236548 , 0.64589411, 1.        ],
       [0.43758721, 0.891773  , 0.96366276, 2.        ],
       [0.38344152, 0.79172504, 0.52889492, 3.        ],
       [0.56804456, 0.92559664, 0.07103606, 4.        ]])

I have another numpy array called y, which is related to the first one: each row in x has a user-defined number (rep) of related rows in y.

rep = 4
y = np.random.random([rep*5,3])
array([[0.0871293 , 0.0202184 , 0.83261985],
       [0.77815675, 0.87001215, 0.97861834],
       [0.79915856, 0.46147936, 0.78052918],
       [0.11827443, 0.63992102, 0.14335329],
       [0.94466892, 0.52184832, 0.41466194],
       [0.26455561, 0.77423369, 0.45615033],
       [0.56843395, 0.0187898 , 0.6176355 ],
       [0.61209572, 0.616934  , 0.94374808],
       [0.6818203 , 0.3595079 , 0.43703195],
       [0.6976312 , 0.06022547, 0.66676672],
       [0.67063787, 0.21038256, 0.1289263 ],
       [0.31542835, 0.36371077, 0.57019677],
       [0.43860151, 0.98837384, 0.10204481],
       [0.20887676, 0.16130952, 0.65310833],
       [0.2532916 , 0.46631077, 0.24442559],
       [0.15896958, 0.11037514, 0.65632959],
       [0.13818295, 0.19658236, 0.36872517],
       [0.82099323, 0.09710128, 0.83794491],
       [0.09609841, 0.97645947, 0.4686512 ],
       [0.97676109, 0.60484552, 0.73926358]])

For example, index 0 in x is related to indices 0,1,2,3 in y.

Suppose after calling a method, I get an index set from the last column of array x.

ind = my_method(x) #Note that it can be any permutation of number 0 to n-1 where n is the number of rows in x
ind
[4, 0] #For the sake of simplicity, let us assume that the method returns [4,0]

I was wondering what is the most efficient way to access the rows of y for a given set of indices (e.g., when y has millions of rows). For instance, if I have ind = [4,0], then I'd like to get rows 16,17,18,19,0,1,2,3 of y.

Expected output:

       [[0.13818295, 0.19658236, 0.36872517],
       [0.82099323, 0.09710128, 0.83794491],
       [0.09609841, 0.97645947, 0.4686512 ],
       [0.97676109, 0.60484552, 0.73926358],
       [0.0871293 , 0.0202184 , 0.83261985],
       [0.77815675, 0.87001215, 0.97861834],
       [0.79915856, 0.46147936, 0.78052918],
       [0.11827443, 0.63992102, 0.14335329]]

CodePudding user response:

import numpy as np
import random 

np.random.seed(0)

n,m = 10, 20

x = np.random.random([n,m])
x = np.append(x, np.arange(x.shape[0]).reshape(-1,1), axis=1) 

rep = 3

y = np.random.random([rep*n,m])

ind = np.array([0, 2 , 1]) 

The chosen ind implies that you need the rows among the first nine rows.

y[:9,]
array([[0.31179588, 0.69634349, 0.37775184, 0.17960368, 0.02467873,
        0.06724963, 0.67939277, 0.45369684, 0.53657921, 0.89667129,
        0.99033895, 0.21689698, 0.6630782 , 0.26332238, 0.020651  ,
        0.75837865, 0.32001715, 0.38346389, 0.58831711, 0.83104846],
       [0.62898184, 0.87265066, 0.27354203, 0.79804683, 0.18563594,
        0.95279166, 0.68748828, 0.21550768, 0.94737059, 0.73085581,
        0.25394164, 0.21331198, 0.51820071, 0.02566272, 0.20747008,
        0.42468547, 0.37416998, 0.46357542, 0.27762871, 0.58678435],
       [0.86385561, 0.11753186, 0.51737911, 0.13206811, 0.71685968,
        0.3960597 , 0.56542131, 0.18327984, 0.14484776, 0.48805628,
        0.35561274, 0.94043195, 0.76532525, 0.74866362, 0.90371974,
        0.08342244, 0.55219247, 0.58447607, 0.96193638, 0.29214753],
       [0.24082878, 0.10029394, 0.01642963, 0.92952932, 0.66991655,
        0.78515291, 0.28173011, 0.58641017, 0.06395527, 0.4856276 ,
        0.97749514, 0.87650525, 0.33815895, 0.96157015, 0.23170163,
        0.94931882, 0.9413777 , 0.79920259, 0.63044794, 0.87428797],
       [0.29302028, 0.84894356, 0.61787669, 0.01323686, 0.34723352,
        0.14814086, 0.98182939, 0.47837031, 0.49739137, 0.63947252,
        0.36858461, 0.13690027, 0.82211773, 0.18984791, 0.51131898,
        0.22431703, 0.09784448, 0.86219152, 0.97291949, 0.96083466],
       [0.9065555 , 0.77404733, 0.33314515, 0.08110139, 0.40724117,
        0.23223414, 0.13248763, 0.05342718, 0.72559436, 0.01142746,
        0.77058075, 0.14694665, 0.07952208, 0.08960303, 0.67204781,
        0.24536721, 0.42053947, 0.55736879, 0.86055117, 0.72704426],
       [0.27032791, 0.1314828 , 0.05537432, 0.30159863, 0.26211815,
        0.45614057, 0.68328134, 0.69562545, 0.28351885, 0.37992696,
        0.18115096, 0.78854551, 0.05684808, 0.69699724, 0.7786954 ,
        0.77740756, 0.25942256, 0.37381314, 0.58759964, 0.2728219 ],
       [0.3708528 , 0.19705428, 0.45985588, 0.0446123 , 0.79979588,
        0.07695645, 0.51883515, 0.3068101 , 0.57754295, 0.95943334,
        0.64557024, 0.03536244, 0.43040244, 0.51001685, 0.53617749,
        0.68139251, 0.2775961 , 0.12886057, 0.39267568, 0.95640572],
       [0.18713089, 0.90398395, 0.54380595, 0.45691142, 0.88204141,
        0.45860396, 0.72416764, 0.39902532, 0.90404439, 0.69002502,
        0.69962205, 0.3277204 , 0.75677864, 0.63606106, 0.24002027,
        0.16053882, 0.79639147, 0.9591666 , 0.45813883, 0.59098417]])

The exact indices that you need can be obtained with ind[:,np.newaxis]*rep + range(rep), which outputs

array([[0, 1, 2],
       [6, 7, 8],
       [3, 4, 5]])
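As a minimal sketch of why this works (the names starts, offsets and rows are only illustrative): ind[:, np.newaxis] turns ind into a column, so multiplying by rep gives the first row of each block, and adding a length-rep vector of offsets broadcasts the two into a grid with one row of y-indices per entry of ind.

```python
import numpy as np

rep = 3
ind = np.array([0, 2, 1])

starts = ind[:, np.newaxis] * rep   # shape (3, 1): first y-row of each block
offsets = np.arange(rep)            # shape (3,): position inside a block
rows = starts + offsets             # broadcasts to shape (3, 3)

print(rows.tolist())  # [[0, 1, 2], [6, 7, 8], [3, 4, 5]]
```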

Finally, you can get the rows you need in the proper shape with the following command.

y[ind[:,None]*rep + range(rep), :].reshape(-1,m)


array([[0.31179588, 0.69634349, 0.37775184, 0.17960368, 0.02467873,
        0.06724963, 0.67939277, 0.45369684, 0.53657921, 0.89667129,
        0.99033895, 0.21689698, 0.6630782 , 0.26332238, 0.020651  ,
        0.75837865, 0.32001715, 0.38346389, 0.58831711, 0.83104846],
       [0.62898184, 0.87265066, 0.27354203, 0.79804683, 0.18563594,
        0.95279166, 0.68748828, 0.21550768, 0.94737059, 0.73085581,
        0.25394164, 0.21331198, 0.51820071, 0.02566272, 0.20747008,
        0.42468547, 0.37416998, 0.46357542, 0.27762871, 0.58678435],
       [0.86385561, 0.11753186, 0.51737911, 0.13206811, 0.71685968,
        0.3960597 , 0.56542131, 0.18327984, 0.14484776, 0.48805628,
        0.35561274, 0.94043195, 0.76532525, 0.74866362, 0.90371974,
        0.08342244, 0.55219247, 0.58447607, 0.96193638, 0.29214753],
       [0.27032791, 0.1314828 , 0.05537432, 0.30159863, 0.26211815,
        0.45614057, 0.68328134, 0.69562545, 0.28351885, 0.37992696,
        0.18115096, 0.78854551, 0.05684808, 0.69699724, 0.7786954 ,
        0.77740756, 0.25942256, 0.37381314, 0.58759964, 0.2728219 ],
       [0.3708528 , 0.19705428, 0.45985588, 0.0446123 , 0.79979588,
        0.07695645, 0.51883515, 0.3068101 , 0.57754295, 0.95943334,
        0.64557024, 0.03536244, 0.43040244, 0.51001685, 0.53617749,
        0.68139251, 0.2775961 , 0.12886057, 0.39267568, 0.95640572],
       [0.18713089, 0.90398395, 0.54380595, 0.45691142, 0.88204141,
        0.45860396, 0.72416764, 0.39902532, 0.90404439, 0.69002502,
        0.69962205, 0.3277204 , 0.75677864, 0.63606106, 0.24002027,
        0.16053882, 0.79639147, 0.9591666 , 0.45813883, 0.59098417],
       [0.24082878, 0.10029394, 0.01642963, 0.92952932, 0.66991655,
        0.78515291, 0.28173011, 0.58641017, 0.06395527, 0.4856276 ,
        0.97749514, 0.87650525, 0.33815895, 0.96157015, 0.23170163,
        0.94931882, 0.9413777 , 0.79920259, 0.63044794, 0.87428797],
       [0.29302028, 0.84894356, 0.61787669, 0.01323686, 0.34723352,
        0.14814086, 0.98182939, 0.47837031, 0.49739137, 0.63947252,
        0.36858461, 0.13690027, 0.82211773, 0.18984791, 0.51131898,
        0.22431703, 0.09784448, 0.86219152, 0.97291949, 0.96083466],
       [0.9065555 , 0.77404733, 0.33314515, 0.08110139, 0.40724117,
        0.23223414, 0.13248763, 0.05342718, 0.72559436, 0.01142746,
        0.77058075, 0.14694665, 0.07952208, 0.08960303, 0.67204781,
        0.24536721, 0.42053947, 0.55736879, 0.86055117, 0.72704426]])

Hope this helps. I tried to make my answer more general; you can modify it to suit your needs.
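For reference, here is a sketch of the same expression applied to the setup from the question (rep = 4, five rows in x, ind = [4, 0]). It assumes that the rows of y belonging to row i of x are stored contiguously, starting at row i*rep, which is what the question's example suggests.

```python
import numpy as np

np.random.seed(0)
x = np.random.random([5, 3])          # same setup as in the question
rep = 4
y = np.random.random([rep * 5, 3])

ind = np.array([4, 0])

# One block of `rep` consecutive rows per entry of ind, flattened in order.
rows = (ind[:, None] * rep + np.arange(rep)).ravel()
result = y[rows]

print(rows.tolist())   # [16, 17, 18, 19, 0, 1, 2, 3]
print(result.shape)    # (8, 3)
```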

CodePudding user response:

I think you need something like:

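ind = np.array(ind)
rep = 4  # number of rows in y per row of x
rows_in_y = ind[:, np.newaxis]*rep + np.arange(rep)  # shape (len(ind), rep)
y[rows_in_y, :].reshape(-1, y.shape[1])  # flatten the blocks into consecutive rows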

I'm not sure what you are trying to achieve but it seems to be a fairly normal indexing problem.
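A possible alternative, not taken from either answer, is to view y as a stack of blocks and index whole blocks at once. This is only a sketch: it assumes y is C-contiguous and that each row of x owns exactly rep consecutive rows of y; the placeholder data and the names blocks and picked are illustrative.

```python
import numpy as np

rep, n, m = 4, 5, 3
# Placeholder data so that each row of y is easy to recognise.
y = np.arange(rep * n * m, dtype=float).reshape(rep * n, m)
ind = np.array([4, 0])

blocks = y.reshape(n, rep, m)         # a view: one (rep, m) block per row of x
picked = blocks[ind].reshape(-1, m)   # copies only the selected blocks

# Cross-check against the explicit row-index version.
rows = (ind[:, None] * rep + np.arange(rep)).ravel()
assert np.array_equal(picked, y[rows])
```

Whether this is faster than building the index grid explicitly depends on the array sizes, so it is worth timing both on your own data.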
