I have a numpy array, x, where the last column in each row holds the row's index number.
import numpy as np
import random
np.random.seed(0)
x = np.random.random([5,3])
x = np.append(x, np.arange(x.shape[0]).reshape(-1,1), axis=1)
x=
array([[0.5488135 , 0.71518937, 0.60276338, 0. ],
[0.54488318, 0.4236548 , 0.64589411, 1. ],
[0.43758721, 0.891773 , 0.96366276, 2. ],
[0.38344152, 0.79172504, 0.52889492, 3. ],
[0.56804456, 0.92559664, 0.07103606, 4. ]])
I have another numpy array, y, which is related to the first one: each row in x corresponds to a user-defined number (rep) of consecutive rows in y.
rep = 4
y = np.random.random([rep*5,3])
array([[0.0871293 , 0.0202184 , 0.83261985],
[0.77815675, 0.87001215, 0.97861834],
[0.79915856, 0.46147936, 0.78052918],
[0.11827443, 0.63992102, 0.14335329],
[0.94466892, 0.52184832, 0.41466194],
[0.26455561, 0.77423369, 0.45615033],
[0.56843395, 0.0187898 , 0.6176355 ],
[0.61209572, 0.616934 , 0.94374808],
[0.6818203 , 0.3595079 , 0.43703195],
[0.6976312 , 0.06022547, 0.66676672],
[0.67063787, 0.21038256, 0.1289263 ],
[0.31542835, 0.36371077, 0.57019677],
[0.43860151, 0.98837384, 0.10204481],
[0.20887676, 0.16130952, 0.65310833],
[0.2532916 , 0.46631077, 0.24442559],
[0.15896958, 0.11037514, 0.65632959],
[0.13818295, 0.19658236, 0.36872517],
[0.82099323, 0.09710128, 0.83794491],
[0.09609841, 0.97645947, 0.4686512 ],
[0.97676109, 0.60484552, 0.73926358]])
For example, index 0 in x is related to indices 0,1,2,3 in y.
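Put differently, assuming the related rows are stored contiguously, row i of x maps to rows i*rep through i*rep + rep - 1 of y. A minimal sketch of that mapping (the helper name related_rows is hypothetical and only used for illustration):

```python
import numpy as np

rep = 4  # number of rows in y per row of x, as in the example above


def related_rows(i, rep):
    """Row numbers of y that belong to row i of x (hypothetical helper)."""
    return np.arange(i * rep, (i + 1) * rep)


print(related_rows(0, rep).tolist())  # [0, 1, 2, 3]
print(related_rows(4, rep).tolist())  # [16, 17, 18, 19]
```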
Suppose that, after calling a method, I get a set of indices taken from the last column of array x.
ind = my_method(x) #Note that it can be any selection, in any order, of the numbers 0 to n-1, where n is the number of rows in x
ind
[4, 0] #For the sake of simplicity, let us assume that the method returns [4,0]
I was wondering what the most efficient way is to access the rows of y for a given set of indices (e.g., when there are millions of rows). For instance, if I have ind = [4,0], then I'd like to get rows 16,17,18,19,0,1,2,3 of y.
Expected output:
[[0.13818295, 0.19658236, 0.36872517],
[0.82099323, 0.09710128, 0.83794491],
[0.09609841, 0.97645947, 0.4686512 ],
[0.97676109, 0.60484552, 0.73926358],
[0.0871293 , 0.0202184 , 0.83261985],
[0.77815675, 0.87001215, 0.97861834],
[0.79915856, 0.46147936, 0.78052918],
[0.11827443, 0.63992102, 0.14335329]]
CodePudding user response:
import numpy as np
import random
np.random.seed(0)
n,m = 10, 20
x = np.random.random([n,m])
x = np.append(x, np.arange(x.shape[0]).reshape(-1,1), axis=1)
rep = 3
y = np.random.random([rep*n,m])
ind = np.array([0, 2 , 1])
The chosen ind only refers to the first three rows of x, so with rep = 3 all the rows you need are among the first nine rows of y.
y[:9,]
array([[0.31179588, 0.69634349, 0.37775184, 0.17960368, 0.02467873,
0.06724963, 0.67939277, 0.45369684, 0.53657921, 0.89667129,
0.99033895, 0.21689698, 0.6630782 , 0.26332238, 0.020651 ,
0.75837865, 0.32001715, 0.38346389, 0.58831711, 0.83104846],
[0.62898184, 0.87265066, 0.27354203, 0.79804683, 0.18563594,
0.95279166, 0.68748828, 0.21550768, 0.94737059, 0.73085581,
0.25394164, 0.21331198, 0.51820071, 0.02566272, 0.20747008,
0.42468547, 0.37416998, 0.46357542, 0.27762871, 0.58678435],
[0.86385561, 0.11753186, 0.51737911, 0.13206811, 0.71685968,
0.3960597 , 0.56542131, 0.18327984, 0.14484776, 0.48805628,
0.35561274, 0.94043195, 0.76532525, 0.74866362, 0.90371974,
0.08342244, 0.55219247, 0.58447607, 0.96193638, 0.29214753],
[0.24082878, 0.10029394, 0.01642963, 0.92952932, 0.66991655,
0.78515291, 0.28173011, 0.58641017, 0.06395527, 0.4856276 ,
0.97749514, 0.87650525, 0.33815895, 0.96157015, 0.23170163,
0.94931882, 0.9413777 , 0.79920259, 0.63044794, 0.87428797],
[0.29302028, 0.84894356, 0.61787669, 0.01323686, 0.34723352,
0.14814086, 0.98182939, 0.47837031, 0.49739137, 0.63947252,
0.36858461, 0.13690027, 0.82211773, 0.18984791, 0.51131898,
0.22431703, 0.09784448, 0.86219152, 0.97291949, 0.96083466],
[0.9065555 , 0.77404733, 0.33314515, 0.08110139, 0.40724117,
0.23223414, 0.13248763, 0.05342718, 0.72559436, 0.01142746,
0.77058075, 0.14694665, 0.07952208, 0.08960303, 0.67204781,
0.24536721, 0.42053947, 0.55736879, 0.86055117, 0.72704426],
[0.27032791, 0.1314828 , 0.05537432, 0.30159863, 0.26211815,
0.45614057, 0.68328134, 0.69562545, 0.28351885, 0.37992696,
0.18115096, 0.78854551, 0.05684808, 0.69699724, 0.7786954 ,
0.77740756, 0.25942256, 0.37381314, 0.58759964, 0.2728219 ],
[0.3708528 , 0.19705428, 0.45985588, 0.0446123 , 0.79979588,
0.07695645, 0.51883515, 0.3068101 , 0.57754295, 0.95943334,
0.64557024, 0.03536244, 0.43040244, 0.51001685, 0.53617749,
0.68139251, 0.2775961 , 0.12886057, 0.39267568, 0.95640572],
[0.18713089, 0.90398395, 0.54380595, 0.45691142, 0.88204141,
0.45860396, 0.72416764, 0.39902532, 0.90404439, 0.69002502,
0.69962205, 0.3277204 , 0.75677864, 0.63606106, 0.24002027,
0.16053882, 0.79639147, 0.9591666 , 0.45813883, 0.59098417]])
The exact indices that you need can be obtained with ind[:,np.newaxis]*rep + range(rep), which outputs
array([[0, 1, 2],
[6, 7, 8],
[3, 4, 5]])
Finally, you can get the rows you need in the proper form with the following command.
y[ind[:,None]*rep + range(rep), :].reshape(-1,m)
array([[0.31179588, 0.69634349, 0.37775184, 0.17960368, 0.02467873,
0.06724963, 0.67939277, 0.45369684, 0.53657921, 0.89667129,
0.99033895, 0.21689698, 0.6630782 , 0.26332238, 0.020651 ,
0.75837865, 0.32001715, 0.38346389, 0.58831711, 0.83104846],
[0.62898184, 0.87265066, 0.27354203, 0.79804683, 0.18563594,
0.95279166, 0.68748828, 0.21550768, 0.94737059, 0.73085581,
0.25394164, 0.21331198, 0.51820071, 0.02566272, 0.20747008,
0.42468547, 0.37416998, 0.46357542, 0.27762871, 0.58678435],
[0.86385561, 0.11753186, 0.51737911, 0.13206811, 0.71685968,
0.3960597 , 0.56542131, 0.18327984, 0.14484776, 0.48805628,
0.35561274, 0.94043195, 0.76532525, 0.74866362, 0.90371974,
0.08342244, 0.55219247, 0.58447607, 0.96193638, 0.29214753],
[0.27032791, 0.1314828 , 0.05537432, 0.30159863, 0.26211815,
0.45614057, 0.68328134, 0.69562545, 0.28351885, 0.37992696,
0.18115096, 0.78854551, 0.05684808, 0.69699724, 0.7786954 ,
0.77740756, 0.25942256, 0.37381314, 0.58759964, 0.2728219 ],
[0.3708528 , 0.19705428, 0.45985588, 0.0446123 , 0.79979588,
0.07695645, 0.51883515, 0.3068101 , 0.57754295, 0.95943334,
0.64557024, 0.03536244, 0.43040244, 0.51001685, 0.53617749,
0.68139251, 0.2775961 , 0.12886057, 0.39267568, 0.95640572],
[0.18713089, 0.90398395, 0.54380595, 0.45691142, 0.88204141,
0.45860396, 0.72416764, 0.39902532, 0.90404439, 0.69002502,
0.69962205, 0.3277204 , 0.75677864, 0.63606106, 0.24002027,
0.16053882, 0.79639147, 0.9591666 , 0.45813883, 0.59098417],
[0.24082878, 0.10029394, 0.01642963, 0.92952932, 0.66991655,
0.78515291, 0.28173011, 0.58641017, 0.06395527, 0.4856276 ,
0.97749514, 0.87650525, 0.33815895, 0.96157015, 0.23170163,
0.94931882, 0.9413777 , 0.79920259, 0.63044794, 0.87428797],
[0.29302028, 0.84894356, 0.61787669, 0.01323686, 0.34723352,
0.14814086, 0.98182939, 0.47837031, 0.49739137, 0.63947252,
0.36858461, 0.13690027, 0.82211773, 0.18984791, 0.51131898,
0.22431703, 0.09784448, 0.86219152, 0.97291949, 0.96083466],
[0.9065555 , 0.77404733, 0.33314515, 0.08110139, 0.40724117,
0.23223414, 0.13248763, 0.05342718, 0.72559436, 0.01142746,
0.77058075, 0.14694665, 0.07952208, 0.08960303, 0.67204781,
0.24536721, 0.42053947, 0.55736879, 0.86055117, 0.72704426]])
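For reference, here is the same idea applied to the setup from the question (5 rows in x, rep = 4, ind = [4, 0]). This is only a sketch: it assumes the related rows in y are contiguous, and the variable names rows and result are mine.

```python
import numpy as np

np.random.seed(0)
n, m, rep = 5, 3, 4
x = np.random.random([n, m])        # generated first, as in the question
y = np.random.random([rep * n, m])
ind = np.array([4, 0])

# Broadcasting: (2, 1) * rep + (rep,) -> (2, rep) matrix of row numbers,
# flattened so that y is indexed with a 1-D array.
rows = (ind[:, None] * rep + np.arange(rep)).ravel()
result = y[rows]

print(rows.tolist())   # [16, 17, 18, 19, 0, 1, 2, 3]
print(result.shape)    # (8, 3)
```

Flattening the index matrix before indexing gives a 2-D result directly, so the final reshape is not needed.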
Hope this helps. I tried to make my answer more generalized. You can modify it based on your need.
CodePudding user response:
I think you need something like:
ind = np.array(ind)
rows_in_y = ind[:,np.newaxis]*rep + range(rep)  # rep = 4 in your example
y[rows_in_y,:].reshape(-1, y.shape[1])
I'm not sure what you are trying to achieve but it seems to be a fairly normal indexing problem.
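Since each group of rep rows is contiguous, another option is to view y as a 3-D array of shape (n, rep, m) and index its first axis with ind, which avoids building the matrix of row numbers. A sketch, assuming y has exactly n*rep rows and is C-contiguous (so the first reshape is a view rather than a copy); the data here is a stand-in, not the arrays from the question:

```python
import numpy as np

n, m, rep = 5, 3, 4
y = np.arange(n * rep * m, dtype=float).reshape(n * rep, m)  # stand-in data
ind = np.array([4, 0])

blocks = y.reshape(n, rep, m)        # one (rep, m) block per row of x
result = blocks[ind].reshape(-1, m)  # copies only the selected blocks

# Cross-check against the explicit row-number computation.
rows = (ind[:, None] * rep + np.arange(rep)).ravel()
assert np.array_equal(result, y[rows])
```

Whether this is faster than indexing with explicit row numbers depends on the array sizes, so it is worth timing both on your real data.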