Select rows of numpy array based on column values

Time:09-16

I'm trying to obtain the rows of a numpy array based on the values of the columns. Basically, if the value of the column is within a predefined list, I want to obtain that row. I'll leave an example below.

This is my array:

myArray = np.array([[1,55,4],
                     [2,2,3],
                     [3,90,2],
                     [4,65,1]])

These are the desired values:

desiredValues = [2,3,4]

I want to obtain all the rows of the array (myArray) for which the value in the first column is in the list (desiredValues). This should give the following array:

desiredArray([[2,2,3],
              [3,90,2],
              [4,65,1]])

I've done some research and for specific values [...]

[Figure: computation times of np.isin() and the for-loop for small test arrays]

The above graph shows that the computation time for the for-loop seems to increase linearly with the growth of the tested array. The for-loop is faster until the number of rows is approximately 10.

Comparisons for larger test arrays

The second graph shows a comparison similar to the first graph, but examines computation times when the number of rows in my_array is 2, 4, 8, 16, ... 1024.

[Figure: computation times of np.isin() and the for-loop as the number of rows grows to 1024]

The above graph shows that np.isin() is significantly faster and more appropriate for larger problems.
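For the array from the question, the np.isin() approach compared above can be sketched as follows. This is a minimal example that assumes the first column is the one being tested; the variable names simply mirror the question.

```python
import numpy as np

my_array = np.array([[1, 55, 4],
                     [2, 2, 3],
                     [3, 90, 2],
                     [4, 65, 1]])
desired_values = [2, 3, 4]

# Boolean mask: True where the first-column value appears in desired_values
mask = np.isin(my_array[:, 0], desired_values)

# Boolean indexing keeps only the rows where the mask is True
desired_array = my_array[mask]
print(desired_array)
```

The mask has one entry per row, so the same pattern works for any other column by changing the column index.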

Code for reproduction

The data to recreate the above graphs may be generated with the following code.

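import numpy as np

# Note: %timeit is an IPython magic, so run this in IPython or a Jupyter notebook.
count_list = [2**x for x in range(2, 11)]
isin_time_means = []
loop_time_means = []

for count in count_list:
    # Random test array with `count` rows and a random set of 10 desired values
    my_array = np.random.randint(low=-10, high=10, size=(count, 5))
    desired_values = np.random.randint(low=-10, high=10, size=(10,))
    # -o returns a TimeitResult object whose .timings holds the per-loop times
    a = %timeit -o np.isin(my_array[:, 0], desired_values)
    b = %timeit -o [x in desired_values for x in my_array[:, 0]]
    isin_time_means.append(np.mean(a.timings))
    loop_time_means.append(np.mean(b.timings))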
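
Outside IPython, a similar measurement could be made with the standard-library timeit module. The following is only a sketch under assumptions that are not in the original benchmark: the fixed seed, the repeat and number settings, and the first_column helper name are all choices made here for illustration, so the absolute timings will differ from the graphs.

```python
import timeit
import numpy as np

count_list = [2**x for x in range(2, 11)]
isin_time_means = []
loop_time_means = []

rng = np.random.default_rng(0)  # fixed seed: an assumption, for repeatability only
NUMBER = 20   # calls per timing run (assumed value)
REPEAT = 3    # timing runs per array size (assumed value)

for count in count_list:
    my_array = rng.integers(low=-10, high=10, size=(count, 5))
    desired_values = rng.integers(low=-10, high=10, size=10)
    first_column = my_array[:, 0]

    # Each value returned by timeit.repeat() is the total time for NUMBER calls
    isin_runs = timeit.repeat(lambda: np.isin(first_column, desired_values),
                              repeat=REPEAT, number=NUMBER)
    loop_runs = timeit.repeat(lambda: [x in desired_values for x in first_column],
                              repeat=REPEAT, number=NUMBER)

    # Divide by NUMBER to get the mean time of a single call
    isin_time_means.append(np.mean(isin_runs) / NUMBER)
    loop_time_means.append(np.mean(loop_runs) / NUMBER)
```

Plotting isin_time_means and loop_time_means against count_list then gives a comparison of the same kind as the graphs described above.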

CodePudding user response:

You can always use a for-loop to check every value separately:

mask = [x in desiredValues for x in myArray[:,0]]

desired_array = myArray[mask]

Full code:

import numpy as np

myArray = np.array([[1,55,4],
                     [2,2,3],
                     [3,90,2],
                     [4,65,1]])

desiredValues = [2,3,4]

mask = [x in desiredValues for x in myArray[:,0]]

desired_array = myArray[mask]

print(desired_array)
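
When the list of desired values is long, one possible variation (not part of the answer above) is to convert it to a set first, so that each membership test in the loop does not scan the whole list. A minimal sketch, where wanted is a name introduced here for illustration:

```python
import numpy as np

myArray = np.array([[1, 55, 4],
                    [2, 2, 3],
                    [3, 90, 2],
                    [4, 65, 1]])
desiredValues = [2, 3, 4]

# A set gives constant-time membership tests on average
wanted = set(desiredValues)

# tolist() converts the column to plain Python ints before the lookup
mask = [x in wanted for x in myArray[:, 0].tolist()]

# A list of booleans is interpreted by NumPy as a boolean row mask
desired_array = myArray[mask]
print(desired_array)
```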