How to do efficient filtering with NumPy

Time:12-06

I've been trying to generate all possible combinations of several inputs, say a, b, c, x, y, z, where the last three (x, y, z) can be either arrays or floats. Thanks to useful comments and answers, this now works (in a more general way that accepts both arrays and floats) with:

from typing import Union, Sequence

import numpy as np
from numbers import Real


def cartesian_product(*arrays: np.ndarray) -> np.ndarray:
    """
    See
    https://stackoverflow.com/questions/11144513/cartesian-product-of-x-and-y-array-points-into-single-array-of-2d-points
    """
    la = len(arrays)
    dtype = np.result_type(*arrays)
    arr = np.empty([len(a) for a in arrays] + [la], dtype=dtype)
    for i, a in enumerate(np.ix_(*arrays)):
        arr[..., i] = a
    return arr.reshape(-1, la)


def iter_func(
    *args: Union[Real, Sequence[Real], np.ndarray],
) -> np.ndarray:
    return cartesian_product(*(
        np.atleast_1d(a) for a in args
    ))

Running iter_func(5, [2, 3], 3, [3, 6, 9], 2, [1, 2, 4]) results in:

array([[5, 2, 3, 3, 2, 1],
       [5, 2, 3, 3, 2, 2],
       [5, 2, 3, 3, 2, 4],
       [5, 2, 3, 6, 2, 1],
       [5, 2, 3, 6, 2, 2],
       [5, 2, 3, 6, 2, 4],
       [5, 2, 3, 9, 2, 1],
       [5, 2, 3, 9, 2, 2],
       [5, 2, 3, 9, 2, 4],
       [5, 3, 3, 3, 2, 1],
       [5, 3, 3, 3, 2, 2],
       [5, 3, 3, 3, 2, 4],
       [5, 3, 3, 6, 2, 1],
       [5, 3, 3, 6, 2, 2],
       [5, 3, 3, 6, 2, 4],
       [5, 3, 3, 9, 2, 1],
       [5, 3, 3, 9, 2, 2],
       [5, 3, 3, 9, 2, 4]])
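
As a quick sanity check on the output above, the number of rows should equal the product of the input lengths, with each scalar counting as length 1. The following is a minimal sketch that re-declares the two functions so it runs on its own; the names `args`, `combos`, and `expected_rows` are only illustrative.

```python
import math

import numpy as np


def cartesian_product(*arrays):
    # Same approach as above: fill one trailing column per input array.
    la = len(arrays)
    dtype = np.result_type(*arrays)
    arr = np.empty([len(a) for a in arrays] + [la], dtype=dtype)
    for i, a in enumerate(np.ix_(*arrays)):
        arr[..., i] = a
    return arr.reshape(-1, la)


def iter_func(*args):
    # np.atleast_1d turns scalars into length-1 arrays, so floats and
    # sequences can be mixed freely.
    return cartesian_product(*(np.atleast_1d(a) for a in args))


args = (5, [2, 3], 3, [3, 6, 9], 2, [1, 2, 4])
combos = iter_func(*args)

# One row per combination: 1 * 2 * 1 * 3 * 1 * 3 rows, one column per input.
expected_rows = math.prod(np.atleast_1d(a).size for a in args)
print(combos.shape, expected_rows)
```

This matters for the filtering question because the row count grows multiplicatively with every input that has more than one value.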

I'd like to evaluate each row of this array, append the results to it, and drop the row if the results are not useful. I know this can be done with:

#Defining 2 generic operations to provide an example

def Operation(args):
    args=args.tolist()
    a,b,c,*args = args
    return (a+b+c)/sum(args)

def Operation2(args):
    args=args.tolist()
    a,b,c,*args = args
    return (a/b/c)

# Use a list comprehension to compute both values, keep only the combinations
# that satisfy the requirements, and append the results to each kept row
new_list = [
    np.append(element, [Operation(element), Operation2(element)])
    for element in iter_func(5, [2, 3], 3, [3, 6, 9], 2, [1, 2, 4])
    if 0.7 < Operation(element) < 1.2 and 0.55 < Operation2(element) < 0.85
]

#Printing the result
for i in new_list:
    print(i)

This results in:

[5.     2.     3.     3.     2.     4.     1.1111 0.8333]
[5.     2.     3.     6.     2.     1.     1.1111 0.8333]
[5.     2.     3.     6.     2.     2.     1.     0.8333]
[5.     2.     3.     6.     2.     4.     0.8333 0.8333]
[5.     2.     3.     9.     2.     1.     0.8333 0.8333]
[5.     2.     3.     9.     2.     2.     0.7692 0.8333]
[5.     3.     3.     6.     2.     2.     1.1    0.5556]
[5.     3.     3.     6.     2.     4.     0.9167 0.5556]
[5.     3.     3.     9.     2.     1.     0.9167 0.5556]
[5.     3.     3.     9.     2.     2.     0.8462 0.5556]
[5.     3.     3.     9.     2.     4.     0.7333 0.5556]

which works as a filter. The question is: how can I evaluate the operations and conditions directly, as each combination is generated?

That way I would append fewer elements and would not have to iterate over every combination after they have all been computed, which I imagine is more efficient.

Answer:

Mixing multiple types is cumbersome, especially if you store them in the same list. If every input is a list, you could simply do:

import numpy as np

data = ([5], [2, 3], [3], [3, 6, 9], [2], [1, 2, 4])
# Cartesian product: one row per combination, one column per input.
# meshgrid's default indexing='xy' swaps the first two axes, so in general the
# row order can differ from iter_func's, but every combination is present.
plug_in = np.stack(np.meshgrid(*data), axis=-1).reshape(-1, len(data))

op1 = np.sum(plug_in[:, :3], axis=1) / np.sum(plug_in[:, 3:], axis=1)  # results of your Operation
op2 = plug_in[:, 0] / plug_in[:, 1] / plug_in[:, 2]  # results of your Operation2

plug_in = np.column_stack([plug_in, op1[:, None], op2[:, None]])  # append two extra columns
# Boolean mask: drop rows you don't need
plug_in = plug_in[(0.7 < op1) & (op1 < 1.2) & (0.55 < op2) & (op2 < 0.85)]
with np.printoptions(precision=4):  # print it nicely
    print(plug_in)

[[5.     2.     3.     3.     2.     4.     1.1111 0.8333]
 [5.     2.     3.     6.     2.     1.     1.1111 0.8333]
 [5.     2.     3.     6.     2.     2.     1.     0.8333]
 [5.     2.     3.     6.     2.     4.     0.8333 0.8333]
 [5.     2.     3.     9.     2.     1.     0.8333 0.8333]
 [5.     2.     3.     9.     2.     2.     0.7692 0.8333]
 [5.     3.     3.     6.     2.     2.     1.1    0.5556]
 [5.     3.     3.     6.     2.     4.     0.9167 0.5556]
 [5.     3.     3.     9.     2.     1.     0.9167 0.5556]
 [5.     3.     3.     9.     2.     2.     0.8462 0.5556]
 [5.     3.     3.     9.     2.     4.     0.7333 0.5556]]
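
If the full product is too large to hold in memory, the filter can instead be applied as each combination is produced, which is closer to what the question literally asks for. The following is only a sketch under that assumption: `filtered_combinations` is a hypothetical helper, not part of NumPy, and it hard-codes the two example operations and their bounds. For small inputs like the one here, the vectorized version above will usually be faster, because this loop runs in pure Python.

```python
from itertools import product

import numpy as np


def filtered_combinations(*args):
    """Lazily yield (a, b, c, x, y, z, op1, op2) for combinations that pass.

    Hypothetical helper for illustration; the operations and thresholds
    mirror Operation and Operation2 from the question.
    """
    # Scalars become length-1 lists so product() can iterate over them.
    seqs = [np.atleast_1d(a).tolist() for a in args]
    for combo in product(*seqs):
        a, b, c, *rest = combo
        op2 = a / b / c
        if not 0.55 < op2 < 0.85:
            continue  # rejected before op1 is even computed
        op1 = (a + b + c) / sum(rest)
        if 0.7 < op1 < 1.2:
            yield (*combo, op1, op2)


# Only the surviving rows are ever stored.
result = np.array(list(filtered_combinations(5, [2, 3], 3, [3, 6, 9], 2, [1, 2, 4])))
with np.printoptions(precision=4):
    print(result)
```

Because `op2` depends only on a, b, and c, a further refinement would be to test it once per (a, b, c) triple and skip the inner x, y, z loop entirely when it fails; whether that is worth the extra code depends on how many triples are rejected.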