How to do efficient filtering with NumPy while calculating combinations of lists-of-scalars and scalars


I've been trying to generate all the possible combinations between arrays, say a, b, c, x, y, z, where the last three (x, y, z) can be arrays OR floats. Thanks to useful comments and answers, the task is accomplished (in a more general way, accepting both arrays and floats) by:

from typing import Union, Sequence

import numpy as np
from numbers import Real


def cartesian_product(*arrays: np.ndarray) -> np.ndarray:
    """
    See
    https://stackoverflow.com/questions/11144513/cartesian-product-of-x-and-y-array-points-into-single-array-of-2d-points
    """
    la = len(arrays)
    dtype = np.result_type(*arrays)
    arr = np.empty([len(a) for a in arrays] + [la], dtype=dtype)
    for i, a in enumerate(np.ix_(*arrays)):
        arr[..., i] = a
    return arr.reshape(-1, la)


def iter_func(
    *args: Union[Real, Sequence[Real], np.ndarray],
) -> np.ndarray:
    return cartesian_product(*(
        np.atleast_1d(a) for a in args
    ))

Running iter_func(5, [2, 3], 3, [3, 6, 9], 2, [1, 2, 4]) results in:

array([[5, 2, 3, 3, 2, 1],
       [5, 2, 3, 3, 2, 2],
       [5, 2, 3, 3, 2, 4],
       [5, 2, 3, 6, 2, 1],
       [5, 2, 3, 6, 2, 2],
       [5, 2, 3, 6, 2, 4],
       [5, 2, 3, 9, 2, 1],
       [5, 2, 3, 9, 2, 2],
       [5, 2, 3, 9, 2, 4],
       [5, 3, 3, 3, 2, 1],
       [5, 3, 3, 3, 2, 2],
       [5, 3, 3, 3, 2, 4],
       [5, 3, 3, 6, 2, 1],
       [5, 3, 3, 6, 2, 2],
       [5, 3, 3, 6, 2, 4],
       [5, 3, 3, 9, 2, 1],
       [5, 3, 3, 9, 2, 2],
       [5, 3, 3, 9, 2, 4]])
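As a quick sanity check (a self-contained sketch that repeats the definitions above), the number of rows should equal the product of the input lengths:

```python
from numbers import Real
from typing import Union, Sequence

import numpy as np


def cartesian_product(*arrays: np.ndarray) -> np.ndarray:
    la = len(arrays)
    dtype = np.result_type(*arrays)
    arr = np.empty([len(a) for a in arrays] + [la], dtype=dtype)
    for i, a in enumerate(np.ix_(*arrays)):
        arr[..., i] = a
    return arr.reshape(-1, la)


def iter_func(*args: Union[Real, Sequence[Real], np.ndarray]) -> np.ndarray:
    return cartesian_product(*(np.atleast_1d(a) for a in args))


result = iter_func(5, [2, 3], 3, [3, 6, 9], 2, [1, 2, 4])
print(result.shape)  # (18, 6): 1 * 2 * 1 * 3 * 1 * 3 = 18 combinations of 6 values
```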

I'm interested in evaluating each combination inside the array (appending the results) and filtering out the combinations whose results are not useful. I know this can be done by:

# Defining 2 generic operations to provide an example

def Operation(args):
    args = args.tolist()
    a, b, c, *args = args
    return (a + b + c) / sum(args)

def Operation2(args):
    args = args.tolist()
    a, b, c, *args = args
    return a / b / c

# Using a list comprehension to calculate, check whether each combination satisfies the requirements, and append it if so

new_list = [
    np.append(element, [Operation(element), Operation2(element)])
    for element in iter_func(5, [2, 3], 3, [3, 6, 9], 2, [1, 2, 4])
    if 0.7 < Operation(element) < 1.2 and 0.55 < Operation2(element) < 0.85
]

# Printing the result
for i in new_list:
    print(i)

This results in:

[5.     2.     3.     3.     2.     4.     1.1111 0.8333]
[5.     2.     3.     6.     2.     1.     1.1111 0.8333]
[5.     2.     3.     6.     2.     2.     1.     0.8333]
[5.     2.     3.     6.     2.     4.     0.8333 0.8333]
[5.     2.     3.     9.     2.     1.     0.8333 0.8333]
[5.     2.     3.     9.     2.     2.     0.7692 0.8333]
[5.     3.     3.     6.     2.     2.     1.1    0.5556]
[5.     3.     3.     6.     2.     4.     0.9167 0.5556]
[5.     3.     3.     9.     2.     1.     0.9167 0.5556]
[5.     3.     3.     9.     2.     2.     0.8462 0.5556]
[5.     3.     3.     9.     2.     4.     0.7333 0.5556]

which works as a filter. The question is: how could I evaluate the operations and conditions directly as each combination is generated?

That way I would append fewer elements and avoid iterating through every combination after all of them have been calculated, which, I imagine, is more efficient.

CodePudding user response:

Multiple types are cumbersome, especially if you store them in the same list. You could simply do:

data = ([5], [2, 3], [3], [3, 6, 9], [2], [1, 2, 4])
plug_in = np.stack(np.meshgrid(*data), axis=-1).reshape(-1, len(data))  # Cartesian product

op1 = np.sum(plug_in[:, :3], axis=1) / np.sum(plug_in[:, 3:], axis=1)  # results of your Operation
op2 = plug_in[:, 0] / plug_in[:, 1] / plug_in[:, 2]  # results of your Operation2

plug_in = np.column_stack([plug_in, op1, op2])  # append two extra columns
plug_in = plug_in[(0.7 < op1) & (op1 < 1.2) & (0.55 < op2) & (op2 < 0.85)]  # drop rows you don't need
with np.printoptions(precision=4):  # print it nicely!
    print(plug_in)

[[5.     2.     3.     3.     2.     4.     1.1111 0.8333]
 [5.     2.     3.     6.     2.     1.     1.1111 0.8333]
 [5.     2.     3.     6.     2.     2.     1.     0.8333]
 [5.     2.     3.     6.     2.     4.     0.8333 0.8333]
 [5.     2.     3.     9.     2.     1.     0.8333 0.8333]
 [5.     2.     3.     9.     2.     2.     0.7692 0.8333]
 [5.     3.     3.     6.     2.     2.     1.1    0.5556]
 [5.     3.     3.     6.     2.     4.     0.9167 0.5556]
 [5.     3.     3.     9.     2.     1.     0.9167 0.5556]
 [5.     3.     3.     9.     2.     2.     0.8462 0.5556]
 [5.     3.     3.     9.     2.     4.     0.7333 0.5556]]
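As a variation on the snippet above (a sketch reusing the same `data`, `op1`, and `op2` names), the boolean mask can be built first and the extra columns stacked only for the rows that pass the filter, so discarded combinations are never assembled:

```python
import numpy as np

data = ([5], [2, 3], [3], [3, 6, 9], [2], [1, 2, 4])
plug_in = np.stack(np.meshgrid(*data), axis=-1).reshape(-1, len(data))  # Cartesian product

op1 = np.sum(plug_in[:, :3], axis=1) / np.sum(plug_in[:, 3:], axis=1)
op2 = plug_in[:, 0] / plug_in[:, 1] / plug_in[:, 2]

# Build the mask first, then assemble the result columns only for surviving rows
mask = (0.7 < op1) & (op1 < 1.2) & (0.55 < op2) & (op2 < 0.85)
result = np.column_stack([plug_in[mask], op1[mask], op2[mask]])
print(result.shape)  # 11 of the 18 rows survive, with 2 extra columns: (11, 8)
```

The operations themselves are still evaluated for every row (that is the price of vectorization), but the filtered array is built in one shot instead of growing row by row.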

CodePudding user response:

This is a non-NumPy approach that uses itertools.product to generate the combinations and filters the elements as they are generated.

Changing your functions to work with a tuple (as opposed to array):

def Operation(args):
    a, b, c, *args = args
    return (a + b + c) / sum(args)

def Operation2(args):
    a, b, c, *args = args
    return a / b / c

and adding a helper so I don't have to evaluate the functions several times:

def foo(element):
    element = list(element)
    o1 = Operation(element)
    o2 = Operation2(element)
    if 0.7 < o1 < 1.2 and 0.55 < o2 < 0.85:
        element.extend([o1, o2])
        return element
    return None

Using itertools to generate the combinations. Since we'll be iterating on the combinations, there's no need to use one of the fancier numpy 'cartesian product' methods. We can stick with lists. Also, start with single element lists rather than scalars. Allowing for scalars is an unnecessary complication.

In [111]: gen = itertools.product([5],[2,3],[3],[3,6,9],[2],[1,2,4])
In [112]: list(gen)
Out[112]: 
[(5, 2, 3, 3, 2, 1),
 (5, 2, 3, 3, 2, 2),
 (5, 2, 3, 3, 2, 4),
 (5, 2, 3, 6, 2, 1),
 (5, 2, 3, 6, 2, 2),
 (5, 2, 3, 6, 2, 4),
 ....

Testing foo on a couple of entries:

In [113]: foo(Out[112][0])
In [114]: foo(Out[112][2])
Out[114]: [5, 2, 3, 3, 2, 4, 1.1111111111111112, 0.8333333333333334]

Reinitialize the generator and iterate with filtering. We can't use a list comprehension without running foo twice per element (and I don't have a new enough Python version to use the walrus operator):

In [115]: gen = itertools.product([5],[2,3],[3],[3,6,9],[2],[1,2,4])
In [116]: res = []
     ...: for element in gen:
     ...:     x = foo(element)
     ...:     if x: res.append(x)
     ...: 
In [117]: res
Out[117]: 
[[5, 2, 3, 3, 2, 4, 1.1111111111111112, 0.8333333333333334],
 [5, 2, 3, 6, 2, 1, 1.1111111111111112, 0.8333333333333334],
 [5, 2, 3, 6, 2, 2, 1.0, 0.8333333333333334],
 [5, 2, 3, 6, 2, 4, 0.8333333333333334, 0.8333333333333334],
 [5, 2, 3, 9, 2, 1, 0.8333333333333334, 0.8333333333333334],
 [5, 2, 3, 9, 2, 2, 0.7692307692307693, 0.8333333333333334],
 [5, 3, 3, 6, 2, 2, 1.1, 0.5555555555555556],
 [5, 3, 3, 6, 2, 4, 0.9166666666666666, 0.5555555555555556],
 [5, 3, 3, 9, 2, 1, 0.9166666666666666, 0.5555555555555556],
 [5, 3, 3, 9, 2, 2, 0.8461538461538461, 0.5555555555555556],
 [5, 3, 3, 9, 2, 4, 0.7333333333333333, 0.5555555555555556]]

If needed, the result could be converted to an array with float or object dtype.
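On Python 3.8+, the double evaluation mentioned above can be avoided in a comprehension with the walrus operator. A self-contained sketch along those lines (not part of the original answer):

```python
import itertools

def Operation(args):
    a, b, c, *rest = args
    return (a + b + c) / sum(rest)

def Operation2(args):
    a, b, c, *rest = args
    return a / b / c

def foo(element):
    # Evaluate both operations once; return the extended row or None
    element = list(element)
    o1 = Operation(element)
    o2 = Operation2(element)
    if 0.7 < o1 < 1.2 and 0.55 < o2 < 0.85:
        return element + [o1, o2]
    return None

gen = itertools.product([5], [2, 3], [3], [3, 6, 9], [2], [1, 2, 4])
# The walrus operator binds foo's result once per element, so foo runs only once
res = [x for element in gen if (x := foo(element)) is not None]
print(len(res))  # 11 combinations pass the filter
```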
