np.where with arbitrary number of conditions

Problem

This question: Numpy where function multiple conditions asks how to use np.where with two conditions. One answer suggests using the & operator between conditions, which works when the conditions are few enough to type out by hand. Another answer suggests using np.logical_and, which takes only two input arrays.

This thread: Numpy "where" with multiple conditions also discusses multiple conditions for np.where, but the number of conditions is known in advance.

I am looking for a way to evaluate an np.where expression without knowing the number of conditions in advance.

Reproducible setup

I have a 2D array:

arr = \
np.array([[1,2,3,4],
          [4,5,6,7],
          [9,8,7,6],
          [0,1,0,1],
          [9,7,6,5]])

I want to select the rows whose element at index 1 is larger than 5 and whose element at index 2 is larger than 4. To do that, I do:

res = arr[np.where((arr[:,1]>5) & (arr[:,2]>4))]

res is then:

array([[9, 8, 7, 6],
       [9, 7, 6, 5]])

as expected.

But what if I have these conditions as lists? The above example would be:

cols = [1,2] # arbitrary length list
tholds = [5,4] # arbitrary length list

The length of these two lists is not known in advance, but they always have the same length.

How can I get res using the cols and tholds lists?

What I have tried

Build the condition as a string and evaluate it with ast.literal_eval. First, define:

filterstring = "&".join([f"(arr[:,{col}]>{th})" for col, th in zip(cols,tholds)])

which evaluates to (arr[:,1]>5)&(arr[:,2]>4), i.e. what we had above within np.where() when the conditions are typed out manually.

However, ast.literal_eval(f"np.where({filterstring})") gives an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-269-1aaff20de82f> in <module>()
----> 1 ast.literal_eval(f"np.where({filterstring})")

3 frames
/usr/lib/python3.7/ast.py in _convert_num(node)
     53         elif isinstance(node, Num):
     54             return node.n
---> 55         raise ValueError('malformed node or string: ' + repr(node))
     56     def _convert_signed_num(node):
     57         if isinstance(node, UnaryOp) and isinstance(node.op, (UAdd, USub)):

ValueError: malformed node or string: <_ast.Call object at 0x7f41daa21f10>

So this did not work. This answer to the question ast.literal_eval() malformed node or string while converting a string with list of array()s confirms that this is not the right approach.
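
For context, ast.literal_eval only accepts Python literals (numbers, strings, tuples, lists, dicts, sets, booleans, None), so a function call such as np.where(...) is rejected rather than evaluated. A minimal sketch that reproduces the behaviour without NumPy, using len(...) as a stand-in for the call:

```python
import ast

# A string made only of literals is parsed without problems.
parsed = ast.literal_eval("[1, 2, (3, 4)]")

# A function call is not a literal, so literal_eval raises ValueError.
try:
    ast.literal_eval("len([1, 2, 3])")
    call_rejected = False
except ValueError:
    call_rejected = True
```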

EDIT:

The suggestion to call np.where several times in succession works fine for this particular example, but it is not really what I am looking for. I would like to call np.where once, not multiple times with a single condition each.

CodePudding user response：

I tried to avoid eval, since it has security implications.

You could do it iteratively, like so:

import numpy as np

def unknown_conditions(arr, cols, tholds):
    # Apply the conditions one at a time, shrinking arr after each filter.
    for col, thold in zip(cols, tholds):
        arr = arr[np.where(arr[:, col] > thold)]
    return arr
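
If a single np.where call is preferred, one possible variant builds one boolean mask per condition and combines them with np.logical_and.reduce, which accepts any number of arrays. This is only a sketch: the name filter_rows is made up for illustration, and it assumes cols and tholds contain at least one condition.

```python
import numpy as np

def filter_rows(arr, cols, tholds):
    # One boolean mask per (column, threshold) pair.
    masks = [arr[:, col] > thold for col, thold in zip(cols, tholds)]
    # AND all masks together, however many there are (assumes at least one).
    combined = np.logical_and.reduce(masks)
    # Single np.where call on the combined mask.
    return arr[np.where(combined)]

arr = np.array([[1, 2, 3, 4],
                [4, 5, 6, 7],
                [9, 8, 7, 6],
                [0, 1, 0, 1],
                [9, 7, 6, 5]])

res = filter_rows(arr, cols=[1, 2], tholds=[5, 4])
```

Swapping np.logical_and for np.logical_or would keep rows that meet at least one condition instead.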

CodePudding user response：

You could count how many conditions each row meets and then call np.where once. From there it is easy to express and/or combinations of the conditions.

(Conceptually very similar to academy's suggestion.)

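import numpy as np

def filter_by_conditions(arr, cols, tholds):
    n_conditions = len(cols)
    # Per-row count of how many conditions are met.
    bool_accumulator = np.zeros(arr.shape[0])
    for c, t in zip(cols, tholds):
        bool_accumulator += (arr[:, c] > t).astype(int)

    # Keep the rows that meet every condition (use >= 1 for "any").
    return arr[np.where(bool_accumulator == n_conditions)]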
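
The same filter can also be written without an explicit loop by comparing all selected columns against the thresholds in one step. A minimal sketch, assuming every condition is a "greater than" comparison; the name filter_rows_vectorized and the mode parameter are hypothetical:

```python
import numpy as np

def filter_rows_vectorized(arr, cols, tholds, mode="all"):
    # arr[:, cols] has shape (n_rows, n_conditions); tholds broadcasts over rows.
    met = arr[:, cols] > np.asarray(tholds)
    # "all": every condition must hold (AND); anything else: at least one (OR).
    mask = met.all(axis=1) if mode == "all" else met.any(axis=1)
    return arr[np.where(mask)]

arr = np.array([[1, 2, 3, 4],
                [4, 5, 6, 7],
                [9, 8, 7, 6],
                [0, 1, 0, 1],
                [9, 7, 6, 5]])

res_all = filter_rows_vectorized(arr, [1, 2], [5, 4])
res_any = filter_rows_vectorized(arr, [1, 2], [5, 4], mode="any")
```

Here res_all keeps only the rows where both conditions hold, while res_any also keeps rows where just one of them does.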