How to debug numpy masks

This question is related to this one.

I have a function that I'm trying to vectorize. This is the original function:

def aspect_good(angle: float, planet1_good: bool, planet2_good: bool):
    """
    Decides if the angle represents a good aspect.
    NOTE: returns None if the angle doesn't represent an aspect.
    """

    if 112 <= angle <= 128 or 52 <= angle <= 68:
        return True
    elif 174 <= angle <= 186 or 84 <= angle <= 96:
        return False
    elif 0 <= angle <= 8 and planet1_good and planet2_good:
        return True
    elif 0 <= angle <= 6:
        return False
    else:
        return None

and this is what I have so far:

import numpy as np
import numpy.typing as npt

def aspects_good(
    angles: npt.ArrayLike,
    planets1_good: npt.ArrayLike,
    planets2_good: npt.ArrayLike,
) -> npt.NDArray:
    """
    Decides if the angles represent good aspects.

    Note: this function was contributed by Mad Physicist. Thank you.
    https://stackoverflow.com/q/73672739/11004423

    :returns: an array with values as follows:
        1 – the angle is a good aspect
        0 – the angle is a bad aspect
       -1 – the angle doesn't represent an aspect
    """
    result = np.full_like(angles, -1, dtype=np.int8)

    bad_mask = np.abs(angles % 90) <= 6
    result[bad_mask] = 0

    good_mask = (np.abs(angles - 120) <= 8) |\
                (np.abs(angles - 60) <= 8) |\
                ((np.abs(angles - 4) <= 4) & planets1_good & planets2_good)
    result[good_mask] = 1

    return result

However, it's not working as expected. I wrote a test with pytest:

def test_aspects_good():
    tests = np.array([
        [120, True, False, True],
        [60, True, False, True],
        [180, True, False, False],
        [90, True, False, False],

        [129, True, False, -1],
        [111, True, False, -1],
        [69, True, False, -1],
        [51, True, False, -1],
        [187, True, False, -1],
        [173, True, False, -1],
        [97, True, False, -1],
        [83, True, False, -1],

        [0, True, True, True],
        [0, True, False, False],
        [0, False, True, False],
        [0, False, False, False],

        [7, False, False, -1],
        [7, True, True, True],
        [9, True, True, -1],
    ])

    angles = tests[:, 0]
    planets1_good = tests[:, 1]
    planets2_good = tests[:, 2]
    expected = tests[:, 3]

    result = aspects_good(angles, planets1_good, planets2_good)
    assert np.array_equal(result, expected)

and it fails: np.array_equal returns False, so the arrays are different.

Here I have result and expected arrays combined side by side:

array([[ 1,  1],
       [ 1,  1],
       [ 0,  0],
       [ 0,  0],
       [-1, -1],
       [-1, -1],
       [-1, -1],
       [-1, -1],
       [-1, -1],
       [-1, -1],
       [-1, -1],
       [-1, -1],
       [ 0,  1],
       [ 0,  0],
       [ 0,  0],
       [ 0,  0],
       [-1, -1],
       [-1,  1],
       [-1, -1]])

Note: the first column is the result array, and the second one is the expected array. As you can see, they differ in two places. So the question is: how do I debug this? Normally I would use a debugger and step through each if/elif/else condition, but I have no idea how to debug numpy masks.

Answer:

The issue appears to be a combination of three things:

1. Numpy uses a homogeneous type throughout an array.

   You will find that tests.dtype is dtype('int64') or dtype('int32'), depending on your platform. This means that the columns containing planet1_good and planet2_good are integers too, not booleans.

2. Bitwise AND (&) is not a logical operator.

   A bitwise AND operation will return a result with the largest of the input types. Specifically, combining the result of <=, which is a boolean array, with an int array gives an int array. That means that np.array([1, 2]) & np.array([True, True]) gives array([1, 0]), an integer array, whereas a logical AND would give the boolean array([True, True]).

3. Numpy distinguishes between a boolean mask and a fancy index by the dtype, even if the fancy index contains only zeros and ones.

   If you have a 2 element array, x, then x[[True, True]] = 1 assigns 1 to both elements of x. However, x[[1, 1]] = 1 assigns 1 only to the second element of x.
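The three points above can be reproduced in isolation. This is a minimal sketch with made-up two-row data (the names in_range, flags, combined, x and y are illustrative and do not come from the question):

```python
import numpy as np

# 1. Mixing ints and bools in one array promotes everything to an integer dtype.
tests = np.array([[120, True, False], [0, True, True]])
print(tests.dtype)                   # an integer dtype; the bools became 0 and 1

# 2. bool & int yields an integer array, not a boolean one.
in_range = np.array([True, False])   # stands in for the result of a <= comparison
flags = tests[:, 1]                  # integer column holding 1, 1
combined = in_range & flags
print(combined, combined.dtype)      # values [1 0] with an integer dtype

# 3. The same zeros and ones index differently depending on dtype.
x = np.zeros(2, dtype=int)
x[np.array([True, True])] = 1        # boolean mask: selects every element
y = np.zeros(2, dtype=int)
y[np.array([1, 1])] = 1              # fancy index: selects element 1 (twice)
print(x, y)                          # [1 1] [0 1]
```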

So that's basically what's happening here. bad_mask is a boolean mask, and works exactly as you would expect. However, good_mask is ANDed with a couple of integer arrays, so it becomes an integer array containing zeros and ones. The expression result[good_mask] = 1 is actually assigning 1 to the first and second elements of result (indices 0 and 1), which fortuitously correspond to two of your tests. The remaining rows that should be True cannot and will not be assigned 1.
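As for debugging in general, one possible approach (a sketch, not the only way) is to build each piece of the mask under its own name, print its dtype and values, and then list the rows where the result disagrees with the expectation. The snippet uses five rows from the question's test table; the names near_120, near_60, near_0_both_good and wrong are made up for illustration:

```python
import numpy as np

# Five rows from the question's table: angle, planet1_good, planet2_good, expected.
tests = np.array([
    [120, True, False, 1],
    [60, True, False, 1],
    [180, True, False, 0],
    [0, True, True, 1],
    [7, True, True, 1],
])
angles, p1, p2, expected = tests.T

# Name every intermediate so each one can be inspected on its own.
near_120 = np.abs(angles - 120) <= 8
near_60 = np.abs(angles - 60) <= 8
near_0_both_good = (np.abs(angles - 4) <= 4) & p1 & p2
good_mask = near_120 | near_60 | near_0_both_good

for name, arr in [("near_120", near_120), ("near_60", near_60),
                  ("near_0_both_good", near_0_both_good), ("good_mask", good_mask)]:
    print(f"{name:18s} dtype={arr.dtype}  values={arr}")

# Apply the masks exactly as the question does, then locate the disagreements.
result = np.full_like(angles, -1, dtype=np.int8)
result[np.abs(angles % 90) <= 6] = 0
result[good_mask] = 1
wrong = np.flatnonzero(result != expected)
print("mismatched rows:", wrong)     # rows 3 and 4
print(tests[wrong])
```

A dtype other than bool on anything that is about to be used as a mask is the signal to look for here: near_120 and near_60 print as bool, while near_0_both_good and good_mask print as an integer dtype.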

There are a few ways to fix this, listed in decreasing order of preference (my favorite on top):

Convert all your arrays to numpy arrays of the correct type. Right now your function does not meet the contract that it accepts any array-like. If you pass in a list for angles, you will get TypeError: unsupported operand type(s) for %: 'list' and 'int'. This is a fairly idiomatic approach:

angles = np.asanyarray(angles)
planets1_good = np.asanyarray(planets1_good, dtype=bool)
planets2_good = np.asanyarray(planets2_good, dtype=bool)

result = np.full_like(angles, -1, dtype=np.int8)

bad_mask = np.abs(angles % 90) <= 6
result[bad_mask] = 0

good_mask = (np.abs(angles - 120) <= 8) |\
            (np.abs(angles - 60) <= 8) |\
            ((np.abs(angles - 4) <= 4) & planets1_good & planets2_good)
result[good_mask] = 1
return result
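To check the effect, here is the same body wrapped back into the function from the question and called with plain Python lists. This is only a sketch: the four sample rows are taken from the question's test table (the two that failed, plus one bad aspect and one non-aspect).

```python
import numpy as np
import numpy.typing as npt


def aspects_good(
    angles: npt.ArrayLike,
    planets1_good: npt.ArrayLike,
    planets2_good: npt.ArrayLike,
) -> npt.NDArray:
    # Convert up front so plain lists work and the flags are real booleans.
    angles = np.asanyarray(angles)
    planets1_good = np.asanyarray(planets1_good, dtype=bool)
    planets2_good = np.asanyarray(planets2_good, dtype=bool)

    result = np.full_like(angles, -1, dtype=np.int8)
    result[np.abs(angles % 90) <= 6] = 0
    good_mask = (
        (np.abs(angles - 120) <= 8)
        | (np.abs(angles - 60) <= 8)
        | ((np.abs(angles - 4) <= 4) & planets1_good & planets2_good)
    )
    result[good_mask] = 1
    return result


# The two rows that failed in the question, then a bad aspect and a non-aspect.
angles = [0, 7, 180, 129]
planets1_good = [True, True, True, True]
planets2_good = [True, True, False, False]
result = aspects_good(angles, planets1_good, planets2_good)
print(result)        # [ 1  1  0 -1]
```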

Ensure that good_mask is actually a mask before applying it. You should still convert angles, but the other arrays will be converted automatically by the & operator:

good_mask = ((np.abs(angles - 120) <= 8) |\
             (np.abs(angles - 60) <= 8) |\
             ((np.abs(angles - 4) <= 4) & planets1_good & planets2_good)).astype(bool)

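You may alternatively do something similar to what you did with bad_mask, which yields a boolean mask directly. Note that this variant is not equivalent to the expression above: it ignores planets1_good and planets2_good, and it only matches angles in the ranges 0 to 8, 60 to 68 and 120 to 128: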

good_mask = (np.abs(angles % 60) <= 8) & (angles >= -8) & (angles <= 128)

Convert the mask to an index, which won't care about the original dtype:
```
result[np.flatnonzero(good_mask)] = 1
```
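A small illustration of why this works, using a made-up five-element mask (int_mask and bool_mask are illustrative names): np.flatnonzero returns the positions of the nonzero entries, so it produces the same indices whether the mask is stored as integers or as booleans.

```python
import numpy as np

result = np.full(5, -1, dtype=np.int8)
int_mask = np.array([1, 1, 0, 1, 1])       # zeros and ones, but integer dtype
bool_mask = int_mask.astype(bool)

print(np.flatnonzero(int_mask))            # [0 1 3 4]
print(np.flatnonzero(bool_mask))           # [0 1 3 4], same indices either way

result[np.flatnonzero(int_mask)] = 1
print(result)                              # [ 1  1 -1  1  1]
```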