NumPy array shape mismatch on masking/assignment command

I'm trying to run a loop where I build a mask and then use it to assign specific values from one array to various rows of another array. The following script works, but only when there are no duplicate values in column 0 of array y. If there are duplicates, the mask selects multiple rows of y and the error is thrown. Thanks for any help.

import numpy as np

x = np.zeros(shape=(100,10))
x[:,0] = np.arange(100)

# this seed = 9 produces duplicate values in column 0 of y, which seems to cause the problem
# (no issues when there are no duplicate values in column 0 of y)
y = (np.random.default_rng(9).random((10,7))*100).astype(int)

for i in range(x.shape[0]):
    mask = y[:,0] == x[i,0]
    y[mask,[1,3,4,6]] = x[i,[1,2,3,4]]

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Input In [219], in <cell line: 2>()
      2 for i in range(x.shape[0]):
      3     mask = y[:,0] == x[i,0]
----> 4     y[mask,[1,3,4,6]] = x[i,[1,2,3,4]]

IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (0,) (4,)

Answer:

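When a boolean mask and an integer list are used together in one index, NumPy turns the mask into the positions of its True entries (shape (k,), where k is the number of matching rows) and broadcasts that against [1, 3, 4, 6] (shape (4,)). This only works when k is 1, or by accident when k is 4, in which case rows and columns are paired element by element instead of selecting a block. Since y has only 10 rows and x[:, 0] runs from 0 to 99, most iterations find no match at all (k = 0), which is the (0,) (4,) error in your traceback; a duplicated key gives k = 2 and fails the same way. So the mask must contain at least one True in each iteration, and multiple matches need extra handling. There are a few ways around this.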
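A quick sketch of this rule on a made-up 3-row array (the data is hypothetical: key 2 occurs once in column 0, key 5 twice, key 9 never). Each assignment uses the same y[mask, [1, 3, 4, 6]] pattern as the question:

```python
import numpy as np

# Made-up data: key 2 occurs once in column 0, key 5 twice, key 9 never.
y = np.array([[5, 0, 0, 0, 0, 0, 0],
              [2, 0, 0, 0, 0, 0, 0],
              [5, 0, 0, 0, 0, 0, 0]])

results = {}
for key in (2, 5, 9):
    mask = y[:, 0] == key
    try:
        # The mask acts like np.nonzero(mask)[0], shape (k,), which must
        # broadcast against the column list, shape (4,).
        y[mask, [1, 3, 4, 6]] = 1
        results[key] = "ok"
    except IndexError as err:
        results[key] = f"IndexError: {err}"

for key, outcome in results.items():
    print(key, np.count_nonzero(y[:, 0] == key), outcome)
```

Only key 2 (exactly one match) goes through; keys 5 and 9 raise the same kind of IndexError as in the question.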

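1. First solution: fixing the original loop. Skip iterations where the mask has no True entry with an if condition, and broadcast the row and column indices so that any number of matches works: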

range_ = np.arange(y.shape[0], dtype=np.int64)  # row positions of y
for i in range(x.shape[0]):
    mask = y[:, 0] == x[i, 0]
    true_counts = np.count_nonzero(mask)  # number of rows of y matching x[i, 0]
    if true_counts != 0:
        # Broadcast source values and target columns to shape (true_counts, 4)
        broadcast_x = np.broadcast_to(x[i, [1, 2, 3, 4]], shape=(true_counts, 4))  # 4 is length of [1, 2, 3, 4]
        broadcast_y = np.broadcast_to([1, 3, 4, 6], shape=(true_counts, 4))
        # Row indices (true_counts, 1) pair with column indices (true_counts, 4)
        y[range_[mask][:, None], broadcast_y] = broadcast_x
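As an alternative not used in the original answer, np.ix_ (which accepts a boolean mask for a dimension) builds a (k, 1) row index and a (1, 4) column index directly, so neither the if nor the explicit broadcasting is needed: an empty mask just selects an empty (0, 4) block. A sketch, using hypothetical non-zero values in x and a made-up y with a duplicated key (7) and a key missing from x (150):

```python
import numpy as np

# Same x as in the question, plus hypothetical non-zero values in columns 1-4
# so that the assignment is visible.
x = np.zeros(shape=(100, 10))
x[:, 0] = np.arange(100)
x[:, 1:5] = x[:, [0]] * np.array([1, 2, 3, 4])

# Made-up y: key 7 is duplicated, key 150 does not occur in x[:, 0].
y = np.zeros((4, 7), dtype=int)
y[:, 0] = [7, 7, 42, 150]

cols_y = [1, 3, 4, 6]
cols_x = [1, 2, 3, 4]
for i in range(x.shape[0]):
    mask = y[:, 0] == x[i, 0]
    # np.ix_ gives row indices of shape (k, 1) and column indices of shape
    # (1, 4), so the target block is (k, 4) for any k; k == 0 is a no-op.
    y[np.ix_(mask, cols_y)] = x[i, cols_x]

print(y)
```

Both rows with key 7 receive x[7, [1, 2, 3, 4]], and the row with key 150 is left unchanged.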

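2. Second solution: vectorized approach (recommended)
Instead of looping, first find which rows of y have a key that occurs anywhere in x[:, 0], and then use advanced indexing:

mask = np.isin(y[:, 0], x[:, 0])              # rows of y with a match in x (np.isin supersedes np.in1d)
y[mask, np.array([1, 3, 4, 6])[:, None]] = 0  # row indices (k,) and columns (4, 1) broadcast to a (4, k) block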
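A self-contained run of these two lines on made-up data (y is hypothetical: key 7 is duplicated and key 150 is not in x[:, 0]) shows the (4, k) block being selected and the unmatched row being left alone:

```python
import numpy as np

x = np.zeros(shape=(100, 10))
x[:, 0] = np.arange(100)

# Made-up y: key 7 is duplicated and key 150 does not occur in x[:, 0].
y = np.ones((4, 7), dtype=int)
y[:, 0] = [7, 7, 42, 150]

mask = np.isin(y[:, 0], x[:, 0])           # [True, True, True, False]
cols = np.array([1, 3, 4, 6])[:, None]     # shape (4, 1)

# np.nonzero(mask)[0] has shape (3,); with cols (4, 1) it broadcasts to (4, 3).
selected_shape = y[mask, cols].shape
y[mask, cols] = 0
print(selected_shape)
print(y)
```

Because the column array is a column vector, duplicates in y[:, 0] are no problem: every matching row simply contributes one more column to the (4, k) block.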

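Now, to assign an array instead of zero, we need to build that array from the related values in x. First, convert each matching key in y[:, 0] to its row position in x with y[:, 0] - x[0, 0]. This relies on x[:, 0] being consecutive integers starting at x[0, 0]; in your case x[0, 0] = 0, so the position is just y[:, 0]. Because x is a float array, the result must be cast back to int before it can be used as an index, and applying the mask first means keys of y that are not in x are never used as indices. Then pick the needed columns with the same broadcasting trick:

mask = np.isin(y[:, 0], x[:, 0])                      # rows mask for y
rows = (y[mask, 0] - x[0, 0]).astype(np.int64)        # matching row positions in x (cast: x is float)
new_arr = x[rows, np.array([1, 2, 3, 4])[:, None]]    # source values, shape (4, k)
y[mask, np.array([1, 3, 4, 6])[:, None]] = new_arr    # target block, shape (4, k)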
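If x[:, 0] is not a consecutive run of integers, subtracting x[0, 0] no longer finds the right rows. A hedged generalization, assuming x[:, 0] is sorted and free of duplicates, is to look the keys up with np.searchsorted. The helper name assign_from_keys and the demo data are hypothetical:

```python
import numpy as np


def assign_from_keys(x, y, x_cols, y_cols):
    """Hypothetical helper: for every row j of y whose key y[j, 0] equals some
    x[r, 0], copy x[r, x_cols] into y[j, y_cols]. Assumes x[:, 0] is sorted
    and duplicate-free (but not necessarily consecutive); modifies y in place."""
    keys = x[:, 0]
    # Index at which each key of y would be inserted into the sorted keys of x.
    pos = np.searchsorted(keys, y[:, 0])
    # Keys larger than every x key get len(keys); clip so the lookup stays in bounds.
    pos = np.clip(pos, 0, len(keys) - 1)
    # Keep only exact hits; duplicated keys in y simply hit the same x row twice.
    mask = keys[pos] == y[:, 0]
    rows = pos[mask]
    # Same broadcasting pattern as the vectorized solution: (len(cols), k) blocks.
    y[mask, np.asarray(y_cols)[:, None]] = x[rows, np.asarray(x_cols)[:, None]]
    return y


# Made-up data: keys 0, 10, 20, 30 in x; y has a duplicate (10) and misses (25, -5, 99).
x_demo = np.zeros((4, 5))
x_demo[:, 0] = [0, 10, 20, 30]
x_demo[:, 1:5] = x_demo[:, [0]] * np.array([1, 2, 3, 4])
y_demo = np.zeros((6, 7), dtype=int)
y_demo[:, 0] = [10, 10, 25, 30, -5, 99]

assign_from_keys(x_demo, y_demo, [1, 2, 3, 4], [1, 3, 4, 6])
print(y_demo)
```

The clip followed by an equality check is what filters out keys that are missing from x, so no separate np.isin call is needed here.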

3. Third solution: plain indexing (only for this toy example)
For the example data in the question, you can do this with advanced indexing instead of the loop:

y[:, [1, 3, 4, 6]] = 0

This works on your example data only because every value in y[:, 0] (all below 100) also appears in the first column of x (0 to 99), and the values being copied from x are all zeros.
Or, to assign an array instead of 0:

new_arr = np.array([3, 4, 5, 6, 7, 8, 9, 10, 11, 12])  # example values, one per row of y (10 rows)
y[:, [1, 3, 4, 6]] = new_arr[:, None]                  # (10, 1) broadcasts across the 4 target columns