NumPy: filter a matrix based on a column

I have a matrix with several different values for each row:

import numpy as np
arr1 = np.array([[1,2,3,4,5,6,7,8,9],[10,11,12,13,14,15,16,17,18],[19,20,21,22,23,24,25,26,27]])
arr2 = np.array([["A"],["B"],["C"]])

This produces the following matrices:

array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18],
       [19, 20, 21, 22, 23, 24, 25, 26, 27]])

array([['A'],
       ['B'],
       ['C']])

A represents the first 3 columns, B represents the next 3 columns, and C represents the last 3 columns. So the result I'd like here is:

array([[1,2,3],
       [13,14,15],
       [25,26,27]])

I was thinking about converting arr2 to a mask array, but I'm not sure how to do that. If it were a 1-D array I could select elements with a list of indices, like this:

arr[[0,1,2]]

but for a 2-D array I'm not sure how to build the mask. I tried this and got an error:

arr[[0,1,2],[3,4,5],[6,7,8]]

What's the best way to do this?

Thanks.

CodePudding user response：

You could use string.ascii_uppercase to look up each letter's position in the alphabet, and reshape arr1 into chunks of 3 columns:

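import numpy as np
from string import ascii_uppercase
# View arr1 as (rows, blocks, 3): each row becomes three 3-column blocks
reshaped = np.reshape(arr1, (len(arr1), -1, 3))
# For each row, pick the block whose index matches the letter in arr2
reshaped[np.arange(len(arr1)), np.vectorize(ascii_uppercase.index)(arr2).ravel()]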

Or map A to 0, B to 1, and so on directly:

reshaped = np.reshape(arr1, (len(arr1), -1, 3))
reshaped[np.arange(len(arr1)), np.vectorize(['A', 'B', 'C'].index)(arr2).ravel()]

Both produce:

array([[ 1,  2,  3],
       [13, 14, 15],
       [25, 26, 27]])
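
The same reshape idea can be written without np.vectorize by comparing the labels through broadcasting. This is only a sketch: select_blocks is a hypothetical helper name, and it assumes the labels are listed in block order and that the number of columns divides evenly by the number of labels.

```python
import numpy as np

def select_blocks(arr, row_labels, labels):
    """Hypothetical helper: for each row, keep the column block named by its label."""
    labels = np.asarray(labels)
    width = arr.shape[1] // len(labels)            # columns per block
    # Compare every row label with every known label; argmax finds the match.
    # Note: a label that is not in `labels` silently maps to block 0.
    block_idx = (np.asarray(row_labels).reshape(-1, 1) == labels).argmax(axis=1)
    blocks = arr.reshape(len(arr), -1, width)      # (rows, blocks, width)
    return blocks[np.arange(len(arr)), block_idx]

arr1 = np.arange(1, 28).reshape(3, 9)
arr2 = np.array([["A"], ["B"], ["C"]])
labels = ["A", "B", "C"]                           # assumed block order

result = select_blocks(arr1, arr2, labels)
print(result)
```

Because the block index is computed per row, the labels in arr2 do not have to follow the diagonal pattern; a row labelled "C" always gets the last three columns.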

CodePudding user response：

If the shape of arr1 is fixed at (3, 9) as shown above, it can be done in a single line:

result = np.array([arr1[0][0:3], arr1[1][3:6], arr1[2][6:9]])

Printing result gives:

[[ 1  2  3]
 [13 14 15]
 [25 26 27]]
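
If the number of rows is not fixed, the hard-coded slices can be generated in a loop. A minimal sketch, assuming row i always keeps the i-th block and that the block width (the name width is mine) is known in advance:

```python
import numpy as np

arr1 = np.arange(1, 28).reshape(3, 9)
width = 3  # assumed number of columns per block

# Row i keeps columns [i * width, (i + 1) * width)
result = np.stack([row[i * width:(i + 1) * width] for i, row in enumerate(arr1)])
print(result)
```

This still loops in Python, so for large arrays an indexing-based approach is likely to be faster.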

CodePudding user response：

You can use 'advanced indexing', which indexes the target array with coordinate arrays:

rows = np.array([[0,0,0],[1,1,1],[2,2,2]])
cols = np.array([[0,1,2],[3,4,5],[6,7,8]])

arr1[rows, cols]
>>> array([[ 1,  2,  3],
          [13, 14, 15],
          [25, 26, 27]])
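
This also explains the error in the question: each index array addresses one axis, so a 2-D array accepts at most two of them. The sketch below reproduces the failure and shows np.take_along_axis as one alternative that needs only the column indices; the variable names are illustrative.

```python
import numpy as np

arr1 = np.arange(1, 28).reshape(3, 9)

# Three index arrays mean three axes, but arr1 only has two.
try:
    arr1[[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    error = None
except IndexError as exc:
    error = exc
print(type(error).__name__)

# take_along_axis picks arr1[i, cols[i, j]] for every (i, j),
# so no separate row index array is needed.
cols = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
result = np.take_along_axis(arr1, cols, axis=1)
print(result)
```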

You can also wrap this in a function, for example:

def diagonal(arr, step):
    # rows[i] repeats the row index i; cols[i] lists the columns of block i
    rows = np.array([[x]*step for x in range(step)])
    cols = np.array([[y for y in range(x, x + step)] for x in range(0, step**2, step)])
    return arr[rows, cols]

diagonal(arr1, 3)
>>> array([[ 1,  2,  3],
          [13, 14, 15],
          [25, 26, 27]])
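
The coordinate arrays can also be built with np.arange and broadcasting instead of nested list comprehensions. A sketch under the same assumption as diagonal above (a step-by-step*step input, with row i keeping block i):

```python
import numpy as np

arr1 = np.arange(1, 28).reshape(3, 9)
step = 3

rows = np.arange(step)[:, None]        # shape (3, 1): [[0], [1], [2]]
cols = rows * step + np.arange(step)   # shape (3, 3): [[0,1,2],[3,4,5],[6,7,8]]

# rows broadcasts against cols, so each row index pairs with its own block
result = arr1[rows, cols]
print(result)
```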

reference: https://numpy.org/devdocs/user/basics.indexing.html