Conditional broadcasting assignment of a NumPy matrix with an index array

Time:06-26

I have a NumPy matrix filled with some values (I'm using zeros and just two tuples so that the example shows both conditions):

nparray = array([[0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
                 [0., (2, 2.5), 0., 0., 0., 0., 0., 0., 0., 0.],
                 [0., (1, 6.5), 0., 0., 0., 0., 0., 0., 0., 0.],
                 [0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
                 [0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
                 [0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
                 [0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
                 [0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
                 [0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
                 [0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.]])

I have a sub-matrix, the result of some calculations, that needs to be assigned to specific index locations in nparray:

sub_array= array([[(1, 3.2) ,  (2, 3.2),  (3, 4.6), (4, 3.4)],
                  [(3, 4.5) ,  (4, 0.4),  (5, 3.2), (6, 2.3)],
                  [(3, 4.5) ,  (5, 2.3),  (7, 5.3), (9, 2.3)],
                  [(12, 3.2), (45, 2.4), (32, 2.3), (6, 5.4)]], dtype=object)
index = [1, 2, 5, 9]

I need to assign the values in sub_array to nparray at the positions given by all combinations of the index values, but only if the value in nparray is not a tuple, or if the second item of the tuple in sub_array is lower than the second item of the tuple in nparray at the same position. The result should look something like this (I'm adding the index at the top to make the assignment locations clear):

--------Index-----  0 | 1       | 2        | 3 | 4 | 5        | 6 | 7 | 8 | 9        

nparray =   array([[0., 0.      , 0.       , 0., 0., 0.       , 0., 0., 0., 0.      ],
     |       1     [0., (2, 2.5), (2, 3.2) , 0., 0., (3, 4.6) , 0., 0., 0., (4, 3.4)],
     |       2     [0., (3, 4.5), (4, 0.4) , 0., 0., (5, 3.2) , 0., 0., 0., (6, 2.3)],
     i       3     [0., 0.      , 0.       , 0., 0., 0.       , 0., 0., 0., 0.      ],
     n       4     [0., 0.      , 0.       , 0., 0., 0.       , 0., 0., 0., 0.      ],
     d       5     [0., (3, 4.5), (5, 2.3) , 0., 0., (7, 5.3) , 0., 0., 0., (9, 2.3)],
     e       6     [0., 0.      , 0.       , 0., 0., 0.       , 0., 0., 0., 0.      ],
     x       7     [0., 0.      , 0.       , 0., 0., 0.       , 0., 0., 0., 0.      ],
     |       8     [0., 0.      , 0.       , 0., 0., 0.       , 0., 0., 0., 0.      ],
     |       9     [0.,(12, 3.2), (45, 2.4), 0., 0., (32, 2.3), 0., 0., 0., (6, 5.4)]])

As you can see the sub_array is assigned in the locations of all the index array combinations.

The tuple at position (1,1) is not replaced because the second item in nparray (2.5) is lower than the second item in sub_array (3.2). On the other hand, the tuple at position (2,1) is replaced because the second item in nparray (6.5) is higher than the second item in sub_array (4.5).

How can I achieve this conditional assignment with NumPy to ensure also time efficiency and not to go through a loop?

P.S.: My main objective is to calculate a distance matrix based on some prior filtering. My dataset has 110K, and it would take half a year to complete the calculations if I ran them through the entire set rather than a subset of it. Thanks in advance!

CodePudding user response:

EDIT: just noticed your additional constraint on which tuple to take in the case there is already a tuple at the given index. I'm headed out the door but perhaps this is enough for OP to go from here.

I believe this gets you what you want, via some pretty straightforward indexing:

In [34]: arr = np.array([[0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
    ...:                  [0., (2, 2.5), 0., 0., 0., 0., 0., 0., 0., 0.],
    ...:                  [0., (1, 6.5), 0., 0., 0., 0., 0., 0., 0., 0.],
    ...:                  [0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
    ...:                  [0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
    ...:                  [0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
    ...:                  [0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
    ...:                  [0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
    ...:                  [0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.],
    ...:                  [0., 0.      , 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=object)

In [35]: arr
Out[35]:
array([[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, (2, 2.5), 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, (1, 6.5), 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]], dtype=object)

In [36]: idx = np.ix_([1, 2, 5, 9], [1, 2, 5, 9])

In [37]: sub = np.asarray([[(1, 3.2) ,  (2, 3.2),  (3, 4.6), (4, 3.4)],
    ...:                   [(3, 4.5) ,  (4, 0.4),  (5, 3.2), (6, 2.3)],
    ...:                   [(3, 4.5) ,  (5, 2.3),  (7, 5.3), (9, 2.3)],
    ...:                   [(12, 3.2), (45, 2.4), (32, 2.3), (6, 5.4)]], 'float,float')

In [38]: arr[idx] = np.where(arr[idx], arr[idx], sub)

In [39]: arr
Out[39]:
array([[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, (2, 2.5), (2.0, 3.2), 0.0, 0.0, (3.0, 4.6), 0.0, 0.0, 0.0,
        (4.0, 3.4)],
       [0.0, (1, 6.5), (4.0, 0.4), 0.0, 0.0, (5.0, 3.2), 0.0, 0.0, 0.0,
        (6.0, 2.3)],
       [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, (3.0, 4.5), (5.0, 2.3), 0.0, 0.0, (7.0, 5.3), 0.0, 0.0, 0.0,
        (9.0, 2.3)],
       [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, (12.0, 3.2), (45.0, 2.4), 0.0, 0.0, (32.0, 2.3), 0.0, 0.0,
        0.0, (6.0, 5.4)]], dtype=object)
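
Note that the np.where call above keeps any cell that is already truthy, so it does not apply the "lower second item wins" rule from the question. One way to add that rule while staying with object arrays is sketched below. The helper names pick and pick_cells are made up for this illustration, and np.frompyfunc still calls the Python function once per cell, so this is a convenience for broadcasting, not a speed-up:

```python
import numpy as np

def pick(old, new):
    """Decide which value one cell should hold after the assignment."""
    # Keep the existing tuple only if its second item is not higher than the new one.
    if isinstance(old, tuple) and old[1] <= new[1]:
        return old
    return new

# Wrap the Python function so it is applied elementwise over object arrays.
pick_cells = np.frompyfunc(pick, 2, 1)

arr = np.zeros((10, 10), dtype=object)
arr[1, 1] = (2, 2.5)
arr[2, 1] = (1, 6.5)

rows = [[(1, 3.2), (2, 3.2), (3, 4.6), (4, 3.4)],
        [(3, 4.5), (4, 0.4), (5, 3.2), (6, 2.3)],
        [(3, 4.5), (5, 2.3), (7, 5.3), (9, 2.3)],
        [(12, 3.2), (45, 2.4), (32, 2.3), (6, 5.4)]]

# Fill a (4, 4) object array cell by cell so every element stays a tuple.
sub = np.empty((4, 4), dtype=object)
for i, row in enumerate(rows):
    for j, cell in enumerate(row):
        sub[i, j] = cell

idx = np.ix_([1, 2, 5, 9], [1, 2, 5, 9])
arr[idx] = pick_cells(arr[idx], sub)
```

With the question's data, arr[1, 1] stays (2, 2.5) and arr[2, 1] becomes (3, 4.5), which matches the expected result in the question.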

However, I've gotta ask -- why?! Why are you storing your data like this? This completely defeats the purpose of numpy...

CodePudding user response:

Here's the basic indexed assignment:

In [60]: index = np.array([1,2,6,7]); data = np.arange(16).reshape(4,4)
In [62]: res = np.zeros((10,10),int)

We can select a (4,4) block of values:

In [63]: res[index[:,None],index]
Out[63]: 
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

and assign the (4,4) data to it as well:

In [64]: res[index[:,None],index] = data

In [65]: res
Out[65]: 
array([[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  1,  0,  0,  0,  2,  3,  0,  0],
       [ 0,  4,  5,  0,  0,  0,  6,  7,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  8,  9,  0,  0,  0, 10, 11,  0,  0],
       [ 0, 12, 13,  0,  0,  0, 14, 15,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0]])
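
As a side note, the index[:,None], index pair used here selects the same block as np.ix_(index, index). The short sketch below compares the two and also shows what plain res[index, index] does instead; the variable names res_a, res_b and diag are only for this illustration:

```python
import numpy as np

index = np.array([1, 2, 6, 7])
data = np.arange(16).reshape(4, 4)

# Broadcasting a (4, 1) column of row indices against a (4,) row of column indices.
res_a = np.zeros((10, 10), int)
res_a[index[:, None], index] = data

# np.ix_ builds the same pair of broadcastable index arrays.
res_b = np.zeros((10, 10), int)
res_b[np.ix_(index, index)] = data

same = np.array_equal(res_a, res_b)

# Without the extra axis the indices are paired up, giving only the 4 "diagonal" cells.
diag = res_a[index, index]
```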

What exactly is your sub_array? If I just copy-n-paste, I get a (4,4,2) array:

In [67]: sub_array= np.array([[(1, 3.2) ,  (2, 3.2),  (3, 4.6), (4, 3.4)],
    ...:                   [(3, 4.5) ,  (4, 0.4),  (5, 3.2), (6, 2.3)],
    ...:                   [(3, 4.5) ,  (5, 2.3),  (7, 5.3), (9, 2.3)],
    ...:                   [(12, 3.2), (45, 2.4), (32, 2.3), (6, 5.4)]], dtype=object)

In [68]: sub_array.shape
Out[68]: (4, 4, 2)

I can't assign that to res.

I can make a (4,4) array with tuple elements via:

In [69]: sub_array= np.empty((4,4),object) 
    ...: sub_array[:] = [[(1, 3.2) ,  (2, 3.2),  (3, 4.6), (4, 3.4)],
    ...:                   [(3, 4.5) ,  (4, 0.4),  (5, 3.2), (6, 2.3)],
    ...:                   [(3, 4.5) ,  (5, 2.3),  (7, 5.3), (9, 2.3)],
    ...:                   [(12, 3.2), (45, 2.4), (32, 2.3), (6, 5.4)]]

In [70]: sub_array
Out[70]: 
array([[(1, 3.2), (2, 3.2), (3, 4.6), (4, 3.4)],
       [(3, 4.5), (4, 0.4), (5, 3.2), (6, 2.3)],
       [(3, 4.5), (5, 2.3), (7, 5.3), (9, 2.3)],
       [(12, 3.2), (45, 2.4), (32, 2.3), (6, 5.4)]], dtype=object)

And assign the values to another object dtype array:

In [71]: res = np.zeros((10,10),object)
In [73]: res[index[:,None],index] = sub_array

In [74]: res
Out[74]: 
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, (1, 3.2), (2, 3.2), 0, 0, 0, (3, 4.6), (4, 3.4), 0, 0],
       [0, (3, 4.5), (4, 0.4), 0, 0, 0, (5, 3.2), (6, 2.3), 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, (3, 4.5), (5, 2.3), 0, 0, 0, (7, 5.3), (9, 2.3), 0, 0],
       [0, (12, 3.2), (45, 2.4), 0, 0, 0, (32, 2.3), (6, 5.4), 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=object)

In fact I could start with the nested list of tuples, and skip sub_array:

In [75]: res = np.empty((10,10),object)

In [76]: alist = [[(1, 3.2) ,  (2, 3.2),  (3, 4.6), (4, 3.4)],
    ...:                   [(3, 4.5) ,  (4, 0.4),  (5, 3.2), (6, 2.3)],
    ...:                   [(3, 4.5) ,  (5, 2.3),  (7, 5.3), (9, 2.3)],
    ...:                   [(12, 3.2), (45, 2.4), (32, 2.3), (6, 5.4)]]

In [77]: res[index[:,None],index] = alist

In [78]: res
Out[78]: 
array([[None, None, None, None, None, None, None, None, None, None],
       [None, (1, 3.2), (2, 3.2), None, None, None, (3, 4.6), (4, 3.4),
        None, None],
       [None, (3, 4.5), (4, 0.4), None, None, None, (5, 3.2), (6, 2.3),
        None, None],
       [None, None, None, None, None, None, None, None, None, None],
       [None, None, None, None, None, None, None, None, None, None],
       [None, None, None, None, None, None, None, None, None, None],
       [None, (3, 4.5), (5, 2.3), None, None, None, (7, 5.3), (9, 2.3),
        None, None],
       [None, (12, 3.2), (45, 2.4), None, None, None, (32, 2.3),
        (6, 5.4), None, None],
       [None, None, None, None, None, None, None, None, None, None],
       [None, None, None, None, None, None, None, None, None, None]],
      dtype=object)

Another option is to start with a (10,10,2) numeric res, and copy the (4,4,2) data to it.
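
A minimal sketch of that numeric option, including the conditional rule from the question, might look like the following. Using NaN to mark "no tuple here" is an assumption made for this example (the question uses 0.), and the names block and replace are illustrative only:

```python
import numpy as np

# (10, 10, 2) float array; NaN in a cell means "no pair stored yet" (assumed sentinel).
res = np.full((10, 10, 2), np.nan)
res[1, 1] = (2, 2.5)
res[2, 1] = (1, 6.5)

# The nested list of pairs becomes a plain (4, 4, 2) float array.
sub = np.array([[(1, 3.2), (2, 3.2), (3, 4.6), (4, 3.4)],
                [(3, 4.5), (4, 0.4), (5, 3.2), (6, 2.3)],
                [(3, 4.5), (5, 2.3), (7, 5.3), (9, 2.3)],
                [(12, 3.2), (45, 2.4), (32, 2.3), (6, 5.4)]], dtype=float)

index = np.array([1, 2, 5, 9])
rows, cols = index[:, None], index

# Current contents of the target block, shape (4, 4, 2).
block = res[rows, cols]

# Replace a cell if it is empty, or if the new second item is lower than the old one.
replace = np.isnan(block[..., 1]) | (sub[..., 1] < block[..., 1])

# Broadcast the (4, 4) mask over the last axis and write the block back in one step.
res[rows, cols] = np.where(replace[..., None], sub, block)
```

Because everything is a float array, the comparison and the assignment run without a Python-level loop, which is the main reason to prefer this layout over an object array of tuples.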
