Home > Blockchain >  How to optimize expanding sequences from 2D numpy array?
How to optimize expanding sequences from 2D numpy array?

Time:02-20

I have a 2D view and I want to expand it so that it contains all sequences, ie 1-2, 1-2-3, 1-2-3-4, 1-2-3-4-5 representing its column values.

From those only include the sequences where terminating number is allowed in the mask. For example, if the mask has a row [0,0,1,0,1] we only want sequences 1-2-3 and 1-2-3-4-5 for that row.

import numpy as np

view = np.array([               #arbitrary decreasing sequences
                [15,14,13,12,11],
                [10,9,8,7,6],   
                [9,8,7,6,5],
                [7,6,5,4,3],
                [6,5,4,3,2],
                ])

mask = np.array([               #arbitrary mask
                [0,0,1,0,1],
                [0,0,1,1,0],
                [0,1,0,0,0],
                [0,0,1,1,1],
                [0,0,0,0,1],
                ])

kernel= np.array([              #(lambda x: np.put(x,0,0) or x)(np.tril(np.ones((5,5))))
                [0,0,0,0,0],
                [1,1,0,0,0],
                [1,1,1,0,0],
                [1,1,1,1,0],
                [1,1,1,1,1],
                ])

mask_3d = mask[:,:,None] * kernel[None,:,:]
view_3d = view[:,None,:] * mask_3d

view_2d = view_3d.reshape((-1,view_3d.shape[2]))
view_2d = view_2d[np.flatnonzero(np.max(view_2d,axis=1))]
view_2d = np.ma.masked_array(view_2d,view_2d==0)

The result should be like this:

print (view_2d)
"""
[[15 14 13 -- --]
 [15 14 13 12 11]
 [10 9 8 -- --]
 [10 9 8 7 --]
 [9 8 -- -- --]
 [7 6 5 -- --]
 [7 6 5 4 --]
 [7 6 5 4 3]
 [6 5 4 3 2]]
"""

The way I solved it is by expanding the view into 3D, then combining my mask with a kernel that would basically basically hide the remaining column values, ie hide 4-5 for sequence 1-2-3. After applying the mask in 3D, I reshaped into 2D and then filter zero rows.

The problem with my solution is it's poorly optimized. The expansion into 3D is particularly expensive. Can this problem be solved differently? Please let me know, perhaps it would take less resources to compute.

CodePudding user response:

You can shorten your code a bit by exploiting the information in the mask array. It's only ~1.21x faster than the solution in your answer with the small example arrays (run on a 2-core colab instance). I have not benchmarked larger cases.

np.ma.masked_array(
    view.repeat(mask.sum(1), 0),
    np.triu(np.ones(view.shape[1], dtype='bool'), 1)[mask.nonzero()[1]]
)

Output(clipped)

masked_array(
  data=[[15, 14, 13, --, --],
        [15, 14, 13, 12, 11],
        [10, 9, 8, --, --],
        [10, 9, 8, 7, --],
        [9, 8, --, --, --],
        [7, 6, 5, --, --],
        [7, 6, 5, 4, --],
        [7, 6, 5, 4, 3],
        [6, 5, 4, 3, 2]],
   ...

CodePudding user response:

Ok I've broken it down into two operations where I pre-filter when all mask rows are zeros, and only then multiply the values. Also using boolean operation has helped.

mask = mask.astype(bool)
width =  view.shape[1]
kernel = np.tril(np.full((width,width),True))    
kernel[0,0] = False

view_2d =  np.repeat(view[:,None,:],width,axis=1)  .reshape((-1,width))
mask_2d =  (mask[:,:,None] & kernel[None,:,:])            .reshape((-1,width))
row_filter = mask_2d.max(axis=1)
mask_2d, view_2d = mask_2d[row_filter], view_2d[row_filter]
view_2d *= mask_2d

view_2d = np.ma.masked_array(view_2d,view_2d==0)
  • Related