Populating a sorted matrix according to bin values from two vectors

I have two arrays, X and V. I want to assign a bin to every entry in X, then create a matrix with one row per entry of X and one column per bin. Finally, I want to place the values from V into that matrix, each in the column of its bin.

I thought about starting with the following, using np.digitize:

import numpy as np

#Sample arrays
X = np.array([1,3,5,4,5,7,2,4,5])
V = np.array([0.5,0.7,0.29,4.4,13.3,0.9,2.2,2.7,2.5])

#Creating the bin array
grid_max = X.max()
grid_min = X.min()
bin_width = 3
bins = np.arange(grid_min, grid_max + bin_width, bin_width)
bin_centres = (bins[1:] + bins[:-1]) / 2

#Bin number for each X entry
binplace_X = np.digitize(X, bins)
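As a quick check of what np.digitize returns for this data: with the default `right=False`, an index `i` means `bins[i-1] <= x < bins[i]`, values below `bins[0]` map to 0, and values at or above `bins[-1]` map to `len(bins)`. So there are `len(bins) + 1` possible bin indices. A minimal sketch on the sample X:

```python
import numpy as np

X = np.array([1, 3, 5, 4, 5, 7, 2, 4, 5])
bin_width = 3

# Bin edges from X.min() up to (but excluding) X.max() + bin_width
bins = np.arange(X.min(), X.max() + bin_width, bin_width)

# right=False (default): index i means bins[i-1] <= x < bins[i];
# x < bins[0] maps to 0 and x >= bins[-1] maps to len(bins)
binplace_X = np.digitize(X, bins)

print(bins)        # [1 4 7]
print(binplace_X)  # [1 1 2 2 2 3 1 2 2]
```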

And then create a matrix of NaN:

Sort = np.full((np.shape(X)[0], len(bins)), np.nan)

And the Sort result I wish to get for this case should look like this:

   [[0.5, 0.29, 0.9],
    [0.7, 4.4, nan],
    [2.2, 13.3, nan],
    [nan, 2.7, nan],
    [nan, 2.5, nan],
    [nan, nan, nan],
    [nan, nan, nan],
    [nan, nan, nan],
    [nan, nan, nan]]

I'm not sure what would be an efficient way to accomplish this.

Answer:

If I understand the question correctly, you can try this:

import numpy as np

X = np.array([1, 3, 5, 4, 5, 7, 2, 4, 5])
V = np.array([0.5, 0.7, 0.29, 4.4, 13.3, 0.9, 2.2, 2.7, 2.5])

grid_max = X.max()
grid_min = X.min()
bin_width = 3
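bins = np.arange(grid_min, grid_max + bin_width, bin_width)
binplace_X = np.digitize(X, bins)
# np.digitize can return 0..len(bins), hence len(bins) + 1 columns
Sort = np.full((np.shape(X)[0], len(bins) + 1), np.nan)
# Put V[i] in row i, column binplace_X[i]
Sort[np.arange(len(X)), binplace_X] = V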

The Sort array then looks as follows:

[[  nan  0.5    nan   nan]
 [  nan  0.7    nan   nan]
 [  nan   nan  0.29   nan]
 [  nan   nan  4.4    nan]
 [  nan   nan 13.3    nan]
 [  nan   nan   nan  0.9 ]
 [  nan  2.2    nan   nan]
 [  nan   nan  2.7    nan]
 [  nan   nan  2.5    nan]]

Compared with your expected output, there is one more column, because np.digitize can return any index from 0 to len(bins), i.e. len(bins) + 1 distinct bins rather than len(bins). (Here column 0 stays empty, since no X is smaller than bins[0].) Also, in your example several entries in the same row are non-NaN. If each row corresponds to one entry of X, that would mean a number belongs to several bins at once, which is impossible; the array above has exactly one non-NaN entry per row.

On the other hand, if the goal is to move all non-NaN values to the top of their columns while preserving their relative order, you can do it as follows:

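# Transpose so each bin becomes a row; True marks a NaN slot
mask = np.isnan(Sort.T)
Sort2 = np.full(Sort.shape, np.nan)
# np.sort puts False (non-NaN) first, so ~np.sort(...) selects the top slots of
# each bin; Sort.T[~mask] lists the values bin by bin in their original order
Sort2.T[~np.sort(mask, axis=1)] = Sort.T[~mask]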

Sort2 is then as follows:

[[  nan  0.5   0.29  0.9 ]
 [  nan  0.7   4.4    nan]
 [  nan  2.2  13.3    nan]
 [  nan   nan  2.7    nan]
 [  nan   nan  2.5    nan]
 [  nan   nan   nan   nan]
 [  nan   nan   nan   nan]
 [  nan   nan   nan   nan]
 [  nan   nan   nan   nan]]
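If the goal is exactly the 3-column layout from the question, one possible alternative (a sketch, not the method used above) is to build the packed matrix in a single scatter: for each entry, count how many earlier entries fall into the same bin and use that count as its row index. This assumes every X is >= bins[0] (true here, because bins starts at X.min()), so the always-empty column 0 can be dropped by subtracting 1 from the digitize output. The helper name `pack_by_bin` is hypothetical.

```python
import numpy as np

def pack_by_bin(values, bin_idx, n_bins):
    """Place values[i] in column bin_idx[i], stacking entries of the same
    bin from the top while preserving their original order."""
    n = len(values)
    # Stable sort keeps the original order among entries of the same bin
    order = np.argsort(bin_idx, kind="stable")
    sorted_bins = bin_idx[order]
    # Start position of each bin's group inside the sorted array
    group_start = np.searchsorted(sorted_bins, sorted_bins, side="left")
    # Rank of each entry within its bin = the row it goes into
    rank = np.arange(n) - group_start
    out = np.full((n, n_bins), np.nan)
    out[rank, sorted_bins] = values[order]
    return out

X = np.array([1, 3, 5, 4, 5, 7, 2, 4, 5])
V = np.array([0.5, 0.7, 0.29, 4.4, 13.3, 0.9, 2.2, 2.7, 2.5])
bin_width = 3
bins = np.arange(X.min(), X.max() + bin_width, bin_width)

# Shift to 0-based columns; valid only because no X is below bins[0]
col = np.digitize(X, bins) - 1
Sort = pack_by_bin(V, col, len(bins))
print(Sort)
```

Because the row ranks come from a stable sort, entries in the same bin keep their original order, so the result matches Sort2 above without its empty first column.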