I have two arrays, X and V. I want to assign a bin to every entry in X and then create a matrix whose number of rows is the number of X entries and whose number of columns is the number of bins. Then I want to map the values from V into the matrix according to their bin column.
I thought about starting with the following, using np.digitize:
import numpy as np
#Sample arrays
X = np.array([1,3,5,4,5,7,2,4,5])
V = np.array([0.5,0.7,0.29,4.4,13.3,0.9,2.2,2.7,2.5])
#Creating the bin array
grid_max = X.max()
grid_min = X.min()
bin_width = int(3)
bins = np.arange(grid_min, grid_max + bin_width, bin_width)
bin_centres = (bins[1:] + bins[:-1]) / 2
#Bin number for each X entry
binplace_X = np.digitize(X, bins)
And then create a matrix of NaN:
Sort = np.full( (np.shape(X)[0], len(bins)) , np.nan)
The Sort result I wish to get for this case should look like this:
[[0.5, 0.29, 0.9],
[0.7, 4.4, nan],
[2.2, 13.3, nan],
[nan, 2.7, nan],
[nan, 2.5, nan],
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan]]
I'm not sure what would be an efficient way to accomplish this.
Answer:
If I understand the question correctly, you can try this:
import numpy as np
X = np.array([1, 3, 5, 4, 5, 7, 2, 4, 5])
V = np.array([0.5, 0.7, 0.29, 4.4, 13.3, 0.9, 2.2, 2.7, 2.5])
grid_max = X.max()
grid_min = X.min()
bin_width = 3
bins = np.arange(grid_min, grid_max + bin_width, bin_width)
binplace_X = np.digitize(X, bins)
Sort = np.full((np.shape(X)[0], len(bins) + 1), np.nan)
Sort[np.arange(len(X)), binplace_X] = V
The Sort array then looks as follows:
[[ nan 0.5 nan nan]
[ nan 0.7 nan nan]
[ nan nan 0.29 nan]
[ nan nan 4.4 nan]
[ nan nan 13.3 nan]
[ nan nan nan 0.9 ]
[ nan 2.2 nan nan]
[ nan nan 2.7 nan]
[ nan nan 2.5 nan]]
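As a small illustration of where these column indices come from, here is a sketch that uses the same bin edges plus a few made-up probe values (the probe array is not part of the question). With its default right=False, np.digitize returns 0 for values below bins[0] and len(bins) for values at or above bins[-1], so the indices range over len(bins) + 1 possible values:

```python
import numpy as np

bins = np.array([1, 4, 7])                # same edges as np.arange(1, 10, 3)
probe = np.array([0, 1, 3, 4, 6, 7, 9])   # hypothetical probe values, for illustration only

# Index i means bins[i-1] <= x < bins[i]; 0 and len(bins) are the two open-ended bins.
idx = np.digitize(probe, bins)
print(idx)  # [0 1 1 2 2 3 3]
```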
Compared with your expected output, there is one more column, since the bins array defines len(bins) + 1 bins and not len(bins). Additionally, in the example you provided there are several non-nan entries in a row. This would indicate that a number belongs to several different bins at the same time, which is impossible. The array above has only one non-nan entry in each row. On the other hand, if the goal is to move all non-nan values to the top of the columns while preserving their relative order, then you can do it as follows:
mask = np.isnan(Sort.T)
Sort2 = np.full(Sort.shape, np.nan)
Sort2.T[~np.sort(mask, axis=1)] = Sort.T[~mask]
Sort2 is then as follows:
[[ nan 0.5 0.29 0.9 ]
[ nan 0.7 4.4 nan]
[ nan 2.2 13.3 nan]
[ nan nan 2.7 nan]
[ nan nan 2.5 nan]
[ nan nan nan nan]
[ nan nan nan nan]
[ nan nan nan nan]
[ nan nan nan nan]]
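If the empty first column is not wanted, one possible way to wrap the steps above is a small helper that drops it. This is only a sketch: bin_and_stack is a hypothetical name (not a NumPy function), and it assumes bins[0] equals X.min(), so no value can fall below the first edge and column 0 is always empty. It also tracks occupied cells with a separate boolean array instead of np.isnan, which is a swapped-in detail so that NaN values in V would not be mistaken for empty cells:

```python
import numpy as np

def bin_and_stack(X, V, bins):
    """Hypothetical helper: place V by bin of X, then pack each column to the top."""
    rows = np.arange(len(X))
    idx = np.digitize(X, bins)                # bin index in 0..len(bins)

    shape = (len(X), len(bins) + 1)
    out = np.full(shape, np.nan)
    filled = np.zeros(shape, dtype=bool)      # marks cells that received a value
    out[rows, idx] = V
    filled[rows, idx] = True

    # Work on the transpose so that each row is one bin column.
    empty = ~filled.T
    packed = np.full(shape, np.nan)
    # Sorting booleans puts False (occupied) first, so values move to the top
    # of each column while keeping their original relative order.
    packed.T[~np.sort(empty, axis=1)] = out.T[~empty]

    # Assumption: bins[0] == X.min(), so column 0 holds nothing and can be dropped.
    return packed[:, 1:]

X = np.array([1, 3, 5, 4, 5, 7, 2, 4, 5])
V = np.array([0.5, 0.7, 0.29, 4.4, 13.3, 0.9, 2.2, 2.7, 2.5])
bins = np.arange(X.min(), X.max() + 3, 3)
result = bin_and_stack(X, V, bins)
print(result)
```

For the sample arrays this gives a 9 x 3 array whose first five rows hold the packed values and whose remaining rows are all nan, which is the layout asked for in the question.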