From the following NumPy array:
[5, 2, 4, 6, 3]
I'd like to get to the following matrix:
[
[0, 0, 0, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 1, 0, 0, 0]
]
Using Pandas get_dummies appears very simple:
pd.get_dummies(original_array).values
But it has one drawback, in that missing indices are not represented as columns (e.g. 0, 1 in this example) in the final matrix.
If we assume that the exact names/indices of the desired "columns" are known in advance (here, all integers from 0 to 6 included), what would be the most efficient way to get to the matrix shown above, starting from the initial array?
CodePudding user response:
You can create a matrix of zeros and then use advanced indexing to set the correct column in each row to 1:
import numpy as np
a = [5, 2, 4, 6, 3]
ohe = np.zeros((len(a), max(a) + 1), dtype=int)
ohe[np.arange(len(a)), a] = 1
ohe
array([[0, 0, 0, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 1, 0, 0, 0]])
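Note that max(a) + 1 only gives the right width when the largest category actually occurs in a. Since the question says the columns are known in advance, it may be safer to pass the width explicitly. A minimal sketch, assuming the categories are the integers 0 to n_classes - 1 (one_hot and n_classes are hypothetical names, not from the question):

```python
import numpy as np

def one_hot(indices, n_classes):
    """Return a (len(indices), n_classes) one-hot integer matrix."""
    indices = np.asarray(indices)
    out = np.zeros((indices.size, n_classes), dtype=int)
    # Row i gets a 1 in column indices[i]; every other entry stays 0.
    out[np.arange(indices.size), indices] = 1
    return out

ohe = one_hot([5, 2, 4, 6, 3], n_classes=7)
```

With this, columns 0 and 1 are still present even though neither value appears in the input.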
CodePudding user response:
Advanced indexing is your answer! Assuming you know your desired final shape (here, (5, 7)):
In [3]: desired_shape = (5, 7)
In [4]: z = np.zeros(desired_shape, dtype="uint8")
In [5]: z
Out[5]:
array([[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]], dtype=uint8)
In [6]: idxs = [5, 2, 4, 6, 3]
In [7]: z[range(len(z)), idxs] = 1
In [8]: z
Out[8]:
array([[0, 0, 0, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 1, 0, 0, 0]], dtype=uint8)
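If you would rather stay with get_dummies, one option is to declare the full set of categories up front so that unused ones still get a column. A rough sketch, assuming the categories are the integers 0 to 6 as in the question (the variable names are just for illustration):

```python
import pandas as pd

a = [5, 2, 4, 6, 3]

# Declare every expected category, including ones absent from the data.
cats = pd.Series(pd.Categorical(a, categories=range(7)))

# dtype=int keeps the result as 0/1 integers rather than booleans.
ohe = pd.get_dummies(cats, dtype=int).to_numpy()
```

This is likely slower than the pure NumPy indexing on large inputs, but it also works when the categories are not consecutive integers.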