I have a model that predicts labels.
When the target is multi-label, predict returns a 2-D array.
Example:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=TestSize, random_state=42)
>>> y_test
array([[1, 0, 0, ..., 0, 0, 0],
[0, 1, 0, ..., 0, 0, 0],
[0, 0, 1, ..., 1, 1, 0],
...,
[1, 0, 1, ..., 0, 1, 1],
[0, 1, 0, ..., 0, 1, 1],
[1, 0, 0, ..., 0, 0, 1]], dtype=uint8)
>>> y_test.shape
(100, 20)
mdl.fit(X_train, y_train)
y_hat = mdl.predict(X_test)
In this case the prediction is also a 2-D array:
>>> y_hat
array([[0, 1, 1, ..., 0, 0, 0],
[0, 0, 1, ..., 0, 0, 0],
[0, 1, 0, ..., 0, 1, 0],
...,
[0, 1, 1, ..., 0, 1, 1],
[0, 0, 0, ..., 0, 1, 1],
[0, 0, 1, ..., 0, 0, 1]], dtype=uint8)
>>> y_hat.shape
(100, 20)
This is fine, no issues here.
But when I work with a single label, as in this example:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=TestSize, random_state=42)
>>> y_test
array([[1],
[0],
[0],
...,
[1],
[0],
[1]], dtype=uint8)
>>> y_test.shape
(100, 1)
mdl.fit(X_train, y_train)
y_hat = mdl.predict(X_test)
In this case y_test is a 2-D array with one column, shape (100, 1),
but y_hat is a 1-D array, shape (100,):
>>> y_hat
array([0, 1, 1, 0, 0, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 1,
...
0, 1, 1, 0, 0, 0, 1, 0], dtype=uint8)
>>> y_hat.shape
(100,)
How can I convert y_hat to a 2-D array with one column, shape (100, 1), only when y_hat does not have the same number of dimensions as y_test?
CodePudding user response:
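To convert to 2-D only when needed, you can use numpy.atleast_2d. Transposing before and after the call turns a 1-D array into a single column and leaves an array that is already 2-D unchanged: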
import numpy as np

a = np.random.randint(0, 2, 100)
a.shape
# (100,)
a = np.atleast_2d(a.T).T  # 1-D -> (1, 100) -> (100, 1); a 2-D input keeps its shape
a.shape
# (100, 1)
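Applied to the case in the question, the same idea might look like the sketch below. The helper name as_column_if_1d and the random stand-in arrays are assumptions made for illustration; they are not part of the original code or of NumPy. The helper only reshapes when the prediction is 1-D and the reference is 2-D, so multi-label predictions pass through untouched.

```python
import numpy as np

def as_column_if_1d(y_hat, y_ref):
    """Hypothetical helper: return y_hat as (n, 1) when it is 1-D and
    y_ref is 2-D; otherwise return y_hat unchanged."""
    y_hat = np.asarray(y_hat)
    y_ref = np.asarray(y_ref)
    if y_hat.ndim == 1 and y_ref.ndim == 2:
        return y_hat.reshape(-1, 1)
    return y_hat

# Random stand-ins for the arrays in the question (illustration only).
y_test_single = np.random.randint(0, 2, (100, 1)).astype(np.uint8)
y_hat_single = np.random.randint(0, 2, 100).astype(np.uint8)
y_test_multi = np.random.randint(0, 2, (100, 20)).astype(np.uint8)
y_hat_multi = np.random.randint(0, 2, (100, 20)).astype(np.uint8)

fixed_single = as_column_if_1d(y_hat_single, y_test_single)
fixed_multi = as_column_if_1d(y_hat_multi, y_test_multi)
print(fixed_single.shape)  # (100, 1)
print(fixed_multi.shape)   # (100, 20)

# The atleast_2d idiom gives the same shapes without an explicit check.
print(np.atleast_2d(y_hat_single.T).T.shape)  # (100, 1)
print(np.atleast_2d(y_hat_multi.T).T.shape)   # (100, 20)
```

The explicit ndim check is a little more verbose than the atleast_2d one-liner, but it makes the "only when the dimensions differ" condition visible in the code.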
CodePudding user response:
Commonly used methods include slicing and reshaping:
>>> ar
array([0, 1, 2])
>>> ar[:, np.newaxis] # ar[:, None]
array([[0],
[1],
[2]])
>>> ar.reshape(-1, 1) # ar.reshape(*ar.shape, 1)
array([[0],
[1],
[2]])
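However, both of these return a new view of the data and leave the shape of the original array unchanged. If you want to reshape the original array in place, you can assign directly to its shape attribute (note that the NumPy documentation discourages setting shape this way and recommends reshape instead):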
>>> ar
array([0, 1, 2])
>>> ar.shape = -1, 1
>>> ar
array([[0],
[1],
[2]])
Here, -1 tells NumPy to infer the size of that dimension itself: it is the total number of elements in the array divided by the product of the other dimensions.
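A small sketch of both points, assuming a toy array of six elements (the variable names are made up for illustration): how -1 is inferred, and that reshape returns a view that shares memory with the original rather than changing the original's shape.

```python
import numpy as np

ar = np.arange(6)

col = ar.reshape(-1, 1)   # -1 is inferred as 6 / 1 = 6
grid = ar.reshape(-1, 3)  # -1 is inferred as 6 / 3 = 2
print(col.shape)   # (6, 1)
print(grid.shape)  # (2, 3)

# The original array keeps its shape; reshape did not modify it.
print(ar.shape)    # (6,)

# For a contiguous array like this one, the result is a view on the same data,
# so writing through the view is visible in the original.
shares = np.shares_memory(ar, col)
print(shares)      # True
col[0, 0] = 99
print(ar[0])       # 99
```

Because the reshaped result is a view here, rebinding the name (for example y_hat = y_hat.reshape(-1, 1)) is cheap and does not copy the data.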