I want to write a function that centers an input data matrix by multiplying it with the centering matrix. The function should subtract the row-wise mean from the input.
My code:
import numpy as np

def centering(data):
    n = data.shape()[0]
    centeringMatrix = np.identity(n) - 1/n * (np.ones(n) @ np.ones(n).T)
    data = centeringMatrix @ data

data = np.array([[1,2,3], [3,4,5]])
center_with_matrix(data)
But the result matrix is wrong: it is not centered.
Thanks!
CodePudding user response:
The centering matrix is
    np.eye(n) - np.ones((n, n)) / n
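As a quick sanity check, here is a minimal sketch of what left-multiplying by this matrix does. The helper name centering_matrix and the float test array X are assumptions made for illustration only; they are not from the question.

```python
import numpy as np

def centering_matrix(n):
    # Hypothetical helper: the n-by-n centering matrix I - (1/n) * ones.
    return np.eye(n) - np.ones((n, n)) / n

X = np.array([[1.0, 2.0, 3.0],
              [3.0, 4.0, 5.0]])

C = centering_matrix(X.shape[0])  # sized by the number of rows of X
left = C @ X                      # left-multiplication removes each column's mean

# Same result as subtracting the column means directly.
assert np.allclose(left, X - X.mean(axis=0))
print(left)
```

So with the matrix on the left, it is the column means that are removed, which matters for the row-wise requirement in the question.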
Here is a list of issues in your original formulation:
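np.ones(n).T is the same as np.ones(n). The transpose of a 1D array is a no-op in numpy, so np.ones(n) @ np.ones(n).T is an inner product that evaluates to the scalar n, not an n-by-n matrix of ones. If you want to turn a row vector into a column vector, add the dimension explicitly:
    np.ones((n, 1))
OR
    np.ones(n)[:, None]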
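The normal definition subtracts the column-wise mean, not the row-wise one. To get the row-wise operation, build the matrix from the number of columns and transpose the input before and after multiplying (which is equivalent to right-multiplying by the centering matrix). Note that shape is an attribute, so it takes no parentheses:
    n = data.shape[1]
    ...
    data = (centeringMatrix @ data.T).T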
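Your function creates a new array for the output but does not currently return anything. You can either return the result, or perform the assignment in-place:
    return (centeringMatrix @ data.T).T
OR
    data[:] = (centeringMatrix @ data.T).T
OR
    np.matmul(centeringMatrix, data.T, out=data.T)
The in-place variants need data to have a floating-point dtype. With the integer array from your example, the slice assignment silently truncates the result to integers and the out= form raises a casting error.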
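Putting the points above together, a corrected version might look roughly like the sketch below. The function name center_rows and the conversion to float are my own choices for illustration, not something from your code; the sketch returns a new array rather than modifying the input.

```python
import numpy as np

def center_rows(data):
    # Hypothetical name; returns a new float array with each row's mean removed.
    data = np.asarray(data, dtype=float)
    n = data.shape[1]  # shape is an attribute, not a method
    centering_matrix = np.eye(n) - np.ones((n, n)) / n
    return (centering_matrix @ data.T).T

n = 3
# Transposing a 1D array changes nothing, so this product is a scalar.
assert np.ones(n).T.shape == (n,)
assert np.ones(n) @ np.ones(n).T == n
# An explicit column vector gives the intended n-by-n matrix of ones.
col = np.ones((n, 1))
assert (col @ col.T).shape == (n, n)

data = np.array([[1, 2, 3], [3, 4, 5]])
centered = center_rows(data)
# Agrees with subtracting the row means directly.
assert np.allclose(centered, data - data.mean(axis=1, keepdims=True))
assert np.allclose(centered.mean(axis=1), 0)
```

In practice, data - data.mean(axis=1, keepdims=True) gives the same result without building an n-by-n matrix, so the matrix form is mainly useful when you need the centering matrix itself.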