Compare matrix values columnwise with the corresponding mean

I have a matrix with d features and n samples. I would like to compare each feature of a sample (row) against the mean of the column for that feature, and then assign a label of 1 or 0. E.g., for a matrix X = [x11, x12; x21, x22] I compute the means of the two columns (mu1, mu2), then compare x11 and x21 with mu1 (and x12 and x22 with mu2) to check whether each value is smaller than its column mean, and assign a label according to the if statement (see below).

I have the mean vector for the columns, i.e. a vector of length d. I am currently using for-loops, but these are not computationally efficient.

import numpy as np

mu = np.mean(X_train, axis=0)
X_copy = X_train.copy()  # a real copy; plain assignment would only alias X_train
for i in range(X_train.shape[0]):
    for j in range(X_train.shape[1]):
        if X_train[i, j] < mu[j]:  # less than the mean of the column, assign 0
            X_copy[i, j] = 0
        else:
            X_copy[i, j] = 1  # greater than or equal to the column mean, assign 1

Is there a better alternative? I don't have much experience with Python, so thank you for your understanding.

CodePudding user response:

Compare directly: the mean vector is compared against each row of the original array. Then convert the data type of the result to int:

>>> X_train = np.random.rand(3, 4)
>>> X_train
array([[0.4789953 , 0.84095907, 0.53538172, 0.04880835],
       [0.64554335, 0.50904539, 0.34069036, 0.5290601 ],
       [0.84664389, 0.63984867, 0.66111495, 0.89803495]])
>>> (X_train >= X_train.mean(0)).astype(int)
array([[0, 1, 1, 0],
       [0, 0, 0, 1],
       [1, 0, 1, 1]])
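
As a rough check that the one-liner above gives the same labels as the nested loops in the question, here is a minimal sketch. The function names binarize_by_column_mean and binarize_with_loops and the test data are made up for illustration; they are not part of NumPy or of the original post.

```python
import numpy as np

def binarize_by_column_mean(X):
    """Return 1 where an entry is >= its column mean, else 0 (hypothetical helper)."""
    X = np.asarray(X)
    mu = X.mean(axis=0)            # shape (d,): one mean per column
    return (X >= mu).astype(int)   # mu is broadcast across every row

def binarize_with_loops(X):
    """Reference version with explicit loops, writing into a separate array."""
    X = np.asarray(X)
    mu = X.mean(axis=0)
    out = np.empty(X.shape, dtype=int)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            out[i, j] = 0 if X[i, j] < mu[j] else 1
    return out

rng = np.random.default_rng(0)     # seed chosen arbitrarily for the demo
X_demo = rng.random((5, 3))
labels = binarize_by_column_mean(X_demo)
same = np.array_equal(labels, binarize_with_loops(X_demo))
print(same)
```

The vectorized version never writes into its input, so the original X_train stays untouched.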

Update:

NumPy has a broadcasting mechanism for operations between arrays. For example, when an array is compared with a number, the number is applied to every element of the array and compared with each one:

>>> X_train > 0.5
array([[False,  True,  True, False],
       [ True,  True, False,  True],
       [ True,  True,  True,  True]])
>>> X_train > np.full(X_train.shape, 0.5)    # Equivalent effect.
array([[False,  True,  True, False],
       [ True,  True, False,  True],
       [ True,  True,  True,  True]])

Similarly, you can compare a vector with a 2D array, as long as the length of the vector matches the last dimension of the array (the number of columns):

>>> mu = X_train.mean(0)
>>> X_train > mu
array([[False,  True,  True, False],
       [False, False, False,  True],
       [ True, False,  True,  True]])
>>> X_train > np.tile(mu, (X_train.shape[0], 1))    # Equivalent effect.
array([[False,  True,  True, False],
       [False, False, False,  True],
       [ True, False,  True,  True]])
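
In the example above mu has length 4, the same as the number of columns of X_train, and that is why the comparison works. The sketch below, on a small made-up array, shows what I would expect when the vector length matches the number of rows instead: NumPy should refuse to broadcast and raise a ValueError.

```python
import numpy as np

X = np.arange(12, dtype=float).reshape(3, 4)   # 3 rows (samples), 4 columns (features)

col_means = X.mean(axis=0)      # shape (4,): lines up with the last axis of X
ok = X > col_means              # result has shape (3, 4)

row_means = X.mean(axis=1)      # shape (3,): does not line up with the last axis
try:
    X > row_means
    failed = False
except ValueError:
    failed = True               # shapes (3, 4) and (3,) cannot be broadcast together

print(ok.shape, failed)
```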

What about comparing along other axes? That is harder for me to explain well, so here is the official NumPy explanation. I hope it helps you get started: Broadcasting
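
For the other axis, one common approach (a sketch, not the only way) is to keep the reduced axis with keepdims=True so that the row means form a column of shape (n, 1), which can then be broadcast across the columns. The array and the names row_mu and row_labels are made up for illustration.

```python
import numpy as np

X = np.array([[1.0, 2.0, 6.0],
              [4.0, 5.0, 3.0]])

row_mu = X.mean(axis=1, keepdims=True)    # shape (2, 1) instead of (2,)
row_labels = (X >= row_mu).astype(int)    # the (2, 1) column is broadcast across columns
print(row_labels)
```

Here the row means are 3 and 4, so the labels come out as [[0, 0, 1], [1, 1, 0]].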