Compare matrix values columnwise with the corresponding mean

Time:06-07

Given a matrix with n samples (rows) and d features (columns), I would like to compare each feature value of a sample against the mean of the column corresponding to that feature and then assign a label of 1 or 0. E.g., for a matrix X = [x11, x12; x21, x22], I compute the means of the two columns (mu1, mu2) and then compare x11 and x21 with mu1, and x12 and x22 with mu2, to check whether each value is smaller than its column mean, and then assign a label according to the if statement (see below).

I already have the mean vector, with one entry per column, i.e. of length d. I am currently using for-loops, but these are not computationally efficient.

import numpy as np

X_copy = X_train.copy()  # real copy; plain assignment would alias (and overwrite) X_train

mu = np.mean(X_train, axis=0)  # column means, length d
for i in range(X_train.shape[0]):
    for j in range(X_train.shape[1]):
        if X_train[i, j] < mu[j]:  # less than the column mean: assign 0
            X_copy[i, j] = 0
        else:
            X_copy[i, j] = 1  # greater than or equal to the column mean: assign 1

Is there a better alternative? I don't have much experience with Python, so thank you for your understanding.

CodePudding user response:

Compare the array with the mean vector directly: the mean vector is then compared against each row of the original array. Then convert the data type of the boolean result to int:

>>> X_train = np.random.rand(3, 4)
>>> X_train
array([[0.4789953 , 0.84095907, 0.53538172, 0.04880835],
       [0.64554335, 0.50904539, 0.34069036, 0.5290601 ],
       [0.84664389, 0.63984867, 0.66111495, 0.89803495]])
>>> (X_train >= X_train.mean(0)).astype(int)
array([[0, 1, 1, 0],
       [0, 0, 0, 1],
       [1, 0, 1, 1]])
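
As a quick sanity check, the one-line comparison above can be tested against the explicit loops from the question. This is only a minimal sketch: the function names label_by_column_mean_loop and label_by_column_mean and the array X_demo are made up for illustration and do not come from the original post.

```python
import numpy as np

def label_by_column_mean_loop(X):
    """Reference version: explicit loops, following the logic in the question."""
    mu = X.mean(axis=0)                      # column means, shape (d,)
    labels = np.empty(X.shape, dtype=int)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            labels[i, j] = 1 if X[i, j] >= mu[j] else 0
    return labels

def label_by_column_mean(X):
    """Vectorized version: a single broadcast comparison."""
    return (X >= X.mean(axis=0)).astype(int)

# Hypothetical demo data: 5 samples, 3 features.
rng = np.random.default_rng(0)
X_demo = rng.random((5, 3))

# Both versions should produce identical labels.
assert np.array_equal(label_by_column_mean_loop(X_demo),
                      label_by_column_mean(X_demo))
```

Neither function modifies its input, so X_train stays available for later steps.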

Update:

NumPy has a broadcasting mechanism for operations between arrays. For example, when an array is compared with a single number, the number is broadcast across all elements of the array and compared with each of them one by one:

>>> X_train > 0.5
array([[False,  True,  True, False],
       [ True,  True, False,  True],
       [ True,  True,  True,  True]])
>>> X_train > np.full(X_train.shape, 0.5)    # Equivalent effect.
array([[False,  True,  True, False],
       [ True,  True, False,  True],
       [ True,  True,  True,  True]])

Similarly, you can compare a vector with a 2D array, as long as the length of the vector is the same as that of the last dimension of the array (here, the number of columns):

>>> mu = X_train.mean(0)
>>> X_train > mu
array([[False,  True,  True, False],
       [False, False, False,  True],
       [ True, False,  True,  True]])
>>> X_train > np.tile(mu, (X_train.shape[0], 1))    # Equivalent effect.
array([[False,  True,  True, False],
       [False, False, False,  True],
       [ True, False,  True,  True]])
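
To see why the two forms agree, it can help to look at the shapes involved. In the sketch below (the small array X and the name mu_view are illustrative assumptions, not part of the original answer), mu has shape (4,) and X has shape (3, 4); NumPy lines the shapes up from the right, so mu is stretched along the rows. np.broadcast_to makes that stretched array visible without copying the data, unlike np.tile:

```python
import numpy as np

# Hypothetical example data: 3 samples, 4 features.
X = np.arange(12, dtype=float).reshape(3, 4)

mu = X.mean(axis=0)                      # shape (4,): one mean per column
mu_view = np.broadcast_to(mu, X.shape)   # shape (3, 4): read-only view, no copy

print(mu.shape, mu_view.shape)

# The implicit broadcast and the explicit (3, 4) array give the same result.
assert np.array_equal(X > mu, X > mu_view)
assert np.array_equal(X > mu, X > np.tile(mu, (X.shape[0], 1)))
```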

How do you compare along other axes? My English is not good, so it is difficult for me to explain. Instead, here is the official NumPy explanation, which I hope will help you get started: Broadcasting
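
For the other axis, one common approach (offered here as a sketch, not as the answerer's own method) is to keep the reduced axis with keepdims=True so that the row means have shape (n, 1) and broadcast across the columns. The array X and the names row_mu and row_labels are made up for this example.

```python
import numpy as np

# Hypothetical example data: 2 samples, 3 features.
X = np.array([[1.0, 2.0, 6.0],
              [4.0, 5.0, 6.0]])

# Row means with the reduced axis kept: shape (2, 1) instead of (2,).
row_mu = X.mean(axis=1, keepdims=True)

# Each row is compared against its own mean.
row_labels = (X >= row_mu).astype(int)
print(row_labels)
# [[0 0 1]
#  [0 1 1]]
```

Without keepdims=True, the row means would have shape (2,), which cannot be broadcast against shape (2, 3), and the comparison would raise a ValueError. Writing X.mean(axis=1)[:, None] is an equivalent way to get the (2, 1) shape.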
