Having a matrix with d features and n samples, I would like to compare each feature of a sample (row) against the mean of the column corresponding to that feature and then assign a corresponding label 1 or 0. Eg. for a matrix X = [x11, x12; x21, x22] I compute the mean of the two columns (mu1, mu2) and then I keep on comparing (x11, x21 with mu1 and so on) to check whether these are greater or smaller than mu and to then assign a label to them according to the if statement (see below).
I have the mean vector for each column i.e. of length d. I am now using for-loops however these are not computationally effective.
X_copy = X_train;
mu = np.mean(X_train, axis = 0)
for i in range(X_train.shape[0]):
for j in range(X_train.shape[1]):
if X_train[i,j]<mu[j]: #less than mean for the col, assign 0
X_copy[i,j] = 0
else:
X_copy[i,j] = 1 #more than or equal to mu for the col, assign 1
Is there any better alternative? I don't have much experience with python hence thank you for understanding.
CodePudding user response:
Direct comparison, which makes the average vector compare on each row of the original array. Then convert the data type of the result to int:
>>> X_train = np.random.rand(3, 4)
>>> X_train
array([[0.4789953 , 0.84095907, 0.53538172, 0.04880835],
[0.64554335, 0.50904539, 0.34069036, 0.5290601 ],
[0.84664389, 0.63984867, 0.66111495, 0.89803495]])
>>> (X_train >= X_train.mean(0)).astype(int)
array([[0, 1, 1, 0],
[0, 0, 0, 1],
[1, 0, 1, 1]])
Update:
There is a broadcast mechanism for operations between numpy arrays. For example, an array is compared with a number, which will make the number swim among all elements of the array and compare them one by one:
>>> X_train > 0.5
array([[False, True, True, False],
[ True, True, False, True],
[ True, True, True, True]])
>>> X_train > np.full(X_train.shape, 0.5) # Equivalent effect.
array([[False, True, True, False],
[ True, True, False, True],
[ True, True, True, True]])
Similarly, you can compare a vector with a 2D array, as long as the length of the vector is the same as that of the first dimension of the array:
>>> mu = X_train.mean(0)
>>> X_train > mu
array([[False, True, True, False],
[False, False, False, True],
[ True, False, True, True]])
>>> X_train > np.tile(mu, (X_train.shape[0], 1)) # Equivalent effect.
array([[False, True, True, False],
[False, False, False, True],
[ True, False, True, True]])
How do I compare other axes? My English is not good, so it is difficult for me to explain. Here I provide the official explanation of numpy. I hope you can get started through it: Broadcasting