Numpy: Calculate means of array with specific elements

Time:04-24

I'm implementing the Nearest Centroid Classification (NCC) algorithm, and I'm stuck on how to use numpy.mean in my case.

Suppose I have a spherical dataset X:

[[ 0.39151059  3.48203037]
 [-0.68677876  1.45377717]
 [ 2.30803493  4.19341503]
 [ 0.50395297  2.87076658]
 [ 0.06677012  3.23265678]
 [-0.24135103  3.78044279]
 [-0.05660036  2.37695381]
 [ 0.74210998 -3.2654815 ]
 [ 0.05815341 -2.41905942]
 [ 0.72126958 -1.71081388]
 [ 1.03581142 -4.09666955]
 [ 0.23209714 -1.86675298]
 [-0.49136284 -1.55736028]
 [ 0.00654881 -2.22505305]]

and the label vector Y:

[0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 1.]

An example with 100 2D data points gives the following result:

[Figure: scatter plot of the 100 points, class 0 in blue and class 1 in red]

The NCC algorithm first computes the mean (centroid) of each class (0 and 1, i.e. blue and red) and then assigns each new data point to the class whose centroid is nearest.
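The second step (assigning a point to the nearest centroid) could be sketched as follows. This assumes the class means are already stored one per row in an array `mu`, that Euclidean distance is used, and that a parallel array `labels` holds the class of each row. The function name `predict_ncc` and the centroid values are hypothetical, chosen only for illustration.

```python
import numpy as np


def predict_ncc(X_new, mu, labels):
    """Assign each row of X_new to the class with the nearest centroid.

    mu     : (n_classes, n_features) array of class means, one row per class
    labels : (n_classes,) array giving the class label of each row of mu
    """
    X_new = np.atleast_2d(X_new)
    # Pairwise Euclidean distances, shape (n_points, n_classes), computed by
    # broadcasting (n_points, 1, d) against (1, n_classes, d).
    dists = np.linalg.norm(X_new[:, None, :] - mu[None, :, :], axis=2)
    # Index of the closest centroid for every point, mapped back to its label.
    return labels[np.argmin(dists, axis=1)]


# Hypothetical centroids, roughly where the two clusters in X sit.
mu = np.array([[0.0, 3.0],
               [0.0, -2.5]])
labels = np.array([0.0, 1.0])
print(predict_ncc([[0.5, 2.0], [0.2, -1.0]], mu, labels))  # [0. 1.]
```

Broadcasting avoids a Python loop over points and classes; for very large datasets the (n_points, n_classes, d) intermediate may become memory-heavy.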

This is my current function:

import numpy as np

def mean_ncc(X, Y):
    # find unique classes
    cids = np.unique(Y)   # [0. 1.]
    # compute class means: one row per class, one column per feature
    mu = np.zeros((len(cids), X.shape[1]))    # [[0. 0.] [0. 0.]] when Y has 2 unique labels (0 and 1)
    for class_idx, class_label in enumerate(cids):
        mu[class_idx, :] = ...  # problem here: average only the rows of X labelled class_label
    return mu

So I want an array containing the class mean of the '0' (blue) points and the class mean of the '1' (red) points. How can I select which rows of X the mean is computed over? I would like to do something like this:

for class_idx, class_label in enumerate(cids):
    mu[class_idx, :] = np.mean(X[only the rows whose label equals class_label], axis=0)

Is this possible, or is there another way to implement it?

Answer:

You can use a boolean mask to select only the elements with a given tag:

import numpy as np

tags = [0, 0, 1, 1, 0, 1]
values = [5, 4, 2, 5, 9, 8]

tags_np = np.array(tags)
values_np = np.array(values)

# tags_np == 1 is a boolean mask; indexing with it keeps only the values tagged 1
print(values_np[tags_np == 1].mean())   # 5.0
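To see what the indexing does, it may help to print the intermediate mask: comparing the array with a scalar gives a boolean array of the same shape, and indexing with it keeps only the positions where it is True. The name `mask` is introduced here just for readability.

```python
import numpy as np

tags_np = np.array([0, 0, 1, 1, 0, 1])
values_np = np.array([5, 4, 2, 5, 9, 8])

mask = tags_np == 1              # boolean array, True where the tag is 1
print(mask)                      # [False False  True  True False  True]
print(values_np[mask])           # [2 5 8]
print(values_np[mask].mean())    # 5.0
```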

EDIT: For 2D data like yours, you will also need the axis parameter of mean, so that it averages over the selected rows and returns one mean per column:

import numpy as np

values = [[5, 4],
          [5, 4],
          [4, 3],
          [4, 3]]

values_np = np.array(values)
tags_np = np.array([0, 0, 1, 1])

# Select the rows tagged 0, then average down the rows (axis=0): one mean per column
print(values_np[tags_np == 0].mean(axis=0))   # [5. 4.]
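Applied to the question's function, the same mask fills in the missing line: `X[Y == class_label]` keeps only the rows of X whose label matches, and `axis=0` averages them column by column. The sketch below is one way to write it. The second function, `mean_ncc_vectorized` (a hypothetical name), is a loop-free alternative based on `np.unique(..., return_inverse=True)`, `np.add.at` and `np.bincount`; it is not part of the answer above, just another option. The small X and Y here are the toy data from the EDIT, not the question's dataset.

```python
import numpy as np


def mean_ncc(X, Y):
    """Return an array of class means, one row per unique label in Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y)
    cids = np.unique(Y)                       # sorted unique class labels
    mu = np.zeros((len(cids), X.shape[1]))
    for class_idx, class_label in enumerate(cids):
        # Boolean mask selects only the rows belonging to this class.
        mu[class_idx, :] = X[Y == class_label].mean(axis=0)
    return mu


def mean_ncc_vectorized(X, Y):
    """Same result without a Python loop (alternative approach)."""
    X = np.asarray(X, dtype=float)
    # inverse[i] is the position of Y[i] within the sorted unique labels.
    cids, inverse = np.unique(np.asarray(Y), return_inverse=True)
    sums = np.zeros((len(cids), X.shape[1]))
    np.add.at(sums, inverse, X)               # add each row into its class's sum
    counts = np.bincount(inverse, minlength=len(cids))
    return sums / counts[:, None]             # divide each class sum by its size


X = np.array([[5, 4], [5, 4], [4, 3], [4, 3]])
Y = np.array([0, 0, 1, 1])
print(mean_ncc(X, Y))
print(mean_ncc_vectorized(X, Y))
```

Both should print the class means [5. 4.] (class 0) and [4. 3.] (class 1) as the rows of a 2×2 array. The loop version is easier to read; the vectorized one may pay off when there are many classes.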