Computing the mean of an array considering only some indices

Time:12-01

I have two 2D arrays of the same shape, one containing float values and one containing booleans. For each column, I want the mean of the values in the first matrix, counting only the entries where the corresponding element of the second matrix is False.

For example:

A = [[1 3 5]
     [2 4 6]
     [3 1 0]]

B = [[True False False]
     [False False False]
     [True True False]]

result = [2, 3.5, 3.67] 
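Written out, the requested value for column j is the sum of the kept entries of that column divided by how many entries were kept. The symbols r_j and the indicator 1[·] are notation introduced here for illustration:

$$
r_j = \frac{\sum_i A_{ij}\,\mathbf{1}[\lnot B_{ij}]}{\sum_i \mathbf{1}[\lnot B_{ij}]}
$$

For the example above:

$$
r_0 = \frac{2}{1} = 2,\qquad r_1 = \frac{3 + 4}{2} = 3.5,\qquad r_2 = \frac{5 + 6 + 0}{3} = \frac{11}{3} \approx 3.67
$$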

Answer:

Where B is False, keep the value of A; otherwise replace it with NaN. Then use np.nanmean, which ignores NaNs when computing the mean. (A and B need to be NumPy arrays here; ~B does not work on a plain Python list.)

import numpy as np
np.nanmean(np.where(~B, A, np.nan), axis=0)

>>> array([2.        , 3.5       , 3.66666667])
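To make the intermediate step visible, here is a small self-contained sketch. It assumes A and B are stored as NumPy arrays (the question prints them without commas, so the conversion below is an assumption about how they are held) and prints the NaN-filled array before it is averaged.

```python
import numpy as np

A = np.array([[1, 3, 5],
              [2, 4, 6],
              [3, 1, 0]])
B = np.array([[True, False, False],
              [False, False, False],
              [True, True, False]])

# Entries where B is True become NaN; np.where promotes the
# integer values of A to float64 so the array can hold NaN.
masked = np.where(~B, A, np.nan)
print(masked)

# nanmean skips the NaN entries when averaging each column.
result = np.nanmean(masked, axis=0)
print(result)
```

Because the exclusion is encoded as NaN, this approach assumes A itself contains no NaNs that you want to treat as real data.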

Answer:

Use numpy.mean with its where argument (available since NumPy 1.20) to specify which elements to include in the mean:

np.mean(A, where=~B, axis=0)
>>> [2.         3.5        3.66666667]
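Selecting elements with where=~B amounts to a masked sum divided by a count of the kept entries. A minimal sketch of that formulation written out explicitly; the names keep, sums and counts are illustrative, not part of any API:

```python
import numpy as np

A = np.array([[1, 3, 5],
              [2, 4, 6],
              [3, 1, 0]])
B = np.array([[True, False, False],
              [False, False, False],
              [True, True, False]])

keep = ~B  # True where the value should be averaged

# Zero out excluded entries, then sum each column.
sums = np.where(keep, A, 0).sum(axis=0)
# Number of kept entries per column (True counts as 1).
counts = keep.sum(axis=0)

result = sums / counts
print(result)
```

A column with no kept entries would give 0/0 here, which NumPy turns into NaN with a RuntimeWarning.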

Answer:

A = [[1, 3, 5],
     [2, 4, 6],
     [3, 1, 0]]

B = [[True, False, False],
     [False, False, False],
     [True, True, False]]

# Per-column running totals and counts of the entries where B is False
sums = [0] * len(A[0])
amounts = [0] * len(A[0])
for i in range(len(A)):
    for j in range(len(A[0])):
        if not B[i][j]:
            sums[j] += A[i][j]
            amounts[j] += 1

# Raises ZeroDivisionError if some column of B is entirely True
result = [s / n for s, n in zip(sums, amounts)]

print(result)
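If a column of B can be entirely True, the division above fails. A sketch of one way to guard against that in pure Python; the function name masked_column_means and its empty parameter are hypothetical choices for illustration:

```python
def masked_column_means(A, B, empty=None):
    """Mean of each column of A, using only entries where B is False.

    Columns with no usable entries get `empty` instead of raising
    ZeroDivisionError (hypothetical behaviour chosen for this sketch).
    """
    n_cols = len(A[0])
    sums = [0.0] * n_cols
    counts = [0] * n_cols
    # Walk the rows of A and B together, element by element.
    for a_row, b_row in zip(A, B):
        for j, (a, b) in enumerate(zip(a_row, b_row)):
            if not b:
                sums[j] += a
                counts[j] += 1
    return [s / c if c else empty for s, c in zip(sums, counts)]


A = [[1, 3, 5],
     [2, 4, 6],
     [3, 1, 0]]
B = [[True, False, False],
     [False, False, False],
     [True, True, False]]

print(masked_column_means(A, B))
# A column that is masked out entirely yields the `empty` placeholder.
print(masked_column_means(A, [[True, False, False]] * 3))
```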

Answer:

There may be some fancy NumPy trick for this, but I think using a list comprehension over the columns to construct a new array is the most straightforward:

result = np.array([a_col[~b_col].mean() for a_col, b_col in zip(A.T, B.T)])

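To make it easier to follow, here is what the line does expanded into a loop:

result = []
for j in range(A.shape[1]):  # iterate over columns, not rows
    new_col = A[:, j][~B[:, j]]  # values in column j where B is False
    result.append(new_col.mean())
result = np.array(result)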
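Iterating over columns matters once the array is not square. A small sketch on a hypothetical 2 x 3 example, where zip over the transposes pairs each column of A with the matching column of B:

```python
import numpy as np

# Hypothetical non-square example: 2 rows, 3 columns.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
B = np.array([[False, True, False],
              [False, False, True]])

# A.T and B.T yield columns, so the result has one entry per column.
result = np.array([a_col[~b_col].mean() for a_col, b_col in zip(A.T, B.T)])
print(result)
```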

Answer:

You could also use a masked array:

import numpy as np

result = np.ma.array(A, mask=B).mean(axis=0).filled(fill_value=0)
# Output:
# array([2.        , 3.5       , 3.66666667])

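This has the advantage that you can supply a fill_value for any column in which every element of B is True: the mean of such a column is masked, and filled() replaces it with the given value (here 0) instead of producing NaN.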
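A sketch of that case, using a hypothetical mask B_all_col0 in which column 0 is excluded entirely:

```python
import numpy as np

A = np.array([[1, 3, 5],
              [2, 4, 6],
              [3, 1, 0]])
# Hypothetical mask: like B from the question, but column 0 is fully True.
B_all_col0 = np.array([[True, False, False],
                       [True, False, False],
                       [True, True, False]])

means = np.ma.array(A, mask=B_all_col0).mean(axis=0)
print(means)  # column 0 has no unmasked values, so its mean is masked

# filled() substitutes the chosen value for the masked column.
print(means.filled(fill_value=0))
```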
