I have a matrix A of size N×N with float values and a boolean matrix B of the same size.
For every row, I need the mean of the values in A at the positions where B is True.
Similarly, I need the mean of the values in A at the positions where B is False.
Finally, I need the number of rows where the "True" mean is less than the "False" mean.
For example:
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
B = [[True, True, False],
     [False, False, True],
     [True, False, True]]
Initially, count = 0
For row 1, true_mean = (1.0 + 2.0) / 2 = 1.5 and false_mean = 3.0
true_mean < false_mean, so count = 0 + 1 = 1
For row 2, true_mean = 6.0 and false_mean = (4.0 + 5.0) / 2 = 4.5
true_mean > false_mean, so count stays the same
For row 3, true_mean = (7.0 + 9.0) / 2 = 8.0 and false_mean = 8.0
true_mean == false_mean, so count stays the same
Final count value = 1
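The walkthrough above can be written as a plain-Python reference loop, which is handy for checking any vectorized version. This is only a sketch: the function name count_true_below_false is made up for illustration, and skipping rows that have no True or no False values is an assumption, since the question does not say how such rows should be treated.

```python
def count_true_below_false(A, B):
    """Count rows whose mean over True positions is strictly below the mean over False positions."""
    count = 0
    for row_a, row_b in zip(A, B):
        true_vals = [a for a, b in zip(row_a, row_b) if b]
        false_vals = [a for a, b in zip(row_a, row_b) if not b]
        # Assumption: a row with no True or no False entries has an undefined mean, so skip it.
        if not true_vals or not false_vals:
            continue
        if sum(true_vals) / len(true_vals) < sum(false_vals) / len(false_vals):
            count += 1
    return count


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
B = [[True, True, False],
     [False, False, True],
     [True, False, True]]

print(count_true_below_false(A, B))  # prints 1
```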
My attempt:
true_mat = np.where(B, A, 0)
false_mat = np.where(B, 0, A)
true_mean = true_mat.mean(axis=1)
false_mean = false_mat.mean(axis=1)
But this gives the wrong answer, because the denominator is N rather than the number of True (or False) values in that row.
I only need the count; I don't need true_mean and false_mean themselves.
Is there any way to fix it?
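To make the denominator problem concrete, here is a small sketch that runs the attempt on the example matrices and prints the number of True values per row next to it. The variable names naive_true_mean and true_count are introduced only for this illustration.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
B = np.array([[True, True, False],
              [False, False, True],
              [True, False, True]])

true_mat = np.where(B, A, 0)

# mean(axis=1) divides every row sum by N = 3, zeros included.
naive_true_mean = true_mat.mean(axis=1)

# The denominators that are actually wanted: number of True values per row.
true_count = B.sum(axis=1)

print(naive_true_mean)  # roughly [1.0, 2.0, 5.33], not the expected [1.5, 6.0, 8.0]
print(true_count)       # [2 1 2]
```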
CodePudding user response:
The mean issue can be resolved by dividing each row sum by the number of True values in that row:
mask_norm = tf.reduce_sum(tf.cast(B, true_mat.dtype), axis=1)  # number of True values per row
true_mean = tf.math.divide(tf.reduce_sum(true_mat, axis=1), mask_norm)
#true_mean : [1.5, 6. , 8. ]
false_mean can be computed the same way from false_mat and the per-row count of False values. You can then find the count using tf.reduce_sum(tf.where(true_mean < false_mean, 1, 0))
CodePudding user response:
You could also try something like this:
import tensorflow as tf
A = tf.constant([[1.0, 2.0, 3.0],
[4.0, 5.0, 6.0],
[7.0, 8.0, 9.0]])
B = tf.constant([[True, True, False],
[False, False, True],
[True, False, True]])
t_rows = tf.where(B)
f_rows = tf.where(~B)
_true = tf.gather_nd(A, t_rows)
_false = tf.gather_nd(A, f_rows)
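false_mean = tf.math.segment_mean(_false, f_rows[:, 0])  # per-row mean over False positions
true_mean = tf.math.segment_mean(_true, t_rows[:, 0])  # per-row mean over True positions
count = tf.reduce_sum(tf.cast(tf.math.greater(false_mean, true_mean), dtype=tf.int32))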
tf.print(count)
1
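This also runs when some rows are all True or all False. Note that tf.math.segment_mean returns 0 for an empty segment, so the missing mean of such a row is treated as 0, and that the output length follows the largest row index present, so the last row should contain both True and False values: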
B = tf.constant([[True, True, True],
[False, False, True],
[True, False, True]])
# 0
B = tf.constant([[False, False, False],
[False, False, False],
[True, False, True]])
# 2
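If you would rather not have an empty row silently compared against 0, one alternative is NumPy masked arrays. This is a sketch of a different technique from the segment_mean approach above, not a translation of it: the helper name count_with_masked_means is hypothetical, and treating a row with an undefined mean as "not counted" is an assumption.

```python
import numpy as np


def count_with_masked_means(A, B):
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=bool)
    # Entries where mask=True are ignored by mean(); a fully masked row yields a masked mean.
    true_mean = np.ma.masked_array(A, mask=~B).mean(axis=1)
    false_mean = np.ma.masked_array(A, mask=B).mean(axis=1)
    # A comparison involving a masked mean is itself masked; fill those with False
    # so rows with an undefined mean are not counted (assumption).
    below = np.ma.less(true_mean, false_mean).filled(False)
    return int(np.count_nonzero(below))


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
B_all_true_row = [[True, True, True],
                  [False, False, True],
                  [True, False, True]]

print(count_with_masked_means(A, B_all_true_row))  # prints 0
```

With this convention the all-False example above would give 0 rather than 2, because rows 1 and 2 have no True values and are skipped instead of being compared against 0.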
CodePudding user response:
I would say your start is good:
true_mat = np.where(B, A, 0)
false_mat = np.where(B, 0, A)
But we want to divide by the number of Trues or Falses, respectively, so...
true_sum = np.sum(B, axis=1)  # number of Trues per row
false_sum = N - true_sum  # number of Falses per row; if N is not given, use N = A.shape[1]
true_mean = np.sum(true_mat, axis=1) / true_sum  # row sums of true_mat divided by the True counts
false_mean = np.sum(false_mat, axis=1) / false_sum  # note: a row that is all True or all False divides by zero
For your example this gives
[1.5 6. 8. ]
[3. 4.5 8. ]
So now we just have to compare where the second is larger than the first:
count = np.sum(np.where(false_mean > true_mean, 1, 0))
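Since only the count is needed, the division can be avoided altogether by cross-multiplying: for positive counts, true_sum / true_cnt < false_sum / false_cnt is equivalent to true_sum * false_cnt < false_sum * true_cnt. The sketch below puts the steps above into one function using that idea. The name count_rows is made up, and excluding rows that are all True or all False is an assumption about the desired behaviour.

```python
import numpy as np


def count_rows(A, B):
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=bool)

    true_cnt = B.sum(axis=1)              # number of Trues per row
    false_cnt = B.shape[1] - true_cnt     # number of Falses per row

    true_sum = np.where(B, A, 0.0).sum(axis=1)
    false_sum = np.where(B, 0.0, A).sum(axis=1)

    # Assumption: rows missing either Trues or Falses have an undefined mean and are not counted.
    valid = (true_cnt > 0) & (false_cnt > 0)

    # Same inequality as true_mean < false_mean, multiplied through by both (positive) counts.
    below = true_sum * false_cnt < false_sum * true_cnt
    return int(np.count_nonzero(valid & below))


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
B = [[True, True, False],
     [False, False, True],
     [True, False, True]]

print(count_rows(A, B))  # prints 1
```

This avoids division-by-zero warnings for degenerate rows and never materializes the two mean vectors.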