I was given the task of implementing a Python function using NumPy.
The function should compute the Hellinger distance between two matrices P and Q with dimensions (n, k). p_i is the vector of row i of P and p_i,j is the value of row i in column j of P.
The Hellinger distance for matrices is defined as follows:
h_i = 1/sqrt(2) * sqrt( sum_{j=1..k} (sqrt(p_i,j) - sqrt(q_i,j))^2 )
H is a vector of length n and h_i is the i-th value of H, with i = 1,...,n. So the Hellinger distance between two matrices is equivalent to the Hellinger distance between the corresponding rows of the matrices. For each row, the distance is stored in the output vector H.
The task now is to implement a function (using NumPy) that computes the distance described above. It receives two 2D NumPy arrays P and Q, and it should return a 1D NumPy array H of length n.
I have never worked with NumPy before and have only read a little of the NumPy docs, so I would be very grateful for any suggestions.
CodePudding user response:
I found out that you need to use the axis argument in certain NumPy functions (e.g. np.sum()) to tell NumPy along which axis of the array it should reduce. With axis=1 the sum runs across the columns, which leaves one value per row. I did exactly that: return np.sqrt(1/2) * np.sqrt(np.sum((np.sqrt(P) - np.sqrt(Q))**2, axis=1))
and it works.
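For reference, here is a minimal, self-contained sketch of that one-liner wrapped in a function. The function name hellinger_rows, the shape check, and the sample arrays are my own assumptions for illustration and not part of the original task. Dividing the sum by 2 inside the square root is equivalent to multiplying by 1/sqrt(2) outside it.

```python
import numpy as np

def hellinger_rows(P, Q):
    """Row-wise Hellinger distance between two (n, k) arrays.

    Hypothetical helper name; assumes P and Q hold non-negative values.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.ndim != 2 or P.shape != Q.shape:
        raise ValueError("P and Q must be 2D arrays of the same shape")
    # Squared difference of the element-wise square roots, shape (n, k).
    diff_sq = (np.sqrt(P) - np.sqrt(Q)) ** 2
    # axis=1 sums over the k columns, leaving one value per row, shape (n,).
    return np.sqrt(np.sum(diff_sq, axis=1) / 2.0)

# Made-up sample data: row 0 is identical in P and Q, row 1 has no overlap.
P = np.array([[0.5, 0.5], [1.0, 0.0]])
Q = np.array([[0.5, 0.5], [0.0, 1.0]])
H = hellinger_rows(P, Q)
```

With these sample rows, identical distributions give a distance of 0 and non-overlapping ones give a distance of 1, which is a quick sanity check for the axis choice.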
The only problem is that it still returns negative values. How is that possible, given that the difference is squared?
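One way to narrow this down is to inspect the inputs. For real-valued arrays np.sqrt never returns a negative number, so the expression above should not be able to produce negative results. A negative entry in P or Q would show up as NaN (with a RuntimeWarning), not as a negative distance. The sketch below is only a diagnostic under that assumption. The helper name check_hellinger_inputs and the sample arrays are hypothetical.

```python
import numpy as np

def check_hellinger_inputs(P, Q):
    """Report basic properties of two 2D inputs (hypothetical diagnostic helper)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    return {
        "same_shape": P.shape == Q.shape,
        "P_has_negative": bool((P < 0).any()),
        "Q_has_negative": bool((Q < 0).any()),
        # Rows of a probability matrix are expected to sum to 1.
        "P_rows_sum_to_1": bool(np.allclose(P.sum(axis=1), 1.0)),
        "Q_rows_sum_to_1": bool(np.allclose(Q.sum(axis=1), 1.0)),
    }

# Made-up data: Q contains one invalid (negative) entry in row 1.
P = np.array([[0.2, 0.8], [0.6, 0.4]])
Q = np.array([[0.3, 0.7], [-0.1, 1.1]])
report = check_hellinger_inputs(P, Q)

# Suppress the "invalid value" warning so the NaN can be inspected quietly.
with np.errstate(invalid="ignore"):
    H = np.sqrt(np.sum((np.sqrt(P) - np.sqrt(Q)) ** 2, axis=1) / 2.0)

row_is_nan = np.isnan(H)  # True where a negative input poisoned the row
```

If the report shows no negative entries and the output still looks negative, the next things I would check are the dtype of the inputs and whether the printed values really come from this expression.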