Implementation of Hellinger distance with numpy only

Time:01-17

I was given the task of implementing a Python function using NumPy. The function should compute the Hellinger distance between two matrices P and Q, each with dimensions (n, k). p_i is row i of P as a vector, and p_i,j is the entry of P in row i and column j.

The Hellinger distance for matrices is defined as follows:

h_i = 1/sqrt(2) * sqrt( sum_{j=1..k} (sqrt(p_i,j) - sqrt(q_i,j))^2 )

H is a vector of length n and h_i is the i-th entry of H, with i = 1,...,n. So the Hellinger distance between two matrices is computed row by row: h_i is the Hellinger distance between row i of P and row i of Q, and each of these distances is stored in the output vector H.
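To make the row-by-row definition concrete, here is a deliberately slow reference sketch that follows the formula literally with two loops. The function name hellinger_rows_loop and the sample matrices are made up for illustration, and it assumes all entries are non-negative.

```python
import math
import numpy as np

def hellinger_rows_loop(P, Q):
    """Literal, loop-based version of the formula (hypothetical helper name)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    n, k = P.shape
    H = np.empty(n)
    for i in range(n):          # one distance per row
        total = 0.0
        for j in range(k):      # sum over the columns of row i
            total += (math.sqrt(P[i, j]) - math.sqrt(Q[i, j])) ** 2
        H[i] = math.sqrt(total) / math.sqrt(2)
    return H

# Illustrative inputs: row 0 differs completely, row 1 is identical.
P = np.array([[1.0, 0.0], [0.5, 0.5]])
Q = np.array([[0.0, 1.0], [0.5, 0.5]])
H = hellinger_rows_loop(P, Q)
print(H)
```

For these inputs the first row gives the maximum distance and the second row gives zero, which is a handy sanity check for any faster version.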

The task now is to implement a function (using NumPy) that computes the distance described above. It receives two 2D NumPy arrays P and Q and should return a 1D NumPy array H of length n.

I have never worked with NumPy before, so I would be very grateful for any suggestions.

I have read a bit of the NumPy docs, but I would love to get any suggestions.

CodePudding user response:

I found out that certain NumPy functions (e.g. np.sum()) take an axis argument that tells NumPy which axis of the array to reduce over. With axis=1 the sum runs over the columns within each row, so you get one value per row. I did exactly that: return np.sqrt(1/2) * np.sqrt(np.sum((np.sqrt(P) - np.sqrt(Q))**2, axis=1)) and it works.
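Written out as a full function, the one-liner might look like the sketch below. The name hellinger and the example arrays are only illustrative, and it assumes P and Q have the same shape (n, k) with non-negative entries.

```python
import numpy as np

def hellinger(P, Q):
    """Row-wise Hellinger distance between two (n, k) arrays (illustrative name)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    diff = np.sqrt(P) - np.sqrt(Q)            # element-wise, shape (n, k)
    # axis=1 sums across the columns of each row, leaving shape (n,)
    return np.sqrt(np.sum(diff ** 2, axis=1)) / np.sqrt(2)

# Illustrative inputs: rows 0 and 2 are identical, row 1 differs completely.
P = np.array([[0.25, 0.75], [1.0, 0.0], [0.5, 0.5]])
Q = np.array([[0.25, 0.75], [0.0, 1.0], [0.5, 0.5]])
H = hellinger(P, Q)
print(H.shape)
```

Using axis=0 instead would sum down each column and return k values rather than n, so checking H.shape against the number of rows is a quick way to confirm the right axis was used.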

The only problem is that it still gives back negative values. How is that possible, since the subtraction is taken to the power of 2?
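For real-valued inputs, np.sqrt returns either a non-negative number or nan, so the expression itself should not be able to produce a negative result. One thing worth checking (only a guess, since the actual inputs are not shown) is whether the negative numbers are in P or Q rather than in H, or whether the value being looked at is really the function's return value. Below is a small sketch with input checks; the name hellinger_checked and the random test data are assumptions for illustration.

```python
import numpy as np

def hellinger_checked(P, Q):
    """Same computation, but rejects inputs the formula is not defined for."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.ndim != 2 or P.shape != Q.shape:
        raise ValueError("P and Q must be 2D arrays of the same shape")
    if (P < 0).any() or (Q < 0).any():
        # np.sqrt of a negative real gives nan, which then spreads into H
        raise ValueError("entries of P and Q must be non-negative")
    diff = np.sqrt(P) - np.sqrt(Q)
    return np.sqrt(np.sum(diff ** 2, axis=1)) / np.sqrt(2)

# Illustrative random data in [0, 1)
rng = np.random.default_rng(0)
P = rng.random((5, 4))
Q = rng.random((5, 4))
H = hellinger_checked(P, Q)
print((H >= 0).all())
```

If the check raises on your data, the problem lies in the inputs and not in the squaring step.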
