NumPy - how to find unique values in a symmetric similarity matrix

Time:04-11

I have a square symmetric matrix like this:

[[-1.         -0.70710678 -0.70710678 -0.70710678 -0.        ]
 [-0.70710678 -1.         -1.         -1.         -0.        ]
 [-0.70710678 -1.         -1.         -1.         -0.        ]
 [-0.70710678 -1.         -1.         -1.         -0.        ]
 [-0.         -0.         -0.         -0.         -1.        ]]

I would like to analyze all the numbers below or above the diagonal, but not the diagonal itself.

How can I find the unique values in this matrix, excluding the values on the diagonal?

Expected output : [-0., -0.70710678]

CodePudding user response:

You can get the diagonal values with arr.diagonal(), find the unique values of the whole array with np.unique, and then remove the diagonal values from those:

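import numpy as np

unique = np.unique(arr)  # sorted unique values of the whole matrix
# positions in `unique` of every value that occurs on the diagonal
index = np.ravel([np.where(unique == i) for i in np.unique(arr.diagonal())])
values = np.delete(unique, index)  # drop the diagonal values
print(values) # [-0.70710678 -0.        ]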
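The same filtering can also be written with a boolean mask instead of index lookups. This is only a minimal sketch: np.isin is swapped in for the np.where loop, and arr is rebuilt from the question so the snippet runs on its own.

```python
import numpy as np

# Matrix from the question, rebuilt here so the snippet is self-contained.
arr = -np.array([
    [1.0, 0.70710678, 0.70710678, 0.70710678, 0.0],
    [0.70710678, 1.0, 1.0, 1.0, 0.0],
    [0.70710678, 1.0, 1.0, 1.0, 0.0],
    [0.70710678, 1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
])

unique = np.unique(arr)                        # sorted unique values of the whole matrix
on_diagonal = np.isin(unique, arr.diagonal())  # True where a value also occurs on the diagonal
values = unique[~on_diagonal]                  # keep values that never occur on the diagonal
print(values)
```

The mask avoids building a list of index arrays, which may read more clearly when the diagonal holds several distinct values.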

CodePudding user response:

If a is the name of the numpy array with the representation you provided, then

print(np.array(np.setdiff1d(a, a.diagonal())))

does the trick with output

[-0.70710678  0.        ]
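To see why this works: np.setdiff1d returns the sorted unique values of its first argument that do not appear in its second, and it accepts a 2-D array as the first argument. A small sketch on a made-up integer matrix (the values are illustrative, not from the question):

```python
import numpy as np

# Hypothetical symmetric matrix with 5 on the diagonal.
a = np.array([
    [5, 2, 2],
    [2, 5, 3],
    [2, 3, 5],
])

# Unique values of the whole matrix, minus anything found on the diagonal.
result = np.setdiff1d(a, a.diagonal())
print(result)  # [2 3]
```

Note that a value occurring both on and off the diagonal is removed entirely, which matches the expected output in the question.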

(Original Answer) Alternatively,

import numpy as np

b = np.unique(a[~np.eye(a.shape[0],dtype=bool)].reshape(a.shape[0],-1))
print(b)
print(np.setdiff1d(b, a.diagonal()))

Printing b outputs the unique values of the array a after the main diagonal elements have been deleted. The next line then removes from b any values that also appear on the diagonal of a.

The output is

[-1.         -0.70710678  0.        ]
[-0.70710678  0.        ]
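Because the matrix is symmetric, the strict upper triangle already contains every off-diagonal value, so only about half of the entries need to be scanned. A sketch under that assumption, with np.triu_indices (k=1) swapped in for the np.eye mask and a small made-up matrix as input:

```python
import numpy as np

# Hypothetical symmetric matrix; -1.0 occurs both on and off the diagonal.
a = np.array([
    [-1.0, -0.5, -1.0],
    [-0.5, -1.0, 0.0],
    [-1.0, 0.0, -1.0],
])

rows, cols = np.triu_indices(a.shape[0], k=1)  # indices strictly above the diagonal
off_diag = np.unique(a[rows, cols])            # unique off-diagonal values
result = np.setdiff1d(off_diag, a.diagonal())  # drop values that also occur on the diagonal
print(off_diag)  # [-1.  -0.5  0. ]
print(result)    # [-0.5  0. ]
```

If the matrix is not guaranteed to be symmetric, the np.eye mask is the safer choice, since it looks at both triangles.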

CodePudding user response:

You can use Python sets, assuming a is the input:

b = np.array(list(set(a.flatten())-set(np.diagonal(a))))

output: array([-0.70710678, -0. ])
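One caveat that applies to the set difference (and to np.unique as well): values are compared for exact equality, so similarities that differ only by floating-point rounding count as distinct. A sketch of one possible workaround is to round first. The 8-decimal precision and the example matrix are assumptions, not something taken from the question.

```python
import numpy as np

# Hypothetical matrix where two "equal" similarities differ by rounding noise:
# 0.1 + 0.2 is 0.30000000000000004, not exactly 0.3.
a = np.array([
    [1.0, 0.1 + 0.2, 0.3],
    [0.1 + 0.2, 1.0, 0.0],
    [0.3, 0.0, 1.0],
])

# Exact comparison keeps both near-identical values.
exact = set(a.flatten()) - set(np.diagonal(a))

# Rounding to an assumed tolerance of 8 decimals merges them.
rounded = np.round(a, 8)
tolerant = set(rounded.flatten()) - set(np.diagonal(rounded))

print(len(exact))     # 3
print(len(tolerant))  # 2
```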

NB. This is faster for small arrays (such as the provided 25-item example) and roughly as fast as the NumPy operations for larger arrays (tested on 1M (1000x1000) and 100M (10k x 10k) items with 1000 possible unique values).

timing:

[Figure: perfplot timing]

code for the perfplot:

import numpy as np
import perfplot

def guy(a):
    unique = np.unique(a)
    index = np.ravel([np.where(unique == i) for i in np.unique(a.diagonal())])
    values = np.delete(unique, index)
    return values

def mozway(a):
    return np.array(list(set(a.flatten())-set(np.diagonal(a))))

def oda(a):
    b = np.unique(a[~np.eye(a.shape[0],dtype=bool)].reshape(a.shape[0],-1))
    return np.setdiff1d(b, a.diagonal())

def oda_setdiff(a):
    return np.array(np.setdiff1d(a, a.diagonal()))

perfplot.show(
    setup=lambda n: np.random.randint(0,1000, size=(n,n)),  # or setup=np.random.rand
    kernels=[guy, oda, oda_setdiff, mozway],
    n_range=[2**k for k in range(11)],
    xlabel="array shape in each dimension",
    equality_check=None,  # set to None to disable "correctness" assertion
)