Contingency matrix to 1D format in Python

2x2 contingency matrix:

[[ 2 1 ]
 [ 1 0 ]]

Translates to:

[[ 0 0 0 1 ]
 [ 0 0 1 0 ]]

The contingency matrix represents the outcome of two clustering algorithms, Ci and Cj, each with two clusters. The first row of the translated form indicates that Ci has three data points in, say, cluster 1 and one data point in, say, cluster 2. The second row indicates that Cj has three data points in, say, cluster A and one data point in, say, cluster B. Therefore, both algorithms "agree" on two out of N = 4 data points.

Since there is no adjusted mutual information (AMI) function that takes a contingency matrix as input, I would like to transform the contingency matrix into 1D inputs for the sklearn implementation of AMI.
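For reference, the target call on the translated vectors above might look like the following. This is a minimal sketch that assumes scikit-learn is installed; the variable names are only illustrative.

```python
from sklearn.metrics import adjusted_mutual_info_score

# 1D label vectors taken from the translated example above
labels_ci = [0, 0, 0, 1]
labels_cj = [0, 0, 1, 0]

# AMI compares the two labelings point by point
score = adjusted_mutual_info_score(labels_ci, labels_cj)
print(score)
```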

Is there an efficient way to rewrite an NxN contingency matrix in 1D vector form in Python code?

It would look something like:

V1 = []
V2 = []
for each row index i:
    for each column index j:
        append contingency[i][j] elements with value i to V1 and with value j to V2

CodePudding user response：

Well, this solves the problem as you have stated it. The result v is a list of lists that can be converted to a NumPy array. v needs one empty list per dimension of c.

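c = [[2, 1], [1, 0]]  # contingency matrix
v = [[], []]          # v[0]: row labels, v[1]: column labels

for i, row in enumerate(c):
    for j, val in enumerate(row):
        # cell (i, j) contributes val data points labelled (i, j)
        v[0].extend([i] * val)
        v[1].extend([j] * val)

print(v)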
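The loop above can be wrapped in a function and sanity-checked by rebuilding the contingency matrix from the two label lists. This is only a sketch; the names contingency_to_labels and labels_to_contingency are made up for illustration and do not come from any library.

```python
from collections import Counter


def contingency_to_labels(c):
    """Expand a contingency matrix (list of rows) into two 1D label lists."""
    v1, v2 = [], []
    for i, row in enumerate(c):
        for j, count in enumerate(row):
            v1.extend([i] * count)  # row index = cluster id in the first clustering
            v2.extend([j] * count)  # column index = cluster id in the second clustering
    return v1, v2


def labels_to_contingency(v1, v2, n_rows, n_cols):
    """Rebuild the contingency matrix by counting (i, j) label pairs."""
    pairs = Counter(zip(v1, v2))
    return [[pairs[(i, j)] for j in range(n_cols)] for i in range(n_rows)]


c = [[2, 1], [1, 0]]
v1, v2 = contingency_to_labels(c)
print(v1)  # [0, 0, 0, 1]
print(v2)  # [0, 0, 1, 0]

# Round trip: the label lists should reproduce the original matrix.
assert labels_to_contingency(v1, v2, 2, 2) == c
```

The round trip is a cheap way to confirm that no counts were lost or mislabelled before passing the lists on.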

CodePudding user response：

A numpy implementation could take advantage of numpy.repeat:

import numpy as np

# input contingency matrix
a = np.array([[2,1],[1,0]])
# fixed "cluster id" matrix
b = np.array([[0,1],[0,1]])
# repeat each id as many times as the matching cell count in a
out = np.vstack([np.repeat(b.ravel('F'), a.ravel()),
                 np.repeat(b.ravel(), a.ravel())
                 ])

Output:

array([[0, 0, 0, 1],
       [0, 0, 1, 0]])

Other example with [[5,4],[0,3]] as input:

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
       [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]])

You can also use cluster ids other than 0/1, if desired (example with a = np.array([[5,4],[0,3]]) ; b = np.array([[0,1],[2,3]])):

array([[0, 0, 0, 0, 0, 2, 2, 2, 2, 3, 3, 3],
       [0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3]])
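
The fixed b matrix above covers the 2x2 case. For a larger (or non-square) contingency matrix, one possible generalization is to build the row and column index grids with np.indices instead of writing b by hand. A minimal sketch, assuming non-negative integer counts; the function name contingency_to_labels is an illustrative choice, not a NumPy or sklearn API.

```python
import numpy as np


def contingency_to_labels(a):
    """Expand an (n, m) contingency matrix into two 1D label arrays."""
    a = np.asarray(a)
    # rows[i, j] == i and cols[i, j] == j for every cell of a
    rows, cols = np.indices(a.shape)
    counts = a.ravel()
    # repeat each cell's row/column index as many times as its count
    v1 = np.repeat(rows.ravel(), counts)
    v2 = np.repeat(cols.ravel(), counts)
    return v1, v2


v1, v2 = contingency_to_labels([[5, 4], [0, 3]])
print(v1)  # [0 0 0 0 0 0 0 0 0 1 1 1]
print(v2)  # [0 0 0 0 0 1 1 1 1 1 1 1]
```

The two returned arrays have one entry per data point, which is the shape sklearn's label-based clustering metrics expect.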