Identifying rows in a numpy array that have a given number of matching elements

A 2D NumPy array has the property that each column holds its own range of integers, and no two columns' ranges overlap (see b below). An 8, for example, can only appear in the 3rd column.

b = [[1, 5, 9, 11, 13],
     [1, 6, 8, 10, 14],
     [2, 4, 8, 12, 15],
     [2, 5, 7, 11, 13],
     [3, 4, 9, 10, 15],
     [3, 5, 7, 12, 14]]
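Because the column ranges do not overlap, a value shared by two rows must sit in the same column of both rows. The number of common elements is therefore simply the number of columns in which the two rows agree, which allows a loop-free count with NumPy broadcasting. This is a minimal sketch, assuming b is converted to an integer ndarray (counts is an illustrative name). The intermediate boolean array has shape (N, N, 5), so memory grows with the square of the number of rows.

```python
import numpy as np

b = np.array([[1, 5, 9, 11, 13],
              [1, 6, 8, 10, 14],
              [2, 4, 8, 12, 15],
              [2, 5, 7, 11, 13],
              [3, 4, 9, 10, 15],
              [3, 5, 7, 12, 14]])

# b[:, None, :] has shape (N, 1, M) and b[None, :, :] has shape (1, N, M),
# so the comparison broadcasts to (N, N, M): entry [i, j, c] is True when
# rows i and j hold the same value in column c.
# Summing over the last axis counts the agreeing columns. This equals the
# number of common elements ONLY because no value can occur in two columns.
counts = (b[:, None, :] == b[None, :, :]).sum(axis=-1)
print(counts)
```

If the disjoint-range property ever stops holding, this shortcut undercounts, and the intersect1d approach from the answer is the safer choice.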

I need to know which pairs of rows have EXACTLY 0 matching elements, 1 matching element, ..., 5 matching elements. In a 'perfect world', the output would look something like the array below. Rows 0 and 0 have 5 common elements, rows 0 and 1 have 1 common element, etc.

out = [[0, 0, 5],
       [0, 1, 1],
       [0, 2, 0],
       [0, 3, 3],
       ...]
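As a plain-Python cross-check of what "matching elements" means here, the count for a pair of rows is the size of the intersection of their value sets. This sketch rebuilds the [i, j, count] layout from the question; common_count is a hypothetical helper name, not part of any library.

```python
b = [[1, 5, 9, 11, 13],
     [1, 6, 8, 10, 14],
     [2, 4, 8, 12, 15],
     [2, 5, 7, 11, 13],
     [3, 4, 9, 10, 15],
     [3, 5, 7, 12, 14]]

def common_count(i, j):
    """Hypothetical helper: number of distinct values shared by rows i and j."""
    return len(set(b[i]) & set(b[j]))

# One [i, j, count] entry for every ordered pair of rows (including i == j).
out = [[i, j, common_count(i, j)] for i in range(len(b)) for j in range(len(b))]
print(out[:4])  # → [[0, 0, 5], [0, 1, 1], [0, 2, 0], [0, 3, 3]]
```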

My main problem is how to deal with the 'EXACTLY'.

Answer:

I can't think of anything better than a list comprehension. If no row contains duplicate values, you may be able to speed it up by passing assume_unique=True to np.intersect1d.

import numpy as np

b = [[1, 5, 9, 11, 13],
     [1, 6, 8, 10, 14],
     [2, 4, 8, 12, 15],
     [2, 5, 7, 11, 13],
     [3, 4, 9, 10, 15],
     [3, 5, 7, 12, 14]]

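N = len(b)  # number of rows

# One [i, j, count] row per ordered pair of rows, where count is the
# number of distinct values that rows i and j share.
out = np.array([
    [i, j, np.intersect1d(b[i], b[j]).size]
    for i in range(N)
    for j in range(N)
])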

which gives your expected output, although I find a full N x N matrix of counts more intuitive:

# Entry [y][x] is the number of values shared by rows y and x (symmetric).
out = np.array([
    [
        np.intersect1d(rowx, rowy).size
        for rowx in b
    ]
    for rowy in b
])

which gives:

array([[5, 1, 0, 3, 1, 1],
       [1, 5, 1, 0, 1, 1],
       [0, 1, 5, 1, 2, 1],
       [3, 0, 1, 5, 0, 2],
       [1, 1, 2, 0, 5, 1],
       [1, 1, 1, 2, 1, 5]])
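For the "EXACTLY" part of the question, the counts only need to be grouped. This is a minimal sketch, reusing the intersect1d count from the answer. It flattens the matrix into [i, j, count] rows with np.indices, keeps each unordered pair once (i < j), and selects the rows whose count equals k. The name pairs_by_count is illustrative, not from the original answer.

```python
import numpy as np

b = np.array([[1, 5, 9, 11, 13],
              [1, 6, 8, 10, 14],
              [2, 4, 8, 12, 15],
              [2, 5, 7, 11, 13],
              [3, 4, 9, 10, 15],
              [3, 5, 7, 12, 14]])
N = len(b)

# N x N matrix of common-element counts, built with intersect1d as above.
counts = np.array([[np.intersect1d(rx, ry).size for ry in b] for rx in b])

# Flatten the matrix into [i, j, count] rows (the layout from the question).
ii, jj = np.indices(counts.shape)
out = np.column_stack([ii.ravel(), jj.ravel(), counts.ravel()])

# Keep each unordered pair once (i < j), dropping the trivial i == j rows.
out = out[out[:, 0] < out[:, 1]]

# Group the pairs by their EXACT number of matching elements, 0..M.
pairs_by_count = {k: out[out[:, 2] == k, :2] for k in range(b.shape[1] + 1)}

for k, pairs in pairs_by_count.items():
    print(k, pairs.tolist())
```

With this data, pairs_by_count[0] holds the pairs that share no element at all (for example rows 0 and 2), and pairs_by_count[4] and pairs_by_count[5] come back empty because no two distinct rows agree in four or more columns.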