I have a 2D NumPy array in which each column holds its own range of integers, and the ranges of different columns do not overlap (see b below). An 8, for example, can only appear in the 3rd column.
b = [[1, 5, 9, 11, 13],
[1, 6, 8, 10, 14],
[2, 4, 8, 12, 15],
[2, 5, 7, 11, 13],
[3, 4, 9, 10, 15],
[3, 5, 7, 12, 14]]
I need to know which pairs of rows have EXACTLY 0 matching elements, 1 matching element, ..., 5 matching elements. In a 'perfect world', the output would look something like the array below. Rows 0 and 0 have 5 common elements, rows 0 and 1 have 1 common element, etc.
out = [[0, 0, 5],
[0, 1, 1],
[0, 2, 0],
[0, 3, 3],
...etc
My main problem is how to deal with the 'EXACTLY'.
CodePudding user response:
I can't think of anything better than a list comprehension. If no row contains duplicate values, you could possibly speed this up by passing assume_unique=True to np.intersect1d.
import numpy as np
b = [[1, 5, 9, 11, 13],
[1, 6, 8, 10, 14],
[2, 4, 8, 12, 15],
[2, 5, 7, 11, 13],
[3, 4, 9, 10, 15],
[3, 5, 7, 12, 14]]
N = len(b)  # number of rows
out = np.array([
    [i, j, np.intersect1d(b[i], b[j]).size]
    for i in range(N)
    for j in range(N)
])
which should give your expected output, though I find this more intuitive:
out = np.array([
    [
        np.intersect1d(rowx, rowy).size
        for rowx in b
    ]
    for rowy in b
])
which gives:
array([[5, 1, 0, 3, 1, 1],
[1, 5, 1, 0, 1, 1],
[0, 1, 5, 1, 2, 1],
[3, 0, 1, 5, 0, 2],
[1, 1, 2, 0, 5, 1],
[1, 1, 1, 2, 1, 5]])
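Since the question says values in different columns never overlap, two rows can only share a value in the same column, so counting common elements should reduce to counting the positions where the two rows agree. Below is a minimal sketch under that assumption, using broadcasting instead of np.intersect1d. The helper name pairs_with_exactly is made up for illustration, and the approach would give wrong counts if the same value could appear in two different columns.

```python
import numpy as np

b = np.array([[1, 5, 9, 11, 13],
              [1, 6, 8, 10, 14],
              [2, 4, 8, 12, 15],
              [2, 5, 7, 11, 13],
              [3, 4, 9, 10, 15],
              [3, 5, 7, 12, 14]])

# Compare every row with every other row, column by column.
# Shapes: (N, 1, M) == (1, N, M) -> (N, N, M) boolean array;
# summing over the last axis counts the matching columns per pair.
counts = (b[:, None, :] == b[None, :, :]).sum(axis=2)

# Flatten the matrix into (i, j, count) rows, as in the question.
i, j = np.indices(counts.shape)
triples = np.column_stack([i.ravel(), j.ravel(), counts.ravel()])

def pairs_with_exactly(counts, k):
    """Return the (i, j) pairs with i < j whose rows share exactly k elements.

    Hypothetical helper, not part of NumPy.
    """
    # k=1 keeps only the strict upper triangle, so each pair is listed once
    # and the diagonal (a row compared with itself) is excluded.
    rows, cols = np.nonzero(np.triu(counts == k, k=1))
    return np.column_stack([rows, cols])

print(pairs_with_exactly(counts, 0))
```

The 'EXACTLY' part then comes down to the comparison counts == k: a pair with 3 matches is not reported for k = 1 or k = 2. Note that the intermediate boolean array has N * N * M entries, so for a very large number of rows the list comprehension (or processing the rows in chunks) may use less memory.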