For an array, say a = np.array([1,2,1,0,0,1,1,2,2,2]), something like an adjacency "matrix" A needs to be created. That is, A is a symmetric (n, n) numpy array where n = len(a), and A[i,j] = 1 if a[i] == a[j] and 0 otherwise (i = 0...n-1 and j = 0...n-1):
  0 1 2 3 4 5 6 7 8 9
0 1 0 1 0 0 1 1 0 0 0
1   1 0 0 0 0 0 1 1 1
2     1 0 0 1 1 0 0 0
3       1 1 0 0 0 0 0
4         1 0 0 0 0 0
5           1 1 0 0 0
6             1 0 0 0
7               1 1 1
8                 1 1
9                   1
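The trivial solution is
import numpy as np
a = np.array([1, 2, 1, 0, 0, 1, 1, 2, 2, 2])
n = len(a)
A = np.zeros((n, n), dtype=int)
for i in range(n):
    for j in range(n):
        # A is zero-initialized, so only the matches need to be set
        if a[i] == a[j]:
            A[i, j] = 1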
Can this be done in a numpy way, i.e. without loops?
CodePudding user response:
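You can use numpy broadcasting:
import pandas as pd
# a[:, None] has shape (n, 1); comparing it with a (shape (n,)) broadcasts to (n, n)
b = (a[:, None] == a).astype(int)
df = pd.DataFrame(b)  # wrapped in a DataFrame to get labeled rows and columns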
output:
0 1 2 3 4 5 6 7 8 9
0 1 0 1 0 0 1 1 0 0 0
1 0 1 0 0 0 0 0 1 1 1
2 1 0 1 0 0 1 1 0 0 0
3 0 0 0 1 1 0 0 0 0 0
4 0 0 0 1 1 0 0 0 0 0
5 1 0 1 0 0 1 1 0 0 0
6 1 0 1 0 0 1 1 0 0 0
7 0 1 0 0 0 0 0 1 1 1
8 0 1 0 0 0 0 0 1 1 1
9 0 1 0 0 0 0 0 1 1 1
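To see why the broadcasting one-liner matches the double loop from the question, the following sketch makes the intermediate shapes explicit and compares both results. The names col, A and A_loop are illustrative only and are not part of the original answer.

```python
import numpy as np

a = np.array([1, 2, 1, 0, 0, 1, 1, 2, 2, 2])

# Adding an axis turns a into a column of shape (n, 1).
col = a[:, None]

# Comparing (n, 1) with (n,) broadcasts to (n, n): entry [i, j] is a[i] == a[j].
A = (col == a).astype(int)

# Reference result built with the explicit loops from the question.
n = len(a)
A_loop = np.zeros((n, n), dtype=int)
for i in range(n):
    for j in range(n):
        A_loop[i, j] = int(a[i] == a[j])

print(col.shape, A.shape)
print(np.array_equal(A, A_loop))  # both constructions agree
print(np.array_equal(A, A.T))     # the matrix is symmetric
```

The comparison allocates a full (n, n) array, so this approach is best suited to arrays where n squared elements fit comfortably in memory.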
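If you want the upper triangle only, use numpy.tril_indices_from to mask everything below the diagonal:
# float dtype is needed because NaN cannot be stored in an integer array
b = (a[:, None] == a).astype(float)
# k=-1 selects the entries strictly below the main diagonal
b[np.tril_indices_from(b, k=-1)] = np.nan
df = pd.DataFrame(b)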
output:
0 1 2 3 4 5 6 7 8 9
0 1.0 0.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0
1 NaN 1.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0
2 NaN NaN 1.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0
3 NaN NaN NaN 1.0 1.0 0.0 0.0 0.0 0.0 0.0
4 NaN NaN NaN NaN 1.0 0.0 0.0 0.0 0.0 0.0
5 NaN NaN NaN NaN NaN 1.0 1.0 0.0 0.0 0.0
6 NaN NaN NaN NaN NaN NaN 1.0 0.0 0.0 0.0
7 NaN NaN NaN NaN NaN NaN NaN 1.0 1.0 1.0
8 NaN NaN NaN NaN NaN NaN NaN NaN 1.0 1.0
9 NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0
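If an integer result is preferred over NaN padding, one possible alternative (not part of the original answer) is np.triu, which zeroes the entries below the diagonal and keeps the dtype. The trade-off is that a masked entry can no longer be told apart from a genuine "not equal" 0. The names U and pairs below are illustrative.

```python
import numpy as np

a = np.array([1, 2, 1, 0, 0, 1, 1, 2, 2, 2])
A = (a[:, None] == a).astype(int)

# Keep the diagonal and everything above it; entries below become 0.
U = np.triu(A)

# Index pairs (i, j) with i < j and a[i] == a[j], taken strictly above the diagonal.
i, j = np.nonzero(np.triu(A, k=1))
pairs = list(zip(i.tolist(), j.tolist()))

print(U)
print(pairs[:3])
```

Listing the pairs this way can be handy when the matrix itself is only an intermediate step toward finding which positions share a value.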