I'm working with a (15000, 2) array in Numpy from which I plan to build an adjacency matrix. Each row represents a vertex connection from node to i to j, or in other words, the first element of each row has an edge with the second element. For example, [24, 79] represents an edge between node 24 and 79.
If there exists a row that was [79, 24], I would like to remove it altogether because [24, 79] already exists.
Is there a way to remove these "duplicate" connections so that the overall array only consists of uni-directional vertices? I'm doing this step before I make symmetrize the matrix, where I add the matrix to its transpose.
CodePudding user response:
You can do that by sorting the items in each row so to easily track duplicate edges (ie. duplicate rows). The later can be done using np.unique
. Here is an example:
v = np.random.randint(0, 1_000, size=(15000, 2)) # len: 15000
result = np.unique(np.sort(v, axis=1), axis=0) # len: 14790
result
contains the set of unique non-directed edges where, for each row, the smallest ID is the first item. The computation is done efficiently in O(n log n)
time.
Note the parameter return_index
and return_inverse
of np.unique
can be used to track the unsorted row source index. Also note that using a (2, 15_000)
array is likely to be faster due to the operation being more SIMD-friendly and cache-friendly.