Home > Software engineering >  How to remove "duplicate" edges in 2-D array in Numpy?
How to remove "duplicate" edges in 2-D array in Numpy?

Time:01-30

I'm working with a (15000, 2) array in Numpy from which I plan to build an adjacency matrix. Each row represents a vertex connection from node to i to j, or in other words, the first element of each row has an edge with the second element. For example, [24, 79] represents an edge between node 24 and 79.

If there exists a row that was [79, 24], I would like to remove it altogether because [24, 79] already exists.

Is there a way to remove these "duplicate" connections so that the overall array only consists of uni-directional vertices? I'm doing this step before I make symmetrize the matrix, where I add the matrix to its transpose.

CodePudding user response:

You can do that by sorting the items in each row so to easily track duplicate edges (ie. duplicate rows). The later can be done using np.unique. Here is an example:

v = np.random.randint(0, 1_000, size=(15000, 2))  # len: 15000
result = np.unique(np.sort(v, axis=1), axis=0)    # len: 14790

result contains the set of unique non-directed edges where, for each row, the smallest ID is the first item. The computation is done efficiently in O(n log n) time.

Note the parameter return_index and return_inverse of np.unique can be used to track the unsorted row source index. Also note that using a (2, 15_000) array is likely to be faster due to the operation being more SIMD-friendly and cache-friendly.

  • Related