Fastest way to compare lists and not recompare-CodePudding

I have a list of coordinates that form an axis-aligned 2D box. I want to see how many boxes intersect each other without repeating it. For example, I have boxes A, B, and C. If A intersects with B, B intersecting with A will still count as 1 intersection instead of 2 separate ones.

The way I'm thinking is to grab one box compare it to all the rest and then throw it out since the comparison is done and I want no repeats. For example going back to A, B and C. If I'm on A and it intersects B and not C I have made its round and hence will not need to keep it in the loop. Hence once I'm on B it will not recheck A.

I can't think of a faster method. This is akin to linear search and I think the worst-case complexity will be O(n^2). I can t use binary as it is possible that there are multiple intersections. I've been also looking at the sorting algorithm but I need one that won't go back and check the older values again.

Thank you in advance for your help.

CodePudding user response：

You can somewhat reduce the average complexity, more precisely the computation time, but you'll always get a worst case in n²... Because, inherently, that's a O(n²) algorithm, since you have to test two by two every N boxes.

One way to reduce that computation time is to "attach", to each box, its circumscribed sphere. It's quite simple to compute: its center is the barycenter of the 8 edges of the rectangle, and its radius is any distance from this barycenter to one edge.

The trick is: given two boxes, if their circumscribed spheres aren't intersecting (distance between the two centers is greater than the sum of the two radiuses), then the boxes can not intersect. If the two spheres intersect, then the boxes may intersect.

This trick doesn't change at all the complexity, but it reduce A LOT the computation time, since it's way faster to test for spheres than for boxes (especially if they aren't parallel to axes...). And it will reduce the list size, up to empty it for any potential intersections. Also, you also have only a shortened (even empty) list of potential intersections for each box.

You can gain some computation time by using a list of boxes pairs to finely test, using squares of distances for comparisons, etc.: it's not so much, but in the end, it still saves some time. Then, you can compute the real boxes intersections with way less boxes pairs.

This would be a significative boost with a lot of 3D random boxes unaligned with axes, and it won't be with few 2D rectangles aligned with axes. In both cases, you'll always be with a O(n²) complexity (i.e. the worst case).

CodePudding user response：

You can solve this in O(n log n). Since you're trying to solve a static intersection problem in two dimensions, a common trick is to transform the problem into a dynamic intersection problem in one dimension (i.e., one space dimension becomes the time dimension).

Suppose you have a closed rectangle with lower left corner (x1, y1) and upper right corner (x2, y2). This rectangle becomes two "interval events":

Interval [y1, y2] inserted at time x1
Interval [y1, y2] removed after time x2

Transform all your rectangles into events in this way, and then sort by time (breaking ties with insertions coming before removals).

Now, you need a data structure that lets you add and remove intervals [A, B], and also query the number of intervals in the data structure intersecting [A, B]. Then you process the "interval events" in the sorted order, but keep a running sum of how many current intervals intersect [A, B] before inserting each interval [A, B].

One data structure to do this in O(log n) time per operation is with two balanced binary search trees: one holding beginning-points of intervals, and the other holding ending-points of intervals. You'll also need to augment each BST to be an Order Statistic Tree, to quickly query the number of points less than or equal to a certain value that are in the tree.

Then, finding how many intervals currently in your data structure intersect an arbitrary interval [A, B] is simple; that count is:

#(Intervals intersecting [A, B]) =

  #(values in the beginning-points tree that are <= B) 
- #(values in the ending-points tree that are < A)

which you can check is correct from the definition of two intervals intersecting: neither interval starts after the other one has ended.

You could also replace the order statistic tree with a data structure for prefix-sums, like a Fenwick tree, which requires much less code for the same complexity.