I'm doing an "Algorithms and Data Structures" course right now and have stuck on this problem. I was hoping to find some ideas here.
Here is the problem:
You are given 2 numbers n
and s
which stand for the number of points and an area of some rectangle. Then there come n
lines each consisting of 2 numbers: x_i
and y_i
which stand for x and y coordinates of i-th point. There may be multiple points at the same coordinates. If so happens those points are considered different and must be processed accordingly.
The input constraints are the following: n <= 300, s <= 1,000,000, each coordinate is between 0 and 2000 inclusive.
The solution must run under 8 seconds and consume less than 512 Mb of memory
The problem itself is to find some rectangle with sides parallel to coordinate axes, both sides are greater or equal to 1 (the length of the side can be a floating point number) and the area is exactly s
such that as many points as possible are inside this rectangle. A point is considered to be inside a rectangle if lies inside or on the side of a rectangle.
As an answer to the problem I need to print coordinates of the points which lie within the found rectangle.
So, here my thoughts on this problem:
1) Bruteforce
First of all it took me some time to understand that the rectangle may actually be at any coordinates as long as its area is equal to s
. Also I found that the most "effective" (in terms of amount of points inside it) rectangle is the one that has points on its border, because there is no point in trying to have them fully inside it.
So, the first solution I tried was a simple bruteforce. However, the complexity of the solution is O(2^n)
which (who would have thought!?) got me a time limit error in the checking system.
2) Sort
I started thinking of a faster solution, probably, like O(n*logn)
. As soon as I started thinking about n*logn
I tried sorting. So the idea was the following:
For each point sort the rest of the points closest to furthest. Then for each point I would iterate over them and add to my rectangle as long as its area is less then or equal to s
.
However that idea fails in some cases where there are multiple points at the same coordinates and sorting by distance doesn't take that into account.
3) Knapsack
Once I determined the problem with sorting the problem started to look like a knapsack problem. We can aggregate the points by their coordinates, so the points with the same ones will be grouped together. For each group we need to remember how many points there are in the group. That will be the value of the item for the knapsack. It gets harder with the weight of the item. I think of a knapsack as a rectangle, so if I implement it using a classical solution with a DP matrix then at every cell I can store a rectangle. Having that the weight of each group of points can be the area it will add to the rectangle if we pick it.
However, this solution also seems incorrect to me. Here is why:
- We need to pack a knapsack for every point. Because if we start at some points we might reach not to all points. So we need to check all possible knapsacks and take the "heaviest". However, the complexity of a single knapsack algorithm is
O(n*s)
and if we do it for every point it will getO(n*n*s) = O(n^2*s)
. With given constraints it is unacceptable - A single knapsack DP matrix will in the worst case consist of
n*s = 3*10^8
elements. In each cell I want to store a rectangle which I can represent as 4 numbers: upper, lower, left and right bounds. If each bound is uint8 so the rectangle itself will take 4 bytes. The total memory will be3*10^8*4 = 12*10^8 bytes = 1200 Mb
which exceeds the memory limit.
Now it seems to me that the knapsack solution is also incorrect. However, I have been thinking on the problem for over 3 days now and cannot find another solution to it, even theoretical
CodePudding user response:
You can do this pretty easily in O(n3) time, which will certainly be less than 8 seconds given the constraint that n <= 300.
First note that there will always be an optimal rectangle with points on its top, bottom, and left borders. Given any optimal rectangle, if it doesn't have a point on its top or bottom border, then its height can be reduced until it does, without dropping any points. If it doesn't have a point on its left border, then it can be moved right until it does, again without dropping any points.
So, the simple solution is:
- Sort the points by x coordinate
- For each pair of points p1, p2, with p1.y <= p2.y-1:
- Find the width of rectangles with p1 and p2 on the top and bottom borders: w = s/(p2.y - p1.y). Adjust the height to 1 if it's too small.
- Using the sorted point list, for each point with y between p1.y and p2.y, find the number of points you can include if that point is on the left edge. Remember the best result.
Step 2.3 can be be done in O(n) time using a standard 2-pointer algorithm. Since each point you consider is to the right of the previous one, the last point that fits will also be to the right of the previous last point. Both positions increase monotonically through the list.
In python, it looks like this:
def pointsInBestRect(points, area):
points = sorted(points)
bestcount = 0
bestminy = 0
bestmaxy = 0
bestminx = 0
for i in range(len(points)):
p1 = points[i]
for j in range(i, len(points)):
p2 = points[j]
miny = min(p1[1],p2[1])
maxy = max(p1[1],p2[1])
maxy = max(maxy, miny 1)
h = maxy-miny
if h < 1 or h > area:
continue
outpos = 0
count = 0
for pleft in points:
while outpos < len(points) and (points[outpos][0] - pleft[0])*h <= area:
outy = points[outpos][1]
outpos = 1
if outy >= miny and outy <= maxy:
count = 1
if count > bestcount:
bestcount = count
bestminx = pleft[0]
bestminy = miny
bestmaxy = maxy
if pleft[1] >= miny and pleft[1] <= maxy:
count -= 1
besth = bestmaxy - bestminy
return [p for p in points if
p[1] >= bestminy and
p[1] <= bestmaxy and
p[0] >= bestminx and
(p[0]-bestminx) * besth <= area]
print(pointsInBestRect([(1,1),(10,10),(4,5),(5,4)], 1))