Home > Software engineering >  How to optimize the sorting of a vector with coordinates based on distance?
How to optimize the sorting of a vector with coordinates based on distance?

Time:10-31

I need to sort a vector of coordinates (x, y >= 1) in a way that every next point from the vector is the closest one to the previous by calculating the distance with the formula from getDistance().

My current solution is too slow as I need the program to be able to finish in 5 seconds or less with vector length (N) equal to 100 000.

struct Point {
    int ind;
    int x;
    int y;
    double dist;
};



double getDist(int x1, int y1, int x2, int y2) {
  return sqrt((x1 - x2) * (x1 - x2)   (y1 - y2) * (y1 - y2));
}



vector<Point> cordSort(vector<Point> vect) {
    
    vector<Point> finalDistVect;
    finalDistVect.push_back(vect[0]);
    Point firstPoint = vect[0];
    vect.erase(vect.begin());

    for (i = 0; i < pcVect.size() - 1; i  ) {

        sort(vect.begin(), vect.end(), [firstPoint](const Point & a, const Point & b) {
            return getDist(firstPoint.x, firstPoint.y, a.x, a.y) < getDist(firstPoint.x, firstPoint.y, b.x, b.y);
        });

        finalDistVect.push_back(vect[0]);
        finalDistVect[i].dist = getDist(firstPoint.x, firstPoint.y, vect[0].x, vect[0].y);
        firstPoint = vect[0];
        vect.erase(vect.begin());
    }

    return finalDistVect;
}

vect is the initial vector with coordinates sorted by:

    sort(vect.begin(), vect.end(), [](const Point & a, const Point & b) {
        if (a.x   a.y != b.x   b.y) {
            return a.x   a.y < b.x   b.y;
        }
        return a.x < b.x;
    });

I am thinking about implementing bucket sort but I don't know if it will work for my problem.

CodePudding user response:

your implementation indeed increase inefficiency by repeatedly erase the first element from a vector. std::vector is not designed to be used to frequently erase elements from other the back.

Not sure if I read your algorithm correctly. The first element is predetermined as the first element of the input, then your program repeatedly find the points that has shortest distance from the last element (point) in the output point vector.

If that's the case, it has no benefit to sort at all.

A naive algorithm is like bubbling:


1. add first point to outputVector, add all rest to openSet;
2. if openSet is empty, we are done.
3. take the last point from output Vector, check with all points in `openSet`, to find the one with shortest distance from it, add it to outputVector, remove it from openSet;
4. goto 2

Basically I am recommending you do use a std::set to keep track of openset, or maybe even better, std::unordered_set.

Another way is to do it in place, just swap the chosen points with the one who is taking its place.

e.g. we have P0, P1, P2, P3, P4 as input in a vector

1. int pos = 1; // P0 is in right place, we are looking for 
                // the point that shall go to index 1;
2. check all points from index `pos` to `4`(which is the max index) and find the one with shortest distance from `P0`, let's say we get `P4`;
3. swap `P0` (the one at index `pos`) and `P4` (the chosen one);
4.    pos;
5. if pos!=4(max index), goto 2.

We are using pos to keep track of sorted and open and do it in place.

CodePudding user response:

This is not a sorting problem, you have to come up with a different algorithm. For example, there may be no solution at all, which would not be the case for a sorting problem.

Consider four points on a single line: A=(1, 1), B=(100, 1), C=(101, 1), D=(1000, 1). Point D is not the closest point for any other point, so it should come first. Then we should put C, followed by B, and now we cannot put A because the closest point to B is actually C, not A.

And even if it was, you should come up with a faster algorithm. You have N iterations of the for loop, each iteration looks for the smallest element among N other elements, which is already at least O(N^2). Using sort instead of min_element makes it even worse: O(N^2 log N). This won't fly for N ~ 100'000 at all in competitive programming/home assignments.

  • Related