Problem

I have a list of approximatly 200000 nodes that represent lat/lon position in a city and I have to compute the Minimum Spanning Tree. I know that I need to use Prim algorithm but first of all I need a connected graph. (We can assume that those nodes are in a Euclidian plan)

To build this connected graph I thought firstly to compute the complete graph but (205000*(205000-1)/2 is around 19 billions edges and I can't handle that.

Options

Then I came across to Delaunay triangulation: with the fact that if I build this "Delauney graph", it contains a sub graph that is the Minimum Spanning Tree according and I have a total of around 600000 edges according to Wikipedia [..]it has at most 3n-6 edges. So it may be a good starting point for a Minimum Spanning Tree algorithm.

Another options is to somewhat build an approximate connected graph but with that I will maybe miss important edges that will influence my Minimum Spanning Tree.

My question

Is Delaunay a reliable solution in this case? If so, is there any other reliable solution than delaunay triangulation to this problem ?

Further information: this problem has to be solved in C.

CodePudding user response：

The Delaunay triangulation of a point set is always a superset of the EMST of these points. So it is absolutely "reliable"*. And recommended, as it has a size linear in the number of points and can be efficiently built.

*When there are cocircular point quadruples, neither the triangulation nor the EMST are uniquely defined, but this is usually harmless.

CodePudding user response：

This is more "thoughts" than "answer" but it doesn't fit in a comment so oh well.

There's a big question here of what libraries you have access to and how much you trust yourself as a coder. (I'm assuming the fact that you're new on SO should not be taken as a measure of your overall experience as a programmer - if it is, well, RIP.)

If we assume you don't have access to Delaunay and can't implement it yourself, minimum spanning trees algorithms that pre-suppose a graph aren't necessarily off limits to you. You can have the complete graph conceptually but not actually. Kruskal's algorithm, for instance, assumes you have a sorted list of all edges in your graph; most of your edges will not be near the minimum, and you do not have to compare all n^2 to find the minimum.

You can find minimum edges quickly by estimations that give you a reduced set, then refinement. For instance, if you divide your graph into a 100/100 grid, for any point p in the graph, points in the same grid square as p are guaranteed to be closer than points three or more squares away. This gives a much smaller set of points you have to compare to safely know you've found the closest.

It still won't be easy, but that might be easier than Delaunay.