How to do geospatial searching in plain MySQL

Time:09-27

Given a bounding box of southwest (lng, lat) and northeast (lng, lat), I want to find all points that fall within this region. The table is currently designed as follows:

CREATE TABLE IF NOT EXISTS steps (
   id int NOT NULL AUTO_INCREMENT,
   rid int NOT NULL COMMENT 'route ID',
   seq int NOT NULL COMMENT 'sequence',
   longitude decimal(10,7) NOT NULL,
   latitude decimal(10,7) NOT NULL,
   PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;

My question is: how can I make MySQL search faster? So,

  1. Since MySQL only uses one index per query, it seems inefficient to simply add separate indexes for latitude and longitude?
  2. I cannot use the MySQL Geo Spatial Extension; any solution must be native MySQL.
  3. Since the app displays "routes" within the viewport, it is necessary to get points adjacent to, but outside of, the given viewport.
  4. Will it help to use a solution such as geohash? If so, how can I adapt the length of the geohash to the given viewport?

EDIT

  1. I cannot use the geospatial feature of MySQL, because the app runs on a MySQL 5.7 server on which I do not have administrative rights. Even if I could arrange to add this extension (if 5.7 supports it), it is not favorable to us because we do not want to introduce any incompatibility with other parts of our system.

  2. The application shows "historical" routes that are highly "clustered", i.e., as time goes by, there may be many close-together routes at several sites, and each site may occupy e.g. 1k~2k square kilometers. So, apart from what I already asked, another question might be: how can I eliminate close-together routes when some are mostly covered by other routes?

CodePudding user response:

Most of your query efficiency will come from an index on just (longitude) or just (latitude). But a compound index on both (latitude, longitude) will make things a little faster. Why? MySQL can retrieve the second coordinate value, which it must also check against your boundaries, directly from the index without looking it up in the table. That saves time and IO in your MySQL server.

Unless your table is vast, or your bounding boxes are really big, or you have some other application performance issue, this will work tolerably well. If it doesn't, you should probably start using the geospatial extension.

What else will help?

  1. Changing the data type of your latitude and longitude from DECIMAL to FLOAT (single precision). GPS data can be adequately represented by single-precision floating point, and the comparisons and arithmetic are slightly faster. (If your data are more precise than FLOAT allows, surely you know about your geodetic datum and the projection you use.)

  2. If your data points are spread out mostly east-to-west (USA for example) use an index on (longitude, latitude) because longitude is more selective. If they are spread out mostly north-to-south (Japan for example) reverse the order of the columns in the index: latitude is more selective.

  3. Make your compound index match your query. If your query looks like this

    SELECT rid, seq, latitude, longitude
      FROM steps
     WHERE rid = ###constant###
       AND latitude BETWEEN ###southboundary### AND ###northboundary###
       AND longitude BETWEEN ###westboundary### AND ###eastboundary###
     ORDER BY seq
    

    your best choice of covering index is (rid, latitude, longitude, seq).

I think your requirement is to get all the routes that pass through your bounding box. That you can do with this query.

SELECT rid, seq, latitude, longitude
  FROM steps
 WHERE rid IN (
      SELECT rid
        FROM steps
       WHERE latitude BETWEEN ###southboundary### AND ###northboundary###
         AND longitude BETWEEN ###westboundary### AND ###eastboundary###
)
ORDER BY rid, seq

A good covering index for the subquery will be (latitude, longitude, rid).
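That query returns every step of each matching route, including the steps that lie outside the box. A route whose line crosses the viewport without any recorded point inside it would still be missed, though. One hedged way to reduce that, assuming the application builds the boundary values itself, is to pad the viewport before binding the parameters. The helper name `pad_bbox` and the 25% margin are illustrative assumptions, not something from the question, and the sketch ignores boxes that cross the antimeridian.

```python
def pad_bbox(south, west, north, east, margin=0.25):
    """Expand a viewport by `margin` (a fraction of its height/width) on every side.

    Assumes the box does not cross the antimeridian (west <= east).
    """
    if not (south <= north and west <= east):
        raise ValueError("expected south <= north and west <= east")
    dlat = (north - south) * margin
    dlng = (east - west) * margin
    # Clamp to the valid coordinate range so the padded box stays on the globe.
    return (
        max(south - dlat, -90.0),
        max(west - dlng, -180.0),
        min(north + dlat, 90.0),
        min(east + dlng, 180.0),
    )


# A viewport of 1 degree by 2 degrees, padded by a quarter of its size per side.
south, west, north, east = pad_bbox(10.0, 20.0, 11.0, 22.0, margin=0.25)
# Bind order for the two BETWEEN clauses: latitude first, then longitude.
params = (south, north, west, east)
```

How large the margin should be depends on how far apart consecutive steps of a route are; that has to be judged from the data.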

How well will this scale up? If your bounding boxes are small compared to the range of lat/long values in your table, it will scale up quite well: BETWEEN filters on indexed columns are a highly optimized range-scan use case. Whether it holds up at 100k rows or 10M rows is difficult to say without knowing a lot more about your application. You should read https://use-the-index-luke.com by Markus Winand to learn more about the wizardry of indexing.

CodePudding user response:

Have these:

INDEX(lat, lng),
INDEX(lng, lat)

And DROP these if you have them: INDEX(lat), INDEX(lng); they get in the way.

The Optimizer will use either of those 2-column indexes, depending on which one seems to have fewer rows in the E-W or N-S stripe of the globe.

Why can you not use SPATIAL indexing? It is available in 8.0.

Here is a thorough discussion of the problem of "find nearest" in MySQL. It includes a discussion of precision needed for lat/lng. http://mysql.rjweb.org/doc.php/find_nearest_in_mysql

That also goes into algorithms that are faster than a simple bounding box.

My blog goes into "Z-order" indexing, which works something like geohash. I have looked into using a Hilbert space-filling curve, too. That promises to have similar performance to the Z-order, but with quite different code.

The partitioning and Z-order solutions hit "not many more" items than are needed in the resultset. The 2-column indexes hit a lot more items since they cover a whole stripe of latitude (or longitude).
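To make the Z-order idea concrete, here is a minimal sketch of a bit-interleaved key. It is not the code from the blog; the function names and the choice of bits per axis are assumptions for illustration. Each coordinate is quantized to an integer, and the bits of the two integers are interleaved so that points close together on the map tend to get nearby key values.

```python
def quantize(value, lo, hi, bits):
    """Map value in [lo, hi] to an integer in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    return int(round((value - lo) / (hi - lo) * cells))


def interleave(x, y, bits):
    """Interleave the low `bits` bits of x and y: x in even positions, y in odd."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def z_order_key(latitude, longitude, bits=24):
    """Z-order key with `bits` bits per axis (24 is an arbitrary choice here)."""
    return interleave(
        quantize(longitude, -180.0, 180.0, bits),
        quantize(latitude, -90.0, 90.0, bits),
        bits,
    )
```

The key would be stored in an indexed integer column (BIGINT once it exceeds 32 bits). Turning a bounding box into a small set of key ranges is the harder part and is not shown here.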

What does your SELECT look like?

As for a variable-length geohash, I doubt it will help. I looked into that for Z-order and concluded that I needed more than 32 bits and not more than 64. Otherwise, the algorithm will falsely place some items inside or outside the bounding box.
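For anyone who still wants to experiment with geohash prefixes, the relationship between prefix length and cell size can be sketched as follows. This assumes the standard geohash layout (5 bits per character, longitude taking the first bit); the function names are hypothetical. It picks the longest length whose cell is at least as large as the viewport.

```python
def geohash_cell_size(length):
    """Return (lat_height_deg, lng_width_deg) of one geohash cell of this length."""
    total_bits = 5 * length
    lng_bits = (total_bits + 1) // 2  # longitude gets the extra bit when the total is odd
    lat_bits = total_bits // 2
    return 180.0 / (1 << lat_bits), 360.0 / (1 << lng_bits)


def geohash_length_for(south, west, north, east, max_length=12):
    """Longest geohash length whose cell is at least as large as the box.

    Falls back to length 1 when the box is larger than any cell.
    """
    best = 1
    for length in range(1, max_length + 1):
        cell_h, cell_w = geohash_cell_size(length)
        if cell_h >= north - south and cell_w >= east - west:
            best = length
    return best
```

Even at the chosen length, a viewport can straddle up to four cells, so the neighboring prefixes have to be searched as well, and the rows found still need the exact BETWEEN check.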

A Spatial POINT takes a bulky 25 bytes (compared to 12 for your pair of DECIMALs and some smaller representations mentioned in my blog). But I don't see why you could not have used POINT. For vehicles, I would pick a pair of FLOATs (8 bytes total, 1.7 m / 5.6 ft resolution). For persons: DECIMAL(8,6) for latitude and DECIMAL(9,6) for longitude (9 bytes total, 16 cm / 1/2 ft).
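The 1.7 m figure for FLOAT can be checked with a short calculation. The sketch below measures the gap between adjacent single-precision values near the largest longitudes, where the gap is widest, and converts it to meters. The conversion factor is an approximation for one degree at the equator, and `float32_spacing` is a hypothetical helper name.

```python
import struct

METERS_PER_DEGREE = 111_320.0  # approximate length of one degree at the equator


def float32_spacing(value):
    """Gap between abs(value) rounded to float32 and the next larger float32."""
    # Reinterpret the float32 bit pattern as an unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", abs(value)))[0]
    here = struct.unpack("<f", struct.pack("<I", bits))[0]
    # For positive finite floats, adding 1 to the bit pattern gives the next value up.
    nxt = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return nxt - here


# Longitudes between 128 and 180 degrees have the coarsest spacing.
worst_case_m = float32_spacing(179.0) * METERS_PER_DEGREE
print(round(worst_case_m, 2))
```

Values closer to zero are stored more finely, so this is the worst case rather than the typical one.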
