I have a huge Geo database that I frequently need to compare with real time Geo data in order to determine nearest location per Latitude:Longitude. The location table does hold a number of rows, but is rarely added with new records. Determining nearest location against millions of real time data is painfully costing us with super slow queries, even after implementing a rectangular distance comparison algorithm (than actually comparing by Haversine).
I want to convert this comparison with a DETERMINISTIC function that should really bring up a real performance boost with static results.
However, I want MySQL to reset/rebuild this deterministic result cache every week. Like, I want MySQL to return me the same result for a Latitude:Longitude pair comparing against my location table, but for 7 days. After 7 days, there is a good chance I might add new locations to that table, and I want MySQL to start rebuilding that deterministic function result cache considering new rows been added to that table, preferably without restarting MySQL server.
Note: A MariaDB compliant solution is a serious good to have :)
Correction: Please forgive me to use that term with MySQL. So far I could understand, the result does not change for a deterministic function where all input are the same, this allures me to think MySQL does not actually tend to execute or process the instruction inside the function, rather returns the previously calculated value for the same set of input values, so, definitely it does cache the values somewhere (I don't know where), thus behaving something like just looking up through a list or something like that. I think I overloaded the OPTIMIZER with CACHE here :(
CodePudding user response:
I think you misunderstand what DETERMINISTIC
means as an option to CREATE FUNCTION
.
It does not mean the result of the function is memoized. There is no cache of function results. There is no command to refresh this cache, because the results are not kept.
The meaning of DETERMINISTIC
mainly affects binary logging:
https://mariadb.com/kb/en/create-function/#not-deterministic
The
[NOT] DETERMINISTIC
clause also affects binary logging, because theSTATEMENT
format can not be used to store or replicate non-deterministic statements.
That is, a non-deterministic function may return different results if it is executed on a replica, so if you use the function in an SQL statement that modifies data, the binary log format for that event must be ROW
to ensure the same change is applied on the replica.
There's also a vague reference:
The optimizer may choose a faster execution plan if it known that the function is deterministic.
But no example is given for such a case. This is likely to be a rarity for it to make a significant performance difference.
I don't think this will be an effective performance optimization for your use case.
CodePudding user response:
The Function is not where you can get performance. (Bill discussed that.) The algorithm is where you can gain performance. (I will go into that.)
The simple-minded choice involves checking the distance each of the million rows. As you have found, this is not scalable.
Next comes INDEX(lat, lng), INDEX(lng, lat)
and using a bounding box. That gives an order of magnitude speedup.
A similar improvement comes from using SPATIAL
values and index.
I go into detail about those 3 approaches in Find Nearest . That link also gives two algorithms that are another order of magnitude faster, but you may not need that much improvement (and complexity) for only a few million rows.