Home > Net >  How can I make "euclidean distance calculation" faster in MySQL?
How can I make "euclidean distance calculation" faster in MySQL?

Time:05-01

I am creating a face recognition system, but the search is very slow. Can you share how to speed up the search?

It takes about 6 seconds for 100,000 data items.

  • MySQL
mysql> SHOW VARIABLES LIKE '%version%';
 -------------------------- ------------------------------ 
| Variable_name            | Value                        |
 -------------------------- ------------------------------ 
| version                  | 8.0.29                       |
| version_comment          | MySQL Community Server - GPL |
| version_compile_machine  | x86_64                       |
| version_compile_os       | Linux                        |
 -------------------------- ------------------------------ 
  • Table
CREATE TABLE `face_feature` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `f1` decimal(9,8) NOT NULL,
  `f2` decimal(9,8) NOT NULL,
  ...
  ...
  `f127` decimal(9,8) NOT NULL,
  `f128` decimal(9,8) NOT NULL,
  PRIMARY KEY (id)
);
  • Data
mysql> SELECT count(*) FROM face_feature;
 ---------- 
| count(*) |
 ---------- 
|   110004 |
 ---------- 

mysql> SELECT * FROM face_feature LIMIT 1\G;
  id: 1
  f1: -0.07603023
  f2: 0.13605964
  ...
f127: 0.09608927
f128: 0.00082345
  • SQL
SELECT 
  id,
  sqrt(
    power(f1 - (-0.09077361), 2)  
    power(f2 - (0.10373443), 2)  
    ...
    ...
    power(f127 - (0.0778369), 2)  
    power(f128 - (0.00951046), 2)
  ) as distance
FROM 
  face_feature
ORDER BY 
  distance
LIMIT
  1
;
  • Result
 ---- -------------------- 
| id | distance           |
 ---- -------------------- 
|  1 | 0.3376853491771237 |
 ---- -------------------- 
1 row in set (6.18 sec)

Update 1:

Changed from decimal(9,8) to float(9,8)

Then, improved from approximately 4sec to 3.26 sec

mysql> desc face_feature;
 ------- ------------ ------ ----- --------- ---------------- 
| Field | Type       | Null | Key | Default | Extra          |
 ------- ------------ ------ ----- --------- ---------------- 
| id    | int        | NO   | PRI | NULL    | auto_increment |
| f1    | float(9,8) | NO   |     | NULL    |                |
| f2    | float(9,8) | NO   |     | NULL    |                |
..
| f127  | float(9,8) | NO   |     | NULL    |                |
| f128  | float(9,8) | NO   |     | NULL    |                |
 ------- ------------ ------ ----- --------- ---------------- 

Update 2:

Changed from POWER(z, 2) to z*z

Then, the result was changed from 3.26 sec to 4.65 sec

SELECT 
  id,
  sqrt(
    ((f1 - (-0.09077361)) * (f1 - (-0.09077361)))  
    ((f2 - (0.10373443)) * (f2 - (0.10373443)))  
    ((f3 - (0.00798536)) * (f3 - (0.00798536)))  
    ...
    ...
    ((f126 - (0.07803915)) * (f126 - (0.07803915)))  
    ((f127 - (0.0778369)) * (f127 - (0.0778369)))  
    ((f128 - (0.00951046)) * (f128 - (0.00951046))
  ) as distance
FROM 
  face_feature
ORDER BY 
  distance
LIMIT
  1
;

Update 3

I am looking into the usage of MySQL GIS.

How can I migrate from "float" to "points" in MySQL?


Update 4

I'm also looking at PostgreSQL because I can't find a way to handle 128 dimensions in MySQL.

CodePudding user response:

  • DECIMAL(9,8) -- that's a lot of significant digits. Do you need that much precision?

  • FLOAT -- about 7 significant digits; faster arithmetic.

  • POWER(z, 2) -- probably a lot slower than z*z. (This may be the slowest part.)

  • SQRT -- In many situations, you can simply work with the squares. In this case:

    SELECT SQRT(closest)
        FROM ( SELECT -- leave out SQRT
                 ... ORDER BY .. LIMIT 1 )
    

Here are some other thoughts. They are not necessarily relevant to the query being discussed:

  • Precise testing -- Beware of comparing for 'equal' Roundoff error is likely to make things unequal unexpectedly. Imprecise measurements add to the issue. If I measure something twice, I might get 1.23456789 one time and 1.23456788 the next time. (Especially at that level of "precision".

  • Trade complexity vs speed -- Use ABS(a - b) as the distance formula; find the 10 items closest in that way, then use the Euclidean distance to get the 'right' distance.

  • Break the face into regions. Find which region the item is in, then check only the subset of the 128 points that are in that region. (Being near a boundary -- put some points in two regions.)

  • Think out of the box -- I'm not familiar with your facial recognition, so I have run out of mathematical tricks.

  • Switch to POINTs and a SPATIAL index. It may be possible your task orders of magnitude faster. (This is probably not practical for 128-dimensional space.)

  • Related