Home > Software engineering >  Index to search in multiple tables
Index to search in multiple tables

Time:08-22

Problem

TL;DR My query is slow.

Im trying to find multiple customer by a given search-term "J". The fields im scanning are distributed among 2 tables customer and company_customer:

  • firstname begins with
  • lastname begins with
  • number begins with (this is the specific customer-number for a company/custumer combination)

I'm having a hard time optimizing the database and query (running mysql-8).

I've tested this with 10 companies and 100k customers each. The search took close to half a second - there has to be a way to get this faster.

The query

SELECT
    cc.number,
    c.firstname,
    c.lastname
FROM
    customer c
    JOIN company_customer cc
        ON cc.customer_id = c.id
WHERE
    cc.company_id = 1
    AND (
           c.lastname LIKE 'J%'
        OR c.firstname LIKE 'J%'
        OR cc.number LIKE 'J%'
    )
ORDER BY
    lastname,
    firstname
LIMIT 20;

The schema (simplyfied)

  • company (id, ...)
  • customer (id, ...)
  • company_customer (company_id, customer_id, number)

Fiddle

What I've tried

  • remove the actual search part to pinpoint the problem (every like) - still slow
  • get a list of customer.id via subselect - even slower
  • split the search - but when i search for customers i only want to search for those associated with a specific company_cusomer.company_id - so i cant get around the join right?
  • Find possible index by using EXPLAIN

What I've found out

I think the problem is the index for lastname, firstname is not being used since the JOIN occupies the index usage - since only one may be used.

The question

Is there a way to set an index to get the data faster? Or a way to rewrite the sql to get the data faster?

Edit

the explain (before with 1M rows in customers):

 ---- ------------- ------- ------------ -------- ------------------------- --------- --------- --------------------- -------- ---------- --------------------------------- 
| id | select_type | table | partitions | type   | possible_keys           | key     | key_len | ref                 | rows   | filtered | Extra                           |
 ---- ------------- ------- ------------ -------- ------------------------- --------- --------- --------------------- -------- ---------- --------------------------------- 
|  1 | SIMPLE      | cc    | NULL       | ref    | PRIMARY,IDX_co,IDX_cu   | PRIMARY | 4       | const               | 204966 |   100.00 | Using temporary; Using filesort |
|  1 | SIMPLE      | c     | NULL       | eq_ref | PRIMARY                 | PRIMARY | 4       | test.cc.customer_id |      1 |   100.00 | NULL                            |
 ---- ------------- ------- ------------ -------- ------------------------- --------- --------- --------------------- -------- ---------- --------------------------------- 

the explain (before 1M rows):

 ---- ------------- ------- ------------ -------- ----------------------- --------- --------- ----------------- ------ ---------- ------------- 
| id | select_type | table | partitions | type   | possible_keys         | key     | key_len | ref             | rows | filtered | Extra       |
 ---- ------------- ------- ------------ -------- ----------------------- --------- --------- ----------------- ------ ---------- ------------- 
|  1 | SIMPLE      | c     | NULL       | index  | PRIMARY               | IDX_lf  | 804     | NULL            |   20 |   100.00 | Using index |
|  1 | SIMPLE      | cc    | NULL       | eq_ref | PRIMARY,IDX_co,IDX_cu | PRIMARY | 8       | const,test.c.id |    1 |   100.00 | NULL        |
 ---- ------------- ------- ------------ -------- ----------------------- --------- --------- ----------------- ------ ---------- ------------- 

CodePudding user response:

The OR, especially since it touches more than one table, is especially hard to optimize.

These may help:

cc:  INDEX(company_id, number, customer_id)
c:  INDEX(lastname, firstname, id)

The best optimization (especially for huge tables) would be to switch to UNION:

SELECT *
    FROM (
        ( SELECT ... with one of the LIKEs ...
              ORDER BY c.lastname, c.firstname LIMIT 20 )
        UNION DISTINCT
        ( SELECT ... with another of the LIKEs ...
              ORDER BY c.lastname, c.firstname LIMIT 20 )
        UNION DISTINCT
        ( SELECT ... with the other LIKE ...
              ORDER BY c.lastname, c.firstname LIMIT 20 )
         ) AS x
    ORDER BY lastname, firstname   -- again
    LIMIT 20      -- again
    ;

And more indexes (note reordering of columns:

cc:  INDEX(number, company_id, customer_id)
c:  INDEX(firstname, lastname, id)

(There may be a further optimization, but give this a try first.)

  • Related