Home > Blockchain >  How to optimize a query in a one to many scenario when i have a fulltext index on a column?
How to optimize a query in a one to many scenario when i have a fulltext index on a column?

Time:11-07

My paintings table looks like this

|  id  | artist_id |    name 
|  1   |     7     |  landscape painting
|  2   |     15    |  flowers painting
|  3   |     15    |  scuffed painting

The artist_id is indexed and the name has a fulltext index on it. The table contains about 10M record.

Queries that match the name agains some keywords are ok:

select * from `paintings` where match (`name`) against (' scuffed*' in boolean mode) limit 10;
10 rows in set (0.04 sec)

But when I sometimes want to only check for a certain painting done by a certain artist:

select * from `paintings` where `artist_id` = 15 and match (`name`) against (' scuffed*' in boolean mode) limit 10;
7 rows in set (0.40 sec)

As you can see it takes 10x longer to run the query when I include the artist_id. I also tried running a nested query in order to get only paintings that have specific ids:

select * from `paintings` where id in (SELECT id from paintings where artist_id = 15) and match (`name`) against (' scuffed*' in boolean mode) limit 10;
7 rows in set (0.44 sec)

This ended up being even slower.

How can this query be optimized to work well with and without a where clause on the artist_id?

Thank you!

CodePudding user response:

You need to create a COMPOSITE INDEX KEY consisting of columns (id and artist_id) to speed up your query:

mysql> ALTER TABLE paintings ADD INDEX cmp_id_artist_id (id, artist_id);
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0


mysql> SHOW INDEXES FROM paintings;
 ----------- ------------ ------------------ -------------- ------------- ----------- ------------- ---------- -------- ------ ------------ --------- --------------- --------- ------------ 
| Table     | Non_unique | Key_name         | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
 ----------- ------------ ------------------ -------------- ------------- ----------- ------------- ---------- -------- ------ ------------ --------- --------------- --------- ------------ 
| paintings |          1 | cmp_id_artist_id |            1 | id          | A         |           3 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| paintings |          1 | cmp_id_artist_id |            2 | artist_id   | A         |           3 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| paintings |          1 | ftindex_name     |            1 | name        | NULL      |           3 |     NULL |   NULL | YES  | FULLTEXT   |         |               | YES     | NULL       |
 ----------- ------------ ------------------ -------------- ------------- ----------- ------------- ---------- -------- ------ ------------ --------- --------------- --------- ------------ 
3 rows in set (0.00 sec)

And now you can test again your 2nd query:

mysql> select * from `paintings` where `artist_id` = 15 and match (`name`) against (' scuffed*' in boolean mode) limit 10;
 ---- ----------- ------------------ 
| id | artist_id | name             |
 ---- ----------- ------------------ 
|  3 |        15 | scuffed painting |
 ---- ----------- ------------------ 
1 row in set (0.00 sec)

CodePudding user response:

Run EXPLAIN SELECT ... to see how the query is performed.

I think your second query is as optimal is can be. MySQL will perform the MATCH first, then check any other conditions.

You could add INDEX(artist_id), but I don't think that will help.

More

Let me provide another approach, then talk through the pros an cons.

PRIMARY KEY(artist_id, id),
INDEX(id),
FULLTEXT(name),

-- When searching _only_ by `name`:
WHERE MATCH(name) AGAINST(' string' IN BOOLEAN MODE)

-- When searching by artist _and_ name:
WHERE artist_id = 123
  AND name LIKE '%string%';  

Comments:

  • Without the artist, you get the full speed of FULLTEXT.
  • With artist, you abandon FULLTEXT (because it seems not to work for your dataset) and switch to the composite index, plus need to look only at those rows with the given artist_id.
  • id needs to be indexed to keep AUTO_INCREMENT happy; a simple INDEX(id) suffices.
  • The PRIMARY KEY must be unique; including id suffices.
  • Starting the PK with `artist_id clusters the rows for a given artist together, thereby speeding the query up (some for small tables, a lot for big tables).
  • This requires that you create two different queries, but that is probably not a big problem.
  • For an artist that has a lot of works, it may be 'too' slow -- because it is failing to use FULLTEXT.
  • FULLTEXT and LIKE have different rules for what will/won't match. So you may get different answers.
  • Another recent Question discussed why AGAINST('... the ...' IN BOOLEAN MODE) never finds any rows.
  • AGAINST(" color purple") will find purple color, but LIKE '%color purple% won't.
  • Be careful about benchmarking any approaches -- the performance will depend on how many works an artist has, and many other un-obvious differences.
  • Related