Home > database >  FULLTEXT Indexes for speed, SQL database design
FULLTEXT Indexes for speed, SQL database design

Time:03-30

I have ten million records. MySQL database looks like this:

tagline: varchar(255)   
location: varchar(80)   
experience: text (normally about 600 characters)
country varchar(50) 

Previously, I was executing queries like this, which resulted in full table scans, which take too long:

SELECT COUNT(*) FROM `mytable` WHERE
`tagline` LIKE '%sales%'
AND `location` LIKE '%texas%'
AND `experience` LIKE '%software%'

People here tell me that I should apply FULLTEXT indexes to each field. They tell me that I should be executing queries like this:

SELECT COUNT(*) FROM `mytable` WHERE
MATCH(tagline) AGAINST("sales")
AND MATCH(location) AGAINST("texas")
AND MATCH(experience) AGAINST("software")

I just want to confirm that:

(a) The results should be the same

(b) The speed will be faster

(c) the only downside will be the increased size of the database and the time it will take my system to implement the FULLTEXT indexes on each field.

Please don't accuse me of being too specific. Just providing my setup as an example. Anybody who wants search functionality and has a lot of text data will benefit from the replies.

CodePudding user response:

As mentioned in my comment above, if you index each of the three columns separately, a MySQL query must choose one index per table reference. It cannot use all three indexes in the same query, unless you use different table references.

You should create a single fulltext index for all three columns:

ALTER TABLE mytable ADD FULLTEXT INDEX(tagline, location, experience);

Then search with a single predicate:

SELECT COUNT(*) 
FROM `mytable`
WHERE MATCH(tagline, location, experience) 
  AGAINST(" sales  texas  software" IN BOOLEAN MODE);

But this loses the association of which keyword is found in each column. You could then apply your old-school LIKE conditions to refine the search, and these would only need to examine the rows matched by the fulltext search.

SELECT COUNT(*) 
FROM `mytable`
WHERE MATCH(tagline, location, experience) 
  AGAINST(" sales  texas  software" IN BOOLEAN MODE)
AND tagline LIKE '%sales%'
AND location LIKE '%texas%'
AND experience LIKE '%software%';

The costly part is when a query has to do a table-scan to examine millions of rows. If you use the fulltext index to narrow down the candidate matches to a few rows, the extra string comparison of using LIKE against a small subset of rows probably won't be too costly.


Re your comment:

Here's what I get when I run EXPLAIN on a test table:

mysql> create table mytable (
  id serial primary key, 
  tagline text, 
  location text, 
  experience text, 
  fulltext index(tagline, location, experience) 
);

mysql> explain SELECT COUNT(*) 
    -> FROM `mytable`
    -> WHERE MATCH(tagline, location, experience) 
    ->   AGAINST(" sales  texas  software" IN BOOLEAN MODE)
    -> AND tagline LIKE '%sales%'
    -> AND location LIKE '%texas%'
    -> AND experience LIKE '%software%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: mytable
   partitions: NULL
         type: fulltext
possible_keys: tagline
          key: tagline
      key_len: 0
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using where; Ft_hints: no_ranking

The type: fulltext indicates it is going to use the fulltext index.

  • Related