Home > Net >  Would indexing a text field allow for better performance during a regex search?
Would indexing a text field allow for better performance during a regex search?

Time:12-20

So I have a blog posts table with a title field and I would like for users to be able to query posts via this field. For this, constructed a query where the string the user types into the search bar could be anywhere in the title field like so:

SELECT id, title, body FROM posts
   WHERE title LIKE "%users_query%";

My question is could indexing the title field enhance the performance of the query? My thinking is at scale when there can be millions of posts, this search would take a very long time and thus indexing the field would decrease response time.

I have taken a look at this post and this post and they say that indexing won't really make a difference. However they are quite dated so perhaps over time this has become a viable solution.

If this still won't work all too well, what could I potentially do to speed up such queries.

Side question: I am also ordering these posts by a created_at field descending. Should I index that field as well?

Thank you!

CodePudding user response:

There are three circumstances where indexing might improve that query.

  1. If there is no initial wildcard % in the Where. If your Where clause said WHERE title LIKE "users_query%", then there's a chance you could use the index to pull out only the rows beginning with 'users_query.
  2. If you have many rows with the same title. The index would have fewer rows to run the wildcard against than the table.
  3. If the table has an exceptionally large number of columns, it's possible that reading the index would result in a lot less I/O.

Otherwise, yes, it's not worth it.

CodePudding user response:

Indexing the title probably won't help. The leading % in your query means the search term can appear anywhere in the title, so there is no way for the database engine to seek to the search term.

You can think about it like this - imagine you have the following index on some address:

address
163 pExample Street
276 dExample Street
373 aExample Street
490 tExample Street
512 bExample Street
...

Now if someone asked you to find addresses on bExample Street, you'd have no choice but to scan the entire index.

CodePudding user response:

LIKE with a leading wildcard (and any REGEX) will check every row. No indexing helps.

INDEX(created_at) together with ORDER BY created_at DESC LIMIT 5 will help only if it finds 5 matching rows created recently. Otherwise, it won't help.

Anyway, INDEX cannot handle a TEXT columns. And INDEX(col(255)), though possible, limits the search to the first 255 characters.

A FULLTEXT index, together with MATCH(...) AGAINST(...) can help a lot. But there are caveats. Study the manual, then we can discuss some of its difficulties and workarounds.

Limiting an index to just (title), not (title, msg) may help some, but not as much as you would like.

  • Related