Home > database >  How MYSQL full text search works with special characters included using Natural Language Matches Pre
How MYSQL full text search works with special characters included using Natural Language Matches Pre

Time:06-17

I've been having a hard time trying to fix some bugs i've received over an articles search but i'm getting odd behaviors depending on the query i search and i've been trying fixes like wrapping the query in "", replacing the special character by spaces or splitting the text by spaces/closing quotes, bracets, parenthesis but it hasn't been too useful.

I've looked through a lot of pages/documentation and haven't completely understand how this search works. Here's a bit more context on the problem:

This Articles search uses a Natural Language Matches Predicate against the title and content. Both content and title could contain special characters, numbers, ips and even URLs, so, what is expected is that this search can return the best accurate results/exact results but this doesn't happen all the time and it varies on how the user types the text.

An example:

If i search by an entire article title, for example: Guess who's back - tl;dr: Emot at the top of the results i get the article matching with the title but i also get other results that seem to contain any of the words in the text i typed.

But if i search a fragment of the previous example: tl;dr: i do not get any results, any idea on why this happens? is there any internal confg that MYSQL text search applies when performing the search?

Something worth mention is that i do not know how the tables/indexes were configured, i do not have access to that kind of information, i'm just trying to understand how MYSQL works with this to be able to tell my manager that some behavior he and the customers expect will or won't be possible depending of what they're searching.

To anyone who can help me with this, thanks in advance.

CodePudding user response:

The starting point is mysql's documentation on Natural Language Full-Text Searches. The documentation is quite comprehensive.

  1. Matching of title and getting multiple results:

The full-text engine splits the phrase into words and performs a search in the FULLTEXT index for the words. Nonword characters need not be matched exactly: Phrase searching requires only that matches contain exactly the same words as the phrase and in the same order. For example, "test phrase" matches "test, phrase".

  1. Searching on tl;dr and not getting results is explained in two different places, the first describes what fulltext search considers a word, the second describes a further limitation on indexing too short words:

The MySQL FULLTEXT implementation regards any sequence of true word characters (letters, digits, and underscores) as a word. That sequence may also contain apostrophes ('), but not more than one in a row. This means that aaa'bbb is regarded as one word, but aaa''bbb is regarded as two words. Apostrophes at the beginning or the end of a word are stripped by the FULLTEXT parser; 'aaa'bbb' would be parsed as aaa'bbb. The built-in FULLTEXT parser determines where words start and end by looking for certain delimiter characters; for example, (space), , (comma), and . (period).

Any word that is too short is ignored. The default minimum length of words that are found by full-text searches is three characters for InnoDB search indexes, or four characters for MyISAM.

Based on what you described to me, you seem to be looking for more like exact substring matching (like operator), rather than what fulltext search is for.

  • Related