I have a table on a MySQL database that has two (relevant) columns, 'id' and 'username'.
I have read that MySQL and relational databases in general are not optimal for searching for near matches on strings, so I wonder, what is the industry practice for implementing simple, but not exact match, search functionalities- for example when one searches for accounts by name on Facebook and non-exact matches are shown? I found Apache Lucene when researching this, but this seems to be used for indexing pages of a website, not necessarily arbitrary strings in a database table.
Is there an external tool for this use case? It seems like any SQL query for this task would require a full scan, even if it was simply looking for the inclusion of a substring.
CodePudding user response:
In your situation I would recommend for you to use Elasticsearch instead of relational database. This search engine is a powerful tool for implementing search and analytics functionality. Elasticsearch also flexible and versatile, with a rich query language using JSON as query language and support for many different types of data.
And of course supports near-match searching. As you said, MySQL and anothers relational databases aren't recommended to use near-match searching, they aren't for this purpose.
To implement near-match searching in ElasticSearch you can use fuzzy matching query. The fuzzy matching query allows you to controls how lenient the matching should be, for example for this query bellow:
{
"query": {
"fuzzy": {
"username": {
"value": "julienambrosio",
"fuzziness": 2
}
}
}
}
They'll return "julienambrosio", such as "julienambrosio1", "julienambrosio12" or "juliembrosio".
You can adjust the level of fuzziness to control how lenient/strict the matching should be.
Before you create this example you should to study more about ElasticSearch. There're a lot of courses in udemy, youtube and etc.
You can read more about in the official docs.