I have a database with the names of German towns and cities such as Munich and Münster.
If I query like so:
SELECT name,
MATCH(name) AGAINST(' mün*' IN BOOLEAN MODE) AS relevance
FROM place_names
ORDER BY relevance DESC
I get the same relevance value for all places that contain mun
, mün
or any other text that flattens to mun
when not accounting for diacritics. In other words, searching for mun
or mün
gives the exact same results.
How can I configure my database such that searching for mün
will give higher relevance for words that actually contain the letter ü
, but still also consider u
a match?
CodePudding user response:
CREATE TABLE place_names (id SERIAL PRIMARY KEY, name VARCHAR(255)); CREATE FULLTEXT INDEX idx ON place_names (name); INSERT INTO place_names (name) VALUES ('Munich'), ('Münster'); SELECT * FROM place_names;
id name 1 Munich 2 Münster
SELECT name, MATCH(name) AGAINST(' mün*' IN BOOLEAN MODE) AS relevance FROM place_names ORDER BY relevance DESC;
name relevance Munich 0.000000001885928302414186 Münster 0.000000001885928302414186
ALTER TABLE place_names ADD COLUMN name2 VARCHAR(255) COLLATE utf8mb4_0900_bin AS (name) STORED;
CREATE FULLTEXT INDEX idx2 ON place_names (name2);
SELECT name, MATCH(name) AGAINST(' mün*' IN BOOLEAN MODE) AS relevance, MATCH(name2) AGAINST(' mün*' IN BOOLEAN MODE) AS relevance2 FROM place_names ORDER BY relevance DESC;
name relevance relevance2 Munich 0.000000001885928302414186 0 Münster 0.000000001885928302414186 0.0906190574169159
db<>fiddle here
Hence
SELECT name,
MATCH(name) AGAINST(' mün*' IN BOOLEAN MODE) AS relevance
FROM place_names
ORDER BY MATCH(name2) AGAINST(' mün*' IN BOOLEAN MODE) DESC;