Home > database >  PostgreSQL case-insensitive and accent-insensitive search
PostgreSQL case-insensitive and accent-insensitive search

Time:03-30

I have a data table and I would like to filter the columns. For example, to search for a user by his fullname.

However, I would like to allow the user to enter the search phrase case-independent and accents-independent.

So I have checked these (and more) sources and questions:

I thought the nondeterministic collations might finally be the right way how to achieve that, but unfortunately I don't know how:

  • combine case_insensitive and ignore_accents into one collation
  • how to allow searching only by substring in such a WHERE (e.g., find "Jóhn Doe" only by the string "joh") as nondeterministic collations do not support LIKE or regular expressions
  • which index to use

I would be very grateful for any advice on how to finally deal with this type of problem.

Thanks!

CodePudding user response:

Creating case and accent insensitive ICU collations is pretty simple:

CREATE COLLATION english_ci_ai (
   PROVIDER = icu,
   DETERMINISTIC = FALSE,
   LOCALE = "en-US-u-ks-level1"
);

Or, equivalently (that syntax also works wil old ICU versions:

CREATE COLLATION english_ci_ai (
   PROVIDER = icu,
   DETERMINISTIC = FALSE,
   LOCALE = "en-US@colStrength=primary"
);

See the ICU documentation for details and my article for a detailed discussion.

But your problem is that you want substring search. So you should create a trigram index:

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS unaccent;

CREATE INDEX ON tab USING gin (unaccent(doc) gin_trgm_ops);

Then you can search like this:

SELECT * FROM tab
WHERE unaccent(doc) ILIKE unaccent('%joh%');

Note that you have to force a minimal length of 4 or so on the search string if you want that to be efficient.

  • Related