Search for "ilike" does not work after converting bytea to a text field in PostgreSQL ver.-CodePudding

We have a byte field that we save data to using the convert_to function. To extract data, use convert_from, respectively. But we got some problems with Cyrillic in PG12. Example:

select convert_from(convert_to('Берёза', 'UTF8'), 'UTF8') ilike 'берёз%' --return false

But everything works well with the Latin. Example:

select convert_from(convert_to('PostgreSQL', 'UTF8'), 'UTF8') ilike 'postgre%' --return true

The problem appeared in version 12, it can be seen, for example, on this resource https://dbfiddle.uk/ by selecting the appropriate version PG in the menu. Can anyone help sort out the problem?

CodePudding user response：

The problem is that the result of convert_from is in the C collation:

On my system, I have the following:

SHOW lc_collate;

 lc_collate 
════════════
 de_AT.utf8
(1 row)

SELECT pg_collation_for('Берёза');

 pg_collation_for 
══════════════════
 (null)
(1 row)

The NULL value means that the default collation (lc_collate) will be used for comparisons:

SELECT 'Берёза' ilike 'берёз%';

 ?column? 
══════════
 t
(1 row)

Now if we use convert_from, the collation is different, so the comparison is different:

SELECT pg_collation_for(convert_from(convert_to('Берёза', 'UTF8'), 'UTF8'));

 pg_collation_for 
══════════════════
 "C"
(1 row)

select convert_from(convert_to('Берёза', 'UTF8'), 'UTF8') ilike 'берёз%';

 ?column? 
══════════
 f
(1 row)

SELECT 'Берёза' COLLATE "C" ilike 'берёз%';
 ?column? 
══════════
 f
(1 row)

So you can solve the problem by explicitly specifying the default collation:

select convert_from(convert_to('Берёза', 'UTF8'), 'UTF8') COLLATE "default" ilike 'берёз%';

 ?column? 
══════════
 t
(1 row)