We have a byte field that we save data to using the convert_to function. To extract data, use convert_from, respectively. But we got some problems with Cyrillic in PG12. Example:
select convert_from(convert_to('Берёза', 'UTF8'), 'UTF8') ilike 'берёз%' --return false
But everything works well with the Latin. Example:
select convert_from(convert_to('PostgreSQL', 'UTF8'), 'UTF8') ilike 'postgre%' --return true
The problem appeared in version 12, it can be seen, for example, on this resource https://dbfiddle.uk/ by selecting the appropriate version PG in the menu. Can anyone help sort out the problem?
CodePudding user response:
The problem is that the result of convert_from
is in the C
collation:
On my system, I have the following:
SHOW lc_collate;
lc_collate
════════════
de_AT.utf8
(1 row)
SELECT pg_collation_for('Берёза');
pg_collation_for
══════════════════
(null)
(1 row)
The NULL value means that the default collation (lc_collate
) will be used for comparisons:
SELECT 'Берёза' ilike 'берёз%';
?column?
══════════
t
(1 row)
Now if we use convert_from
, the collation is different, so the comparison is different:
SELECT pg_collation_for(convert_from(convert_to('Берёза', 'UTF8'), 'UTF8'));
pg_collation_for
══════════════════
"C"
(1 row)
select convert_from(convert_to('Берёза', 'UTF8'), 'UTF8') ilike 'берёз%';
?column?
══════════
f
(1 row)
SELECT 'Берёза' COLLATE "C" ilike 'берёз%';
?column?
══════════
f
(1 row)
So you can solve the problem by explicitly specifying the default collation:
select convert_from(convert_to('Берёза', 'UTF8'), 'UTF8') COLLATE "default" ilike 'берёз%';
?column?
══════════
t
(1 row)