Home > Blockchain >  Postgres sorting is not working on special characters using collate
Postgres sorting is not working on special characters using collate

Time:11-24

If I execute the postgresql script using collate, I am getting the result sorted only with '.' and it dint consider '__'. I want the special characters '.' and '_' to be sorted first followed by alphabets.

PostgreSQL:
SELECT * FROM test_spl_char ORDER BY CHR_NM COLLATE "C";
Output:
-----------
.GEMS
BISCUIT
CANDY
DONUT
YUM
_SHAKE

show lc_collate;
Output:
-----------
en_US.UTF-8

Kindly help me in getting the result sorted with special chars first.

CodePudding user response:

Actually it did take into account both . and _, and did so properly. It is your expectations that all special characters collate before standard alphanumerics that is in error; the special characters are interspersed within the letters and numbers. You can see this, from your data, by looking at the ASCII code (via Postgres function) for the first character of each:

select char_nm, ascii(substr(char_nm,1,1)) 
  from test_spl_char
  order by char_nm collate "C";

Notice that the result in fact sorted properly according the the ASCII code (the internal code).

You can get what you are looking for with a regular expression to determine the "class" and sorting on that then sorting on the actual value:

select * 
  from test_spl_char 
  order by char_nm ~ '^[A-Za-z0-9]' 
         , char_nm ; 

What it does: the regular expression char_nm ~ '^[A-Za-z0-9] checks the first character of the string then for an alphanumeric character. If it is returns True and it not False. Since Postgres considers false less than true special characters (non-alphanumeric characters) will collate lower and therefore first in ascending sequence. Then the full value is sorted also ascending.

See demo for both above.

  • Related