In order to sort a PostgreSQL 14 varchar column with numerical values, I'm using a collation created by
CREATE COLLATION de_pos (LOCALE = 'de-DE-u-kn-true', PROVIDER = 'icu');`
With such a collation ORDER BY results in correct order like this
1.2.10
1.2.20
1.2.100
Without that special collation ORDER BY results in
1.2.10
1.2.100
1.2.20
I want to know, what every part of that collation definition exactly means ?
de-DE => locale for german-germany
u => UTF8 ????
kn => ????
true => ????
CodePudding user response:
See ICU Collations in the PostgreSQL documentation. This links to the ICU documentation, which - with some indirection - leads to Unicode Locale Identifier, which makes clear that the -u
introduces the Unicode Locale Extensions, and kn
is one of those extensions. When you look at Collation Settings, you'll find kn
configures numeric ordering. The true
is the configuration of that option (meaning, numeric ordering is on):
If set to on, any sequence of Decimal Digits (General_Category = Nd in the [UAX44]) is sorted at a primary level with its numeric value. For example, "A-21" < "A-123". The computed primary weights are all at the start of the digit reordering group. Thus with an untailored UCA table, "a$" < "a0" < "a2" < "a12" < "a⓪" < "aa".
This is sometimes called “natural sort order”.
In other words, de-DE-u-kn-true
is:
de
: language GermanDE
: region Germanyu
: what follows are Unicode Locale Extensionkn
: Unicode Locale Extension numeric orderingtrue
: value ofkn
, meaning numeric ordering is on
CodePudding user response:
You can customize your collation's sorting behavior by specifying attribute names. In this case, kn-true stands for natural sorting order or numeric sort order that sorts numbers based on their numeric value.