For all Unicode collations except the _bin (binary) collations, MySQL performs a table lookup to find a character's collating weight.
For _bin collations except utf8mb4_0900_bin, the weight is based on the code point, possibly with leading zero bytes added.
For utf8mb4_0900_bin, the weight is the utf8mb4 encoding bytes. The sort order is the same as for utf8mb4_bin, but much faster.
How does utf8mb4_0900_as_cs fit into this? The page does not mention it.
CodePudding user response:
For the short answer, I quote your own quote:
For all Unicode collations except the _bin (binary) collations, MySQL performs a table lookup to find a character's collating weight.
utf8mb4_0900_as_cs is not _bin. So it does the same as all other collations.
For the slightly longer answer, I assume that you assume that _as_cs and _bin behave the same as they both are case sensitive and care about accents.
That is not the case. For example, even if umlauts are not treated as equal in _as_cs (just as in _bin), they are still not ordered by their (almost random) byte value, but based on what they are an umlaut of (or rather, as specified in the unicode standard).
E.g. for _as_cs, you get
A < Á < À < Ä < AE < Æ < B
while for _bin, you get
A < AE < B < À < Á < Ä < Æ
To be able to do this, MySQL uses the same techniques as for _ai_ci, you just get a different position than in _as_cs. Only for _bin, MySQL can simply take the byte value as is.