Is Java case-sensitive string sorting broken?

Time:05-19

I tried to sort some strings case-sensitively in Java and was quite surprised by the result. Here is the code:

List<String> words = List.of("aLocalized", "aaLocalized", "aaaLocalized", "ALocalized", "AALocalized", "AAALocalized");
System.out.println(words.stream().sorted().collect(Collectors.joining(" ")));

And here is the result: AAALocalized AALocalized ALocalized aLocalized aaLocalized aaaLocalized

To me this doesn’t look right. Why, for lowercase letters, does a come before aa and aa before aaa, while for uppercase letters AAA comes before AA and AA before A?

CodePudding user response:

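They’re sorted character by character, by the numeric value of each character (its UTF-16 code unit, which for these letters is the same as the Unicode codepoint).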

Each character in your Strings has a numeric codepoint, which is assigned by the Unicode specification. The first 128 Unicode codepoints are identical to ASCII. In the ASCII table, every uppercase letter (A to Z) has a lower codepoint than every lowercase letter (a to z), so an uppercase letter always sorts before a lowercase one.

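For example, A is codepoint 41 (in hexadecimal), while a is 61. So A comes before a.

That also explains the order within each group. AAALocalized and AALocalized first differ at the third character: A (41) versus L (4C). A is lower, so AAALocalized comes first. aaLocalized and aaaLocalized also first differ at the third character: L (4C) versus a (61). L is lower, so aaLocalized comes first.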
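The character-by-character comparison can be traced with a short sketch. The class and method names here (CodeUnitOrderDemo, firstDifference, explain) are made up for illustration; the only library behavior relied on is that String.compareTo compares the char values at the first index where two strings differ.

```java
public class CodeUnitOrderDemo {

    // Index of the first position where the two strings differ,
    // or -1 if they are equal or one is a prefix of the other.
    static int firstDifference(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        for (int i = 0; i < limit; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        return -1;
    }

    // Describes the character pair that decides the order of two strings.
    static String explain(String a, String b) {
        int i = firstDifference(a, b);
        if (i < 0) {
            return a + " vs " + b + ": no differing character, the shorter string sorts first";
        }
        return String.format("%s vs %s: index %d, '%c' (0x%X) vs '%c' (0x%X)",
                a, b, i, a.charAt(i), (int) a.charAt(i), b.charAt(i), (int) b.charAt(i));
    }

    public static void main(String[] args) {
        // 'A' (0x41) vs 'L' (0x4C): the string with 'A' sorts first.
        System.out.println(explain("AAALocalized", "AALocalized"));
        // 'L' (0x4C) vs 'a' (0x61): the string with 'L' sorts first.
        System.out.println(explain("aaLocalized", "aaaLocalized"));
    }
}
```

In both pairs the deciding position is index 2, where a letter from the run of a's is compared against the L of Localized.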

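If you want your strings to be sorted in dictionary order, use a java.text.Collator, which applies the collation rules of a locale (the JVM's default locale when you call Collator.getInstance() with no arguments):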

System.out.println(
    words.stream().sorted(Collator.getInstance())
        .collect(Collectors.joining(" ")));
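Since Collator.getInstance() depends on the default locale of the JVM, the result can vary between machines. A minimal sketch that passes the locale explicitly is shown here; the class name CollatorSortDemo, the helper sortForLocale, and the choice of Locale.ENGLISH are assumptions for illustration only.

```java
import java.text.Collator;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class CollatorSortDemo {

    // Sorts with a Collator for an explicit locale instead of the JVM default.
    // Collator implements Comparator<Object>, so it can be passed to sorted().
    static List<String> sortForLocale(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        return words.stream().sorted(collator).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = List.of("aLocalized", "aaLocalized", "aaaLocalized",
                "ALocalized", "AALocalized", "AAALocalized");
        System.out.println(String.join(" ", sortForLocale(words, Locale.ENGLISH)));
    }
}
```

With a collator, letters are compared first and case only acts as a tie-breaker, so the two aaa... strings come first and the two a... strings come last (a sorts before L). Which case variant comes first within each pair depends on the collation rules in use.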

CodePudding user response:

What you may not have realized is that all capital letters come before all small letters. So sorting List.of("A", "a", "B", "b", "C", "c") would result in A B C a b c
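This can be checked with a small sketch. The class name CaseOrderDemo and its two helper methods are hypothetical; String.CASE_INSENSITIVE_ORDER is the JDK's built-in comparator and is included only as a contrast, for the case where a plain case-insensitive order (without locale rules) is enough.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CaseOrderDemo {

    // Natural String order: all of 'A'..'Z' sort before all of 'a'..'z'.
    static List<String> naturalOrder(List<String> words) {
        return words.stream().sorted().collect(Collectors.toList());
    }

    // Case-insensitive order using the JDK's String.CASE_INSENSITIVE_ORDER.
    static List<String> ignoringCase(List<String> words) {
        return words.stream()
                .sorted(String.CASE_INSENSITIVE_ORDER)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> letters = List.of("b", "C", "a", "B", "c", "A");
        System.out.println(naturalOrder(letters)); // [A, B, C, a, b, c]
        System.out.println(ignoringCase(letters)); // upper and lower case interleaved
    }
}
```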
