I am trying to alphabetically sort an array of non-English strings that contain a number of special Unicode characters. I can create a CharacterSet sequence which contains the desired lexicographic sort order.
Is there a way in Swift 5 to perform this kind of customized sort? I believe I saw such a function some years back, but a fairly exhaustive search today failed to turn anything up.
Any pointers would be appreciated!
CodePudding user response:
As a simple implementation of matt's cosorting comment:
// You have `t` twice in your string; I've removed the first one.
let alphabet = "ꜢjiyꜤwbpfmnRrlhḥḫẖzsšqkgtṯdḏ "
// Map characters to their location in the string as integers
let order = Dictionary(uniqueKeysWithValues: zip(alphabet, 0...))
// Make the alphabet backwards as a test string
let string = alphabet.reversed()
// This sorts unknown characters at the end. Or you could throw instead.
let sorted = string.sorted { (order[$0] ?? .max) < (order[$1] ?? .max) }
print(sorted)
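The snippet above sorts the characters of a single string. To sort an array of whole strings, as the question asks, the same rank table can be used to compare words position by position. This is only a minimal sketch: the `precedes` helper and the sample words are made up for illustration, and it assumes that characters missing from the alphabet should sort last.

```swift
// Custom alphabet from the answer above; a character's position is its rank.
let customAlphabet = "ꜢjiyꜤwbpfmnRrlhḥḫẖzsšqkgtṯdḏ "
let rank = Dictionary(uniqueKeysWithValues: zip(customAlphabet, 0...))

// Hypothetical helper: true when `lhs` sorts before `rhs` under the custom alphabet.
// Characters that are not in the alphabet get Int.max, so they sort last.
func precedes(_ lhs: String, _ rhs: String) -> Bool {
    let lhsRanks = lhs.map { rank[$0] ?? Int.max }
    let rhsRanks = rhs.map { rank[$0] ?? Int.max }
    // Compare the rank sequences element by element; a shorter prefix sorts first.
    return lhsRanks.lexicographicallyPrecedes(rhsRanks)
}

// Illustrative sample words, not taken from the question.
let words = ["nfr", "Ꜣbd", "ḥtp", "nb", "htp"]
let sortedWords = words.sorted(by: precedes)
print(sortedWords) // ["Ꜣbd", "nb", "nfr", "htp", "ḥtp"]
```

Mapping each word to its ranks on every comparison is simple but repeats work; for large arrays you may prefer to compute the rank arrays once and sort on those.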
CodePudding user response:
Rather than building your own “non-English” sorting, you might consider localized comparison. E.g.:
import Foundation // the localized comparison methods come from Foundation
let strings = ["a", "á", "ä", "b", "c", "d", "e", "é", "f", "r", "s", "ß", "t"]
let result1 = strings.sorted()
print(result1) // ["a", "b", "c", "d", "e", "f", "r", "s", "t", "ß", "á", "ä", "é"]
let result2 = strings.sorted {
    $0.localizedCaseInsensitiveCompare($1) == .orderedAscending
}
print(result2) // ["a", "á", "ä", "b", "c", "d", "e", "é", "f", "r", "s", "ß", "t"]
let locale = Locale(identifier: "sv")
let result3 = strings.sorted {
    $0.compare($1, options: .caseInsensitive, locale: locale) == .orderedAscending
}
print(result3) // ["a", "á", "b", "c", "d", "e", "é", "f", "r", "s", "ß", "t", "ä"]
And a non-Latin example:
// Named `kana` so it does not clash with `strings` above when both snippets share a scope
let kana = ["あ", "か", "さ", "た", "い", "き", "し", "ち", "う", "く", "す", "つ", "ア", "カ", "サ", "タ", "イ", "キ", "シ", "チ", "ウ", "ク", "ス", "ツ", "が", "ぎ"]
let result4 = kana.sorted {
    $0.localizedCaseInsensitiveCompare($1) == .orderedAscending
}
print(result4) // ["あ", "ア", "い", "イ", "う", "ウ", "か", "カ", "が", "き", "キ", "ぎ", "く", "ク", "さ", "サ", "し", "シ", "す", "ス", "た", "タ", "ち", "チ", "つ", "ツ"]
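If you need this in more than one place, the comparisons above can be wrapped in a small helper that takes the locale as a parameter. This is a sketch only: `sortedStrings(_:in:)` is a hypothetical name, and the exact ordering you get depends on the collation data available on the platform, so no expected output is shown for the Swedish call.

```swift
import Foundation

// Hypothetical helper: sort strings case-insensitively using the
// collation rules of the given locale.
func sortedStrings(_ strings: [String], in locale: Locale) -> [String] {
    strings.sorted {
        $0.compare($1, options: [.caseInsensitive], range: nil, locale: locale) == .orderedAscending
    }
}

// Same call, different locale: only the locale identifier changes.
let english = sortedStrings(["b", "a", "c"], in: Locale(identifier: "en"))
let swedish = sortedStrings(["ä", "a", "z"], in: Locale(identifier: "sv"))
print(english)
print(swedish)
```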