Arabic Swift String isn't empty after removing all characters

I have this code to replace part of a string and remove white spaces:

let str = "باب ‏".replacingOccurrences(of: "باب", with: "").trimmingCharacters(in: .whitespacesAndNewlines)

print(str.count) // gives 1, why not 0?

But it always gives me 1 when it should be 0. Why?

CodePudding user response：

The leftover character is probably a right-to-left mark (RLM), which is "\u{200F}" in Swift. If you want to trim it along with the whitespace, just add it to your set. That'd be something like:

.trimmingCharacters(in: CharacterSet.whitespacesAndNewlines
                            .union(CharacterSet(charactersIn: "\u{200f}")))

You can also just replace that directly, provided the mark sits right next to the letters in the string's stored order:

.replacingOccurrences(of: "\u{200f}باب", with: "")

Keep in mind the layout rules, since sometimes bidirectional literal strings like this can get confusing in the editor. You may want to separate the Arabic like:

let bab = "باب"
let rtl = "\u{200f}"

string.replacingOccurrences(of: rtl + bab, with: "")
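
Putting the trimming suggestion together end to end, here is a minimal sketch. It assumes the string is laid out as letters, then a space, then U+200F, and spells everything with escapes so the editor's bidirectional rendering can't hide anything. The names `input`, `trimSet`, and `result` are illustrative, not from the question.

```swift
import Foundation

// Assumed layout: beh, alef, beh, space, right-to-left mark.
let bab = "\u{0628}\u{0627}\u{0628}"
let rtl = "\u{200F}"
let input = bab + " " + rtl

// Whitespace and newlines, plus the right-to-left mark itself.
let trimSet = CharacterSet.whitespacesAndNewlines
    .union(CharacterSet(charactersIn: rtl))

// Remove the letters, then trim whatever is left at both ends.
let result = input
    .replacingOccurrences(of: bab, with: "")
    .trimmingCharacters(in: trimSet)

print(result.count)
```

Writing the literals as escapes also makes the stored order explicit, which is the part that is easy to misread in a mixed-direction source line.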

CodePudding user response：

Let's look at the content of your original string:

func hexCharactersArray(_ string: String) -> String {
    string.unicodeScalars.map { String(format: "0x%X", $0.value)}.joined(separator: ",")
}

let originalString = "باب ‏"
print(hexCharactersArray(originalString))

The result is 0x628,0x627,0x628,0x20,0x200F

0x628 - Arabic letter beh
0x627 - Arabic letter alef
0x628 - Arabic letter beh
0x20 - space
0x200F - right-to-left mark

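The first three are letters, followed by a space, but 0x200F is neither a letter nor whitespace. Unicode classifies it as a format character (general category Cf), and Foundation's CharacterSet.controlCharacters covers both Cc and Cf, so it counts as a control character there.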
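
A quick way to check how a scalar is classified, as a sketch: it asks the standard library for the scalar's general category and tests membership in the two Foundation character sets involved here. The constant name `rlm` is illustrative.

```swift
import Foundation

let rlm: Unicode.Scalar = "\u{200F}"

// General category according to the Swift standard library.
let isFormat = rlm.properties.generalCategory == .format

// Membership in the set the question trimmed with, and in controlCharacters.
let isWhitespace = CharacterSet.whitespacesAndNewlines.contains(rlm)
let isControl = CharacterSet.controlCharacters.contains(rlm)

print(isFormat, isWhitespace, isControl)
```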

When you do:

let replacedString = originalString.replacingOccurrences(of: "باب", with: "").trimmingCharacters(in: .whitespacesAndNewlines)
print(hexCharactersArray(replacedString))

you get 0x200F

That's because you've replaced the letters and trimmed the whitespace, but left behind a control character.

If you want to trim that out too, use:

let trimmedString = originalString.replacingOccurrences(of: "باب", with: "").trimmingCharacters(in: .whitespacesAndNewlines.union(.controlCharacters))
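
One thing to keep in mind is that trimmingCharacters(in:) only strips from the two ends of the string. If directional marks might also appear between letters, one option is to filter the scalars instead. The sketch below uses a hypothetical helper, `removingScalars(in:)`, which is not part of Foundation, and a made-up sample string `marked`. It removes only U+200E and U+200F rather than all of `.controlCharacters`, on the assumption that other format characters (for example zero-width joiners) may be meaningful in the text.

```swift
import Foundation

extension String {
    // Hypothetical helper: drops every scalar contained in `set`,
    // wherever it occurs in the string.
    func removingScalars(in set: CharacterSet) -> String {
        let kept = unicodeScalars.filter { !set.contains($0) }
        return String(String.UnicodeScalarView(kept))
    }
}

// Made-up sample: marks at the start, in the middle, and at the end.
let marked = "\u{200F}\u{0628}\u{200F}\u{0627}\u{0628} \u{200F}"

// Trimming only reaches the ends, so the middle mark survives.
let trimmed = marked.trimmingCharacters(
    in: CharacterSet.whitespacesAndNewlines.union(.controlCharacters))

// Filtering removes the directional marks everywhere.
let directionalMarks = CharacterSet(charactersIn: "\u{200E}\u{200F}")
let cleaned = marked.removingScalars(in: directionalMarks)

print(trimmed.unicodeScalars.count, cleaned.unicodeScalars.count)
```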