Katakana character ジ in URL being encoded incorrectly

Time:03-31

I need to construct a URL with a string path received from my application server which contains the character: ジ

However, in Swift, URL(fileURLWithPath:) seems to encode it incorrectly.

let path = "ジ"
print(URL(fileURLWithPath: path))
print(URL(fileURLWithPath: path.precomposedStringWithCanonicalMapping))

Both print:

%E3%82%B7%E3%82%99

This URL path is rejected by the server, as the expected URL should be: %E3%82%B8

What am I missing or doing wrong? Any help is appreciated.

CodePudding user response:

There are two different characters, ジ (U+30B8) and ジ (U+30B7 followed by U+3099). They may look the same, but they have different internal representations.

  • The former is “katakana letter zi”, composed of a single Unicode scalar, which percent-encodes as %E3%82%B8.

  • The latter is still a single Swift character, but is composed of two Unicode scalars (the “katakana letter si” and the “combining voiced sound mark”), and these two Unicode scalars percent-encode to %E3%82%B7%E3%82%99.

One can normalize characters in a string with precomposedStringWithCanonicalMapping, for example. That can convert a character with the two Unicode scalars into a character with a single Unicode scalar.
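To make the difference concrete, here is a minimal sketch that builds both forms from explicit Unicode scalars and percent-encodes them. The `percentEncoded` helper is a hypothetical name introduced only for this illustration, and `.urlPathAllowed` is an assumed choice of allowed character set:

```swift
import Foundation

// Build both forms from explicit scalars so the editor's own
// normalization cannot blur the difference.
let precomposed = "\u{30B8}"          // katakana letter zi
let decomposed = "\u{30B7}\u{3099}"   // katakana letter si + combining voiced sound mark

// Swift compares strings by canonical equivalence, so these are "equal"...
print(precomposed == decomposed)            // true
// ...even though the underlying scalars differ.
print(precomposed.unicodeScalars.count)     // 1
print(decomposed.unicodeScalars.count)      // 2

// Hypothetical helper: percent-encode a string as a URL path component.
func percentEncoded(_ s: String) -> String {
    s.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed) ?? s
}

print(percentEncoded(precomposed))   // %E3%82%B8
print(percentEncoded(decomposed))    // %E3%82%B7%E3%82%99
print(percentEncoded(decomposed.precomposedStringWithCanonicalMapping))   // %E3%82%B8
```

Because `==` reports the two strings as equal, a comparison alone will not reveal which form you have; inspecting `unicodeScalars` or the percent-encoded output will.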

But your local file system (or, init(fileURLWithPath:), at least) decomposes diacritics. It is logical that the local file system ensures that diacritics are encoded in some consistent manner. (See Diacritics in file names on macOS behave strangely.) The fact that they are decomposed rather than precomposed is, for the sake of this discussion, a bit academic. When you send it to the server, you want it precomposed, regardless of what is happening in your local file system.

Now, you tell us that the “url path is rejected by the server”. That does not make sense. One would generally not provide a local file system URL to a remote server. One would generally extract a file name from a local file system URL and send that to the server. This might be done in a variety of ways:

  • You can use precomposedStringWithCanonicalMapping when adding a filename to a server URL, and it honors that mapping, unlike a file URL:

    let path = "\u{30B7}\u{3099}"      // the decomposed ジ, i.e. the `%E3%82%B7%E3%82%99` variant
    let url = URL(string: "https://example.com")!
        .appendingPathComponent(path.precomposedStringWithCanonicalMapping)
    print(url)          // https://example.com/%E3%82%B8
    
  • If sending it in the body of a request, use precomposedStringWithCanonicalMapping. E.g. if a filename in a multipart/form-data request:

    body.append("--\(boundary)\r\n")
    body.append("Content-Disposition: form-data; name=\"\(filePathKey)\"; filename=\"\(filename.precomposedStringWithCanonicalMapping)\"\r\n")
    body.append("Content-Type: \(mimeType)\r\n\r\n")
    body.append(data)
    body.append("\r\n")
    

Now, those are two random examples of how a filename might be provided to the server. Yours may vary. But the idea is that when you provide the filename, you precompose the string in its canonical form, rather than relying upon whatever form a file URL in your local file system uses.
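Putting the pieces together, the following is only a sketch of that idea: take the name from a local file URL, precompose it, and append it to a server URL. The function name `remoteURL(for:on:)`, the `/tmp` path, and the `https://example.com/files` base are all assumptions made for illustration, not anything from the question:

```swift
import Foundation

// Hypothetical helper: derive a server URL from a local file URL's name.
func remoteURL(for localFile: URL, on base: URL) -> URL {
    // lastPathComponent returns the decoded file name; normalize it to the
    // precomposed canonical form before handing it to the server URL.
    let name = localFile.lastPathComponent.precomposedStringWithCanonicalMapping
    return base.appendingPathComponent(name)
}

// Assumed example inputs; the file name uses the decomposed form on purpose.
let local = URL(fileURLWithPath: "/tmp/\u{30B7}\u{3099}.txt")
let base = URL(string: "https://example.com/files")!

let remote = remoteURL(for: local, on: base)
print(remote)   // https://example.com/files/%E3%82%B8.txt
```

Normalizing at the point where the name leaves the local file system keeps the rest of the code independent of how the file URL happened to encode it.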

CodePudding user response:

You could try this approach, using URL(dataRepresentation:relativeTo:):

if let path = "ジ".data(using: .utf8),   // assumes the literal is the precomposed ジ (U+30B8)
   let url = URL(dataRepresentation: path, relativeTo: nil) {
    print("\n---> url: \(url) \n") //---> url: %E3%82%B8
}