Difference of String(contentsOf: URL).data(using: .utf8) vs. Data(contentsOf: URL)

Time:07-19

I have been playing with a JSON file in a playground, and I've seen examples of reading the file like this:

do {
    let jsonData = try String(contentsOf: url).data(using: .utf8)
} catch {
    ...
}

And like this:

do {
    let jsonData = try Data(contentsOf: url)
} catch {
    ...
}

Is there a difference in the data? The only difference I can see is that the String version is converted to UTF-8 when read, whereas I assume Data(contentsOf:) reads with some default format (also UTF-8?). I can't see any difference in the resulting data, but I want to make sure.

CodePudding user response:

The difference is that String(contentsOf: url) tries to read text from that URL, whereas Data(contentsOf: url) reads the raw bytes.

Therefore, if the file at the URL is not a plain text file, String(contentsOf: url) could throw an error, whereas Data(contentsOf: url) would read it successfully.
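A minimal sketch of that distinction, assuming a hypothetical temporary file named not-text.bin that holds two bytes which are not valid UTF-8. Data(contentsOf:) returns the bytes unchanged. Whether String(contentsOf:) throws or decodes them under some guessed encoding is up to Foundation, so both outcomes are handled:

```swift
import Foundation

// Hypothetical temporary file used only for this illustration.
let binaryURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("not-text.bin")

// 0xC3 followed by 0x28 is not a valid UTF-8 sequence.
let rawBytes = Data([0xC3, 0x28])
try rawBytes.write(to: binaryURL)

// Data(contentsOf:) does no decoding, so the bytes come back unchanged.
let readBack = try Data(contentsOf: binaryURL)
print(readBack == rawBytes)

// String(contentsOf:) has to interpret the bytes as text, which may fail.
do {
    let text = try String(contentsOf: binaryURL)
    print("Decoded as text:", text)
} catch {
    print("Could not read as text:", error)
}
```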

Regarding the encoding, String(contentsOf: url) is undocumented, but from its implementation, we can see that it calls NSString.init(contentsOf:usedEncoding:):

public init(
    contentsOf url: __shared URL
    ) throws {
    let ns = try NSString(contentsOf: url, usedEncoding: nil)
    self = String._unconditionallyBridgeFromObjectiveC(ns)
}

NSString.init(contentsOf:usedEncoding:) is documented:

Returns an NSString object initialized by reading data from a given URL and returns by reference the encoding used to interpret the data.

So apparently the encoding is guessed and returned by reference, and String.init(contentsOf:) then discards it, since it passes nil for the usedEncoding parameter.
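If you want to see which encoding was picked, Swift's Foundation also offers String(contentsOf:usedEncoding:), which takes an inout String.Encoding. A small sketch, assuming a hypothetical temporary file utf16.json that starts with a UTF-16 little-endian byte order mark. The detected encoding is printed rather than assumed, since the detection is up to Foundation:

```swift
import Foundation

// Hypothetical temporary file used only for this illustration.
let utf16URL = FileManager.default.temporaryDirectory
    .appendingPathComponent("utf16.json")

// Build a UTF-16LE file by hand: byte order mark, then the encoded text.
let utf16Body = #"{"name":"café"}"#.data(using: .utf16LittleEndian)!
let utf16File = Data([0xFF, 0xFE]) + utf16Body
try utf16File.write(to: utf16URL)

// The initial value is only a placeholder; it is overwritten on success.
var detected = String.Encoding.utf8
do {
    let text = try String(contentsOf: utf16URL, usedEncoding: &detected)
    print("Detected encoding:", detected)
    // Compare the UTF-8 re-encoding with the raw bytes of the file.
    print(text.data(using: .utf8) == utf16File)
} catch {
    print("Encoding detection failed:", error)
}
```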

This means that for some non-UTF-8 files, there is a chance that String(contentsOf:) guesses the correct encoding, and data(using: .utf8) then re-encodes the string as UTF-8 bytes, making the rest of your code work. If you had used Data(contentsOf:), you would get the file's bytes in their original, non-UTF-8 encoding, and though reading them wouldn't throw an error, the JSON-parsing code later down the line probably would.

That said, JSON is supposed to be exchanged in UTF-8 (see RFC 8259, section 8.1), so an error when you read a non-UTF-8 file is probably desired.

So basically, if we are choosing between these two options, just use Data(contentsOf:). It's simpler and less typing. You don't need to worry about things like wrong encodings or the file not being plain text. If anything like that happens, the file is not valid JSON, and the JSONDecoder later down the line will throw.
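A minimal sketch of that recommendation, assuming a hypothetical User type and a temporary file user.json (neither comes from the question):

```swift
import Foundation

// Hypothetical model type for this illustration.
struct User: Codable, Equatable {
    let name: String
    let age: Int
}

// Hypothetical temporary file holding UTF-8 encoded JSON.
let jsonURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("user.json")
try Data(#"{"name":"Ada","age":36}"#.utf8).write(to: jsonURL)

do {
    // Read the raw bytes and hand them straight to the decoder.
    let jsonData = try Data(contentsOf: jsonURL)
    let user = try JSONDecoder().decode(User.self, from: jsonData)
    print(user.name, user.age)
} catch {
    // Both file-reading errors and decoding errors end up here.
    print("Failed to load JSON:", error)
}
```

A single catch covers both failure modes, so a file that is missing, is not text, or is in the wrong encoding is reported through the same path.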
