Why does File.ReadAllText() also recognize UTF-16 encodings?

Time:12-24

I read a file using

File.ReadAllText(..., Encoding.ASCII);

According to the documentation [MSDN] (emphasis mine),

This method attempts to automatically detect the encoding of a file based on the presence of byte order marks. Encoding formats UTF-8 and UTF-32 (both big-endian and little-endian) can be detected.

However, in my case the ASCII file incorrectly started with the bytes 0xFE 0xFF, and the method decoded it as UTF-16 (probably big-endian, but I did not check).
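
A minimal sketch that reproduces the behavior. The temp file and the byte values are made up for illustration; the only assumption taken from my real file is that it starts with 0xFE 0xFF followed by plain ASCII bytes.

```csharp
using System;
using System.IO;
using System.Text;

// Illustrative data: 0xFE 0xFF followed by the ASCII bytes for "ABCD".
byte[] bytes = { 0xFE, 0xFF, 0x41, 0x42, 0x43, 0x44 };
string path = Path.GetTempFileName();
string text;
try
{
    File.WriteAllBytes(path, bytes);
    // ASCII is requested explicitly, yet the leading bytes are treated as a BOM.
    text = File.ReadAllText(path, Encoding.ASCII);
}
finally
{
    File.Delete(path);
}

// Decoded as ASCII this would be one char per byte (6 chars).
// Decoded as UTF-16 big-endian, the 4 payload bytes pair up into 2 chars.
Console.WriteLine(text.Length);
Console.WriteLine(((int)text[0]).ToString("X4"));
```

If the file were read as ASCII, I would expect six characters here; getting two characters (the first being U+4142) suggests the payload was decoded as UTF-16 big-endian.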

CodePudding user response:

According to File [referencesource] it uses a StreamReader:

private static String InternalReadAllText(String path, Encoding encoding, bool checkHost)
{
  ...
  using (StreamReader sr = new StreamReader(path, encoding, true, StreamReader.DefaultBufferSize, checkHost))
    return sr.ReadToEnd();
}

and that five-parameter StreamReader overload [MSDN] is documented to support UTF-16 as well:

It automatically recognizes UTF-8, little-endian Unicode, big-endian Unicode, little-endian UTF-32, and big-endian UTF-32 text if the file starts with the appropriate byte order marks. Otherwise, the user-provided encoding is used.

(emphasis mine)
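
If the detection is unwanted, one possible workaround is to create the StreamReader yourself and pass false for detectEncodingFromByteOrderMarks. The following is only a sketch: ReadAllTextNoDetection is a hypothetical helper name, not a framework method, and the bytes are made-up sample data.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of the framework): reads the whole file with
// the given encoding and without looking for byte order marks.
static string ReadAllTextNoDetection(string path, Encoding encoding)
{
    using (var sr = new StreamReader(path, encoding, detectEncodingFromByteOrderMarks: false))
        return sr.ReadToEnd();
}

// Illustrative data: 0xFE 0xFF followed by the ASCII bytes for "ABCD".
byte[] sample = { 0xFE, 0xFF, 0x41, 0x42, 0x43, 0x44 };
string tempPath = Path.GetTempFileName();
string strict;
try
{
    File.WriteAllBytes(tempPath, sample);
    strict = ReadAllTextNoDetection(tempPath, Encoding.ASCII);
}
finally
{
    File.Delete(tempPath);
}

// One char per byte: the two non-ASCII leading bytes are not silently
// consumed as a BOM, and the ASCII payload survives unchanged.
Console.WriteLine(strict.Length);
Console.WriteLine(strict.EndsWith("ABCD", StringComparison.Ordinal));
```

With detection turned off, the two leading bytes stay in the result as replacement characters instead of switching the decoder, so the caller can still notice and handle them.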

Since File.ReadAllText() is intended and documented to detect Unicode BOMs, it is probably a good thing that it detects UTF-16 as well. However, the documentation omits UTF-16 and should be updated. I filed issue #7515.
