Why does UTF-8 behave differently in StreamWriter

Time:08-16

I ran into an encoding problem when writing Russian text to a file in C#. The situation: I have a string with Russian and English characters, encoded in UTF-8, and I write it to a file in two ways:

using (StreamWriter sw = new StreamWriter(path, false, Encoding.UTF8))
{
    await sw.WriteLineAsync(stringContent);
}

In this case everything is fine: the file contains both the Russian and the English characters, and Notepad detects the encoding as UTF-8 with BOM.

using (StreamWriter sw = File.CreateText(path))
{
    await sw.WriteLineAsync(stringContent);
}

In this case the file contains the English characters but garbled text instead of the Russian ones, and Notepad detects the encoding as UTF-8 without BOM.

File.CreateText() returns a StreamWriter that uses UTF-8 encoding. Question: why does everything work when I explicitly specify Encoding.UTF8, while the Russian characters turn into a mess when UTF-8 is used by default in File.CreateText()? Is the BOM the problem?

CodePudding user response:

Does this related question help? "Why does StreamWriter write text to a file using UTF-8 without BOM?"

It seems to be the default behavior of StreamWriter.

StreamWriter defaults to using an instance of UTF8Encoding unless specified otherwise. This instance of UTF8Encoding is constructed without a byte order mark (BOM), so its GetPreamble method returns an empty byte array.
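To make the difference concrete, here is a minimal sketch that writes the same string both ways and checks whether each file starts with the UTF-8 BOM bytes (EF BB BF). The class name BomDemo, the helper StartsWithUtf8Bom, and the temporary file paths are made up for illustration and are not from the question.

```csharp
using System;
using System.IO;
using System.Text;

public class BomDemo
{
    // Hypothetical helper: true if the file begins with the UTF-8 BOM (EF BB BF).
    public static bool StartsWithUtf8Bom(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        return bytes.Length >= 3
            && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }

    public static void Main()
    {
        // "Привет, world" written with escapes so the source file encoding does not matter.
        string text = "\u041F\u0440\u0438\u0432\u0435\u0442, world";
        string withBom = Path.GetTempFileName();
        string withoutBom = Path.GetTempFileName();

        // Encoding.UTF8 has a non-empty preamble, so the writer emits a BOM first.
        using (var sw = new StreamWriter(withBom, false, Encoding.UTF8))
        {
            sw.WriteLine(text);
        }

        // File.CreateText uses a UTF-8 encoding whose preamble is empty (no BOM).
        using (StreamWriter sw = File.CreateText(withoutBom))
        {
            sw.WriteLine(text);
        }

        Console.WriteLine(StartsWithUtf8Bom(withBom));     // BOM expected
        Console.WriteLine(StartsWithUtf8Bom(withoutBom));  // no BOM expected

        // Reading the BOM-less file back as UTF-8 still yields the original text.
        string roundTrip = File.ReadAllText(withoutBom, Encoding.UTF8).TrimEnd();
        Console.WriteLine(roundTrip == text);

        File.Delete(withBom);
        File.Delete(withoutBom);
    }
}
```

Apart from the three BOM bytes, both files hold the same UTF-8 bytes, so the text itself is not damaged on disk. The garbling most likely comes from the program that opens the file guessing a legacy code page when no BOM is present. If the reader cannot be told the encoding, passing Encoding.UTF8 explicitly to the StreamWriter, as in the first snippet of the question, keeps the BOM.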
