My C# code doesn't read special characters from file

As the title suggests, I have a problem with my C# code not reading files correctly: when I try to read characters such as č, ć, š, đ, ž from a file, I get � instead. I need my program to be able to read all characters, including those from other languages. I also tried passing an Encoding parameter (UTF-8 and Default), but that didn't work either. Below is an example of the code.

string[] lines = File.ReadAllLines(filePath, Encoding.UTF8);

CodePudding user response：

The characters č, ć, š, đ, ž suggest that the file may be saved in one of the ANSI code pages for Central and Eastern Europe. My recommendation is to try

CodePagesEncodingProvider.Instance.GetEncoding(1250)

as the encoding.
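For context, here is a minimal sketch of how that encoding could be plugged into the ReadAllLines call from the question. It assumes the file really is Windows-1250, which is only a guess based on the characters involved; the class name and the default path "input.txt" are hypothetical. On older .NET Core versions, CodePagesEncodingProvider may require the System.Text.Encoding.CodePages NuGet package.

```csharp
using System;
using System.IO;
using System.Text;

class ReadCp1250Example
{
    static void Main(string[] args)
    {
        // Hypothetical path; pass the real file as the first argument.
        string filePath = args.Length > 0 ? args[0] : "input.txt";

        // Assumption: the file was saved as Windows-1250 (Central European).
        // GetEncoding returns null if the provider does not know the code page.
        Encoding cp1250 = CodePagesEncodingProvider.Instance.GetEncoding(1250)
            ?? throw new NotSupportedException("Code page 1250 is not available.");

        // Decode the file with the legacy code page instead of UTF-8.
        string[] lines = File.ReadAllLines(filePath, cp1250);

        // Ask the console to emit UTF-8 so the characters can be displayed.
        Console.OutputEncoding = Encoding.UTF8;
        foreach (string line in lines)
        {
            Console.WriteLine(line);
        }
    }
}
```

If the output still looks wrong, the file is probably in a different code page, and another value would have to be tried in place of 1250.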

Sadly, there's no easy way to guess the code page of an 8-bit file. UTF-8 (and the other Unicode encodings) were designed to overcome exactly this kind of issue. So, if you have any control over how the source files are created, I strongly recommend saving them as UTF-8 (UTF-16 would also work, but there's no need).
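Although the code page itself cannot be detected reliably, it is possible to check whether a file is valid UTF-8 and only fall back to a legacy code page when it is not. The following is a rough sketch of that idea, not a definitive solution: the TextFileReader class, the ReadText method, and the default fallback of 1250 are all assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class TextFileReader
{
    // Hypothetical helper: tries strict UTF-8 first and, if the bytes are not
    // valid UTF-8, decodes them with the given fallback code page instead.
    public static string ReadText(string path, int fallbackCodePage = 1250)
    {
        byte[] bytes = File.ReadAllBytes(path);
        try
        {
            // throwOnInvalidBytes: true raises an exception on invalid input
            // instead of silently substituting the replacement character.
            var strictUtf8 = new UTF8Encoding(
                encoderShouldEmitUTF8Identifier: false,
                throwOnInvalidBytes: true);

            // GetString keeps a leading byte order mark, so trim it if present.
            return strictUtf8.GetString(bytes).TrimStart('\uFEFF');
        }
        catch (DecoderFallbackException)
        {
            // Assumption: non-UTF-8 files use the fallback code page.
            Encoding fallback =
                CodePagesEncodingProvider.Instance.GetEncoding(fallbackCodePage)
                ?? throw new NotSupportedException(
                    "Code page " + fallbackCodePage + " is not available.");
            return fallback.GetString(bytes);
        }
    }
}
```

A call such as TextFileReader.ReadText(filePath) would then handle both UTF-8 files and files in the assumed legacy code page. Text that contains only ASCII decodes identically either way, so the fallback only matters once accented characters appear.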

CodePudding user response：

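Try this. It reads the file in its original encoding and writes a UTF-8 copy:

// Requires: using System.IO; using System.Text;
// Note: iso-8859-1 (Latin-1) does not contain č, ć, š, đ, ž. For those
// characters, pass the file's actual encoding here, e.g. code page 1250.
using (StreamReader reader = new StreamReader(fileName,
                             Encoding.GetEncoding("iso-8859-1")))
{
    // false = overwrite the output file rather than append to it.
    using (StreamWriter writer = new StreamWriter(
                                 outFileName, false, Encoding.UTF8))
    {
        // Copy the decoded text to the output file as UTF-8.
        writer.Write(reader.ReadToEnd());
    }
}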