As the title suggests, my C# code is not reading files correctly: when I read characters such as č, ć, š, đ, ž from a file, I get � instead. I need my program to read all characters correctly, including those from other languages. I also tried passing an Encoding parameter, both UTF-8 and Default, but that didn't work either. Below is an example of the code.
string[] lines = File.ReadAllLines(filePath, Encoding.UTF8);
CodePudding user response:
The characters č, ć, š, đ, ž suggest that the file is encoded in one of the 8-bit ANSI code pages used in Central and Eastern Europe, possibly Windows-1250. I would recommend trying
CodePagesEncodingProvider.Instance.GetEncoding(1250)
as the encoding.
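As a minimal sketch of that suggestion, assuming the file really is in code page 1250 (which is a guess, not something that can be read off the file): the sample file name and its contents below are hypothetical and exist only to make the example self-contained. On older targets, CodePagesEncodingProvider may require the System.Text.Encoding.CodePages package.

```csharp
using System;
using System.IO;
using System.Text;

// Code page 1250 is not built into .NET Core / .NET 5+; it comes from the provider.
Encoding cp1250 = CodePagesEncodingProvider.Instance.GetEncoding(1250)
    ?? throw new InvalidOperationException("Code page 1250 is not available.");

// Hypothetical sample file, written in code page 1250 so the sketch can run on its own.
string filePath = Path.Combine(Path.GetTempPath(), "cp1250-sample.txt");
File.WriteAllText(filePath, "č, ć, š, đ, ž", cp1250);

// Read the file with the encoding it was written in, instead of Encoding.UTF8.
string[] lines = File.ReadAllLines(filePath, cp1250);

Console.OutputEncoding = Encoding.UTF8; // so the console can show the characters
Console.WriteLine(lines[0]);
```

If the text still looks wrong with 1250, the file is probably in a different code page and another candidate has to be tried.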
Sadly, there is no easy way to guess the code page of an 8-bit file. UTF-8 (and the other Unicode encodings) were designed to overcome exactly this kind of problem. So if you have control over how the source files are created, I strongly recommend saving them as UTF-8 (UTF-16, which .NET calls Unicode, would also work, but there is no need for it).
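Because the code page cannot be detected reliably, one possible workaround is to try strict UTF-8 first and fall back to an assumed code page only when the bytes are not valid UTF-8. The sketch below is an illustration, not a real detector: ReadTextWithFallback is a hypothetical helper name, and 1250 as the fallback is an assumption about where the files come from.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: decode as strict UTF-8, otherwise use the assumed fallback encoding.
static string ReadTextWithFallback(string path, Encoding fallback)
{
    byte[] bytes = File.ReadAllBytes(path);
    try
    {
        // throwOnInvalidBytes: true raises DecoderFallbackException on invalid UTF-8
        // instead of silently substituting U+FFFD.
        var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                          throwOnInvalidBytes: true);
        // GetString keeps a leading byte order mark, so strip it if present.
        return strictUtf8.GetString(bytes).TrimStart('\uFEFF');
    }
    catch (DecoderFallbackException)
    {
        // Not valid UTF-8: assume the 8-bit fallback code page.
        return fallback.GetString(bytes);
    }
}

// Assumption: legacy files are in code page 1250.
Encoding cp1250 = CodePagesEncodingProvider.Instance.GetEncoding(1250)
    ?? throw new InvalidOperationException("Code page 1250 is not available.");

// Hypothetical sample files, one per encoding, to exercise both branches.
string legacyPath = Path.Combine(Path.GetTempPath(), "fallback-legacy.txt");
string utf8Path = Path.Combine(Path.GetTempPath(), "fallback-utf8.txt");
File.WriteAllText(legacyPath, "čćšđž", cp1250);
File.WriteAllText(utf8Path, "čćšđž", new UTF8Encoding(false));

string fromLegacy = ReadTextWithFallback(legacyPath, cp1250);
string fromUtf8 = ReadTextWithFallback(utf8Path, cp1250);
Console.WriteLine(fromLegacy == fromUtf8);
```

This works because text in an 8-bit code page that contains accented letters is very rarely valid UTF-8, but it cannot tell one 8-bit code page from another, so the fallback remains a guess.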
CodePudding user response:
Try this:
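// Re-encode the file as UTF-8. The source encoding is a guess: ISO-8859-1 has no
// č, ć or đ, so for such text ISO-8859-2 or code page 1250 is the likelier choice.
using (System.IO.StreamReader reader = new System.IO.StreamReader(fileName,
    Encoding.GetEncoding("iso-8859-1")))
{
    // append: false overwrites outFileName if it already exists.
    using (System.IO.StreamWriter writer = new System.IO.StreamWriter(
        outFileName, false, Encoding.UTF8))
    {
        // Decode with the source encoding and write the text back out as UTF-8.
        writer.Write(reader.ReadToEnd());
    }
}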