I'm reading both Hungarian and Serbian words from a text file (tab-delimited, exported from Excel), then writing them to the console. When I write them to the screen, characters outside the English alphabet are displayed incorrectly.
For example, instead of körte I get kĂśrte, and instead of kruška I get kruĹĄka.
I'm using StreamReader (and later StreamWriter), and I've set the encoding to iso-8859-2 for both of them, as well as for the console output. This encoding includes both sets of characters I need.
Console.OutputEncoding = Encoding.GetEncoding("iso-8859-2");
using(StreamReader sr = new StreamReader(fIN, Encoding.GetEncoding("iso-8859-2"))) {
using(StreamWriter sw = new StreamWriter(fDB, Encoding.GetEncoding("iso-8859-2"))) {
I've tried to see whether it had trouble writing it on the console, so I just tried writing all these characters on the screen, and it displays everything with no problem.
Console.WriteLine("á Á é É í Í ó Ó ö Ö ü Ü ű Ű");
Console.WriteLine("č Č ć Ć đ Đ š Š ž Ž");
//outputs properly
I tried to see whether it had trouble storing these characters, so I've put them in a string and tried to display it, with no problems.
string s13 = "á Á é É í Í ó Ó ö Ö ü Ü ű Ű";
Console.WriteLine(s13);
s13 = "č Č ć Ć đ Đ š Š ž Ž ";
Console.WriteLine(s13);
//outputs properly
I then stepped through the code in the debugger to see where the problem occurs at runtime, and it seems the data is already wrong as soon as it is read from the file.
try {
    using(FileStream fs = new FileStream("DB.txt", FileMode.OpenOrCreate)) {
        using(StreamReader sr = new StreamReader(fs, Encoding.GetEncoding("iso-8859-2"))) {
            while(!sr.EndOfStream) {
                string[] s = sr.ReadLine().Split('\t'); //immediately becomes faulty, even if not split
                HuSrb word = new HuSrb(s[0], s[1]);
                bool found = false;
                foreach(Categories c in categories) {
                    if(c.Name == s[2]) {
                        c.Amount++;
                        c.Words.Add(word);
                        found = true;
                        break;
                    }
                }
                if(!found) {
                    Categories category = new Categories(s[2], word);
                    categories.Add(category);
                }
            }
        }
    }
}
catch(Exception) {
    throw;
}
The funny thing is, later I read from file A into a string, then write the contents of that string into file B. Both file A and file B have the characters right, but the string in between does not.
So basically,
- The problem is not with storing the data
- The problem is not with printing the data
- The problem is not with writing the data into a file.
My assumption is that the problem is when reading from the file, but then I don't understand how it ends up being correct in the other file. Any help?
CodePudding user response:
The problem is most likely an encoding mismatch: the input text file was saved in one encoding (apparently UTF-8), but you are reading it as iso-8859-2.
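To check this hypothesis without touching any files, here is a minimal sketch that encodes the two sample words as UTF-8 and then decodes those bytes as iso-8859-2. It assumes the input file really is UTF-8, which is a guess based on the garbled output in the question. The class name EncodingMismatchDemo is made up for illustration, and the RegisterProvider line is only needed on .NET Core / .NET 5+, where iso-8859-2 is not available by default.

```csharp
using System;
using System.Text;

class EncodingMismatchDemo
{
    static void Main()
    {
        // Needed on .NET Core / .NET 5+ so that GetEncoding("iso-8859-2") works.
        // On .NET Framework this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding latin2 = Encoding.GetEncoding("iso-8859-2");

        // The bytes as they would sit in a file saved as UTF-8
        // (assumption about the input file).
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("körte\tkruška");

        // Decoding UTF-8 bytes as iso-8859-2 produces the garbled in-memory string.
        string garbled = latin2.GetString(utf8Bytes);

        // Encoding the garbled string with iso-8859-2 again restores the original
        // bytes, so a file written this way still contains valid UTF-8.
        byte[] roundTripped = latin2.GetBytes(garbled);

        // Decoding with the encoding the bytes were actually saved in gives the right text.
        string correct = Encoding.UTF8.GetString(roundTripped);

        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(garbled);
        Console.WriteLine(correct);
    }
}
```

If the assumption holds, the first line printed should show the same garbling as in the question, and the second should show the words correctly. It would also explain why file B looks fine: reading and writing with the same single-byte encoding passes the bytes through unchanged, even though the string in between is wrong.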
I tried reading and writing your example's content with a different encoding and it works: I saved the input file as UTF-8 and read the content using Encoding.UTF8: