I'm trying to read a .txt file and put the text into an object, then later serialize that object and write it to another .txt file, all while keeping exactly the same characters.
I've tried using 'iso-8859-1' encoding with File.ReadAllLines(), but I get the following:
(image: Result)
I've also tried creating a custom JavaScriptEncoder for serialization, but that did not work, presumably because the read wasn't getting the correct characters in the first place.
Is there a way I can write a custom encoder for both File.ReadAllLines() and JsonSerializer.Serialize() so that I can keep the exact same characters throughout? Thanks
Edit: I removed the encoding entirely and it worked for most characters, but 'œ' still comes back as 'o'. Original text: sfør Är du säker på a un¹æ ko róciæ kolejnoœæ numeró e¿y pamiêtaæ, ¿e w aŸn nieœ w górê g³ówna w³aœc
CodePudding user response:
Ultimately, if you're going to read and write text, you need to know which encoding you're meant to be using; you usually cannot guess it reliably. There's not really any such thing as a "text file": there's just a binary file that your code translates to text via an encoding. Either the system guesses, or you tell it. These days, UTF-8 is a pragmatic default, and ANSI-style encodings such as iso-8859-1 should usually be considered legacy and reserved for data that is limited to that specific code page for historic reasons.
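To illustrate the "binary file plus encoding" point, here is a minimal sketch that decodes the same two bytes three ways. The byte values are a made-up sample, and Windows-1250 is only my guess for the Polish-looking fragments in the question (they look like Windows-1250 bytes decoded as Windows-1252); the real file may use something else.

```csharp
using System;
using System.Text;

// On .NET Core / .NET 5+, legacy code pages such as 1250 and 1252
// are only available after registering this provider.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical sample: the bytes Windows-1250 uses for "ść".
byte[] raw = { 0x9C, 0xE6 };

string as1250 = Encoding.GetEncoding(1250).GetString(raw); // "ść"
string as1252 = Encoding.GetEncoding(1252).GetString(raw); // "œæ"
string asUtf8 = Encoding.UTF8.GetString(raw);              // invalid UTF-8, decoded as U+FFFD replacement characters

Console.WriteLine($"1250: {as1250} | 1252: {as1252} | UTF-8: {asUtf8}");
```

The bytes never change; only the interpretation does, which is why picking the wrong encoding on read cannot be repaired later by the serializer.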
So, either:
- determine what encoding you're meant to be using and use that for both read and write, or
- treat the data as raw bytes, without attempting to parse it into string (etc) data
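
Both options can be sketched roughly as follows. Everything specific here is an assumption for illustration: the Windows-1250 encoding, the temp-file paths, the sample bytes, and the Dictionary standing in for your object. JavaScriptEncoder.UnsafeRelaxedJsonEscaping is used so that JsonSerializer leaves non-ASCII characters unescaped in the JSON; check its documentation before using it for output that ends up in HTML.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.Encodings.Web;
using System.Text.Json;

// Needed on .NET Core / .NET 5+ before legacy code pages can be used.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Assumption: the file is Windows-1250. Substitute the encoding the file really uses.
Encoding fileEncoding = Encoding.GetEncoding(1250);

// Hypothetical paths and sample content, created here so the sketch is self-contained.
string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(dir);
string input = Path.Combine(dir, "input.txt");
string viaText = Path.Combine(dir, "via-text.txt");
string viaBytes = Path.Combine(dir, "via-bytes.txt");
File.WriteAllBytes(input, new byte[] { 0x9C, 0xE6, 0x0D, 0x0A }); // "ść" + CRLF in Windows-1250

// Option 1: decode and encode with the same, known encoding.
// ReadAllText (rather than ReadAllLines) also preserves the original line endings.
string text = File.ReadAllText(input, fileEncoding);

var options = new JsonSerializerOptions { Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping };
string json = JsonSerializer.Serialize(new Dictionary<string, string> { ["Text"] = text }, options);

var restored = JsonSerializer.Deserialize<Dictionary<string, string>>(json)
               ?? throw new InvalidOperationException("JSON did not contain an object.");
File.WriteAllText(viaText, restored["Text"], fileEncoding);

// Option 2: never decode at all; copy the raw bytes.
byte[] bytes = File.ReadAllBytes(input);
File.WriteAllBytes(viaBytes, bytes);

byte[] original = File.ReadAllBytes(input);
Console.WriteLine($"text round trip identical:  {File.ReadAllBytes(viaText).SequenceEqual(original)}");
Console.WriteLine($"bytes round trip identical: {File.ReadAllBytes(viaBytes).SequenceEqual(original)}");
```

Option 2 is the safer choice if you only need to move the content around; option 1 is needed as soon as the text has to be inspected or embedded in JSON.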