I need to create a System.String from a file with some unknown ASCII-compatible 1-byte encoding, in order to replace some numbers in the text with a regex. But Encoding.ASCII is 7-bit, and UTF-8 is multi-byte, so neither will round-trip back to the same byte sequence.
Is there an encoding in .NET Core which can round-trip any byte sequence?
UPD: The Windows-1256 character set looks promising, but it is Windows-only.
CodePudding user response:
Firstly, if you don't know the encoding, using a string is more of a hack than a solution. It may be quicker and possibly easier, but for a more robust and clean solution I'd just work on byte[] or something similar. A string would be fine if it's a use-once solution.
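A minimal sketch of the byte[] approach, assuming the numbers are written with ASCII digits (true for any ASCII-compatible encoding) and a compiler that accepts top-level statements (C# 9 or later). ReplaceDigitRuns and the sample bytes are hypothetical names and data made up for illustration. The helper only looks for runs of digit bytes and copies everything else through without decoding it:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: true for the ASCII digits '0'..'9' (0x30..0x39).
static bool IsAsciiDigit(byte b) => b >= 0x30 && b <= 0x39;

// Hypothetical helper: replaces every run of ASCII digits with the bytes
// returned by `replace`; all other bytes are copied through untouched.
static byte[] ReplaceDigitRuns(byte[] input, Func<byte[], byte[]> replace)
{
    var output = new List<byte>(input.Length);
    int i = 0;
    while (i < input.Length)
    {
        if (IsAsciiDigit(input[i]))
        {
            int start = i;
            while (i < input.Length && IsAsciiDigit(input[i])) i++;
            // Hand the whole digit run to the callback and append its result.
            output.AddRange(replace(input.AsSpan(start, i - start).ToArray()));
        }
        else
        {
            output.Add(input[i]);
            i++;
        }
    }
    return output.ToArray();
}

// Sample input: "id=", one non-ASCII byte (0xE9) of unknown meaning, "42", ";".
byte[] sample = { 0x69, 0x64, 0x3D, 0xE9, 0x34, 0x32, 0x3B };
byte[] patched = ReplaceDigitRuns(sample, digits => Encoding.ASCII.GetBytes("7"));

Console.WriteLine(BitConverter.ToString(patched)); // 69-64-3D-E9-37-3B
```

This gives up the convenience of Regex, so it is probably only worth it when the pattern is as simple as "a run of digits".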
If you really want to use strings here, I'd suggest looking into the encodings from the Windows/CP family or the ISO 8859 family (EDIT: no, ISO won't work). E.g. on my PC, I can use iso-8859-1. Windows-1256 would also be fine; on Linux it is often called cp1256 (CP for Code Page).
However, that encoding is not guaranteed to be supported on all platforms; AFAIK .NET guarantees only the UTFs and Unicode. If you want this encoding to be available everywhere, consider using a NuGet package such as this one: https://www.nuget.org/packages/System.Text.Encoding.CodePages/
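A rough sketch of how that package is typically wired up, assuming it is referenced by the project (or already part of your target framework) and that top-level statements are available. It registers the provider once, looks up code page 1256, and prints whether all 256 byte values survive a decode/encode cycle, so you can check on your own platform rather than take it on faith:

```csharp
using System;
using System.Linq;
using System.Text;

// Make the code pages shipped in System.Text.Encoding.CodePages visible
// to Encoding.GetEncoding. This needs to run once, before the lookup.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Look the encoding up by code page number.
Encoding cp1256 = Encoding.GetEncoding(1256);

// Every possible byte value, 0x00 through 0xFF.
byte[] allBytes = Enumerable.Range(0, 256).Select(x => (byte)x).ToArray();

// Decode and re-encode, then compare with the input.
bool roundTrips = cp1256.GetBytes(cp1256.GetString(allBytes)).SequenceEqual(allBytes);
Console.WriteLine($"{cp1256.WebName} round-trips all bytes: {roundTrips}");
```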
CodePudding user response:
Using ISO-8859-1 will map each byte directly to the Unicode code point with the same value (U+0000 to U+00FF, i.e. Basic Latin plus the Latin-1 Supplement block) and back again. It is also one of the encodings .NET Core supports by default.
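using System.Linq;
using System.Text;
var enc = Encoding.GetEncoding(28591); // ISO-8859-1 (code page 28591)
var b = Enumerable.Range(0, 0xFF + 1).Select(x => (byte)x).ToArray(); // every byte value, 0x00..0xFF
var roundTrips = enc.GetBytes(enc.GetString(b)).SequenceEqual(b); // true: decoding and re-encoding gives back the same bytes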
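Applied to the original problem, the whole thing might look roughly like this. It is only a sketch: the sample bytes stand in for the contents of the real file (e.g. from File.ReadAllBytes), and the pattern and replacement are placeholders. It assumes top-level statements (C# 9 or later):

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

Encoding latin1 = Encoding.GetEncoding(28591); // ISO-8859-1

// Placeholder input: "v=", two non-ASCII bytes (0x80, 0xFF), "123", line feed.
byte[] original = { 0x76, 0x3D, 0x80, 0xFF, 0x31, 0x32, 0x33, 0x0A };

// Each byte becomes the char with the same numeric value, so nothing is lost.
string text = latin1.GetString(original);

// Placeholder edit: replace every run of ASCII digits with "456".
string edited = Regex.Replace(text, "[0-9]+", "456");

// Each char (all still <= U+00FF) becomes the byte with the same value again.
byte[] result = latin1.GetBytes(edited);

Console.WriteLine(BitConverter.ToString(result)); // 76-3D-80-FF-34-35-36-0A
```

One caveat: the replacement text has to stay within U+0000 to U+00FF. Any character above that range has no ISO-8859-1 byte, so it cannot be written back losslessly.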