Hello everyone i have some problem with Encoding.. i want convert utf-16 to utf-8 i founded many code but didn't work.. I hope help me.. Thanks
This text =>
'\x04\x1a\x040\x04@\x04B\x040\x00 \x00*\x003\x003\x000\x001\x00:\x00 \x000\x001\x00.\x001\x001\x00.\x002\x000\x002\x002\x00 \x001\x004\x00:\x001\x000\x00,\x00 \x04?\x04>\x04?\x04>\x04;\x04=\x045\x04=\x048\x045\x00 \x003\x003\x00.\x003\x003\x00 \x00T\x00J\x00S\x00.\x00 \x00 \x04\x14\x04>\x04A\x04B\x04C\x04?\x04=\x04>\x00 \x003\x002\x002\x003'
#I tryed this
string v = Regex.Unescape(text);
get result like
♦→♦0♦@♦B♦0 *3301: 01.11.2022 14:10, ♦?♦>♦?♦>♦;♦=♦5♦=♦8♦5 33.33 TJS. ♦¶♦>♦A♦B♦C♦?♦=♦> 3223
and continue
public static string Utf16ToUtf8(string utf16String)
{
// Get UTF16 bytes and convert UTF16 bytes to UTF8 bytes
byte[] utf16Bytes = Encoding.Unicode.GetBytes(utf16String);
byte[] utf8Bytes = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);
// Return UTF8 bytes as ANSI string
return Encoding.Default.GetString(utf8Bytes);
}
don't worked
I need result like this
Карта *4411: 01.11.2022 14:10, пополнение 33.33 TJS. Доступно 3223
CodePudding user response:
The code below decodes the text to what you want, but it would be much better to avoid getting into this situation in the first place. If the data is fundamentally text, store it as text in your log files without the extra "convert to UTF-16 then encode that binary data" aspect - that's just causing problems.
The code below "decodes" the text log data into a byte array by treating each \x
escape sequence as a single byte (assuming \\
is used to encode backslashes) and treating any other character as a single byte - effectively ISO-8859-1.
It then converts the byte array to a string using big-endian UTF-16. The output is as desired:
Карта *3301: 01.11.2022 14:10, пополнение 33.33 TJS. Доступно 3223
The code is really inefficient - it's effectively a proof of concept to validate the text format you've got. Don't use it as-is; instead, use this as a starting point for improving your storage representation.
using System.Text;
class Program
{
static void Main()
{
string logText = @"\x04\x1a\x040\x04@\x04B\x040\x00 \x00*\x003\x003\x000\x001\x00:\x00 \x000\x001\x00.\x001\x001\x00.\x002\x000\x002\x002\x00 \x001\x004\x00:\x001\x000\x00,\x00 \x04?\x04>\x04?\x04>\x04;\x04=\x045\x04=\x048\x045\x00 \x003\x003\x00.\x003\x003\x00 \x00T\x00J\x00S\x00.\x00 \x00 \x04\x14\x04>\x04A\x04B\x04C\x04?\x04=\x04>\x00 \x003\x002\x002\x003";
byte[] utf16 = DecodeLogText(logText);
string text = Encoding.BigEndianUnicode.GetString(utf16);
Console.WriteLine(text);
}
static byte[] DecodeLogText(string logText)
{
List<byte> bytes = new List<byte>();
for (int i = 0; i < logText.Length; i )
{
if (logText[i] == '\\')
{
if (i == logText.Length - 1)
{
throw new Exception("Trailing backslash");
}
switch (logText[i 1])
{
case 'x':
if (i >= logText.Length - 3)
{
throw new Exception("Not enough data for \\x escape sequence");
}
// This is horribly inefficient, but never mind.
bytes.Add(Convert.ToByte(logText.Substring(i 2, 2), 16));
// Consume the x and hex
i = 3;
break;
case '\\':
bytes.Add((byte) '\\');
// Consume the extra backslash
i ;
break;
// TODO: Any other escape sequences?
default:
throw new Exception("Unknown escape sequence");
}
}
else
{
bytes.Add((byte) logText[i]);
}
}
return bytes.ToArray();
}
}
CodePudding user response:
Also help me after code Jon Skeet my short code
string reg = Regex.Unescape(text2);
byte[] ascii = Encoding.BigEndianUnicode.GetBytes(reg);
byte[] utf8 = Encoding.Convert(Encoding.BigEndianUnicode, Encoding.UTF8, ascii);
Console.WriteLine(Encoding.BigEndianUnicode.GetString(utf8));