Encoding UTF-16 to UTF-8 C#

Time:11-28

Hello everyone, I have a problem with encoding. I want to convert UTF-16 to UTF-8. I found a lot of code, but none of it worked. I hope you can help me. Thanks.

This is the text:

'\x04\x1a\x040\x04@\x04B\x040\x00 \x00*\x003\x003\x000\x001\x00:\x00 \x000\x001\x00.\x001\x001\x00.\x002\x000\x002\x002\x00 \x001\x004\x00:\x001\x000\x00,\x00 \x04?\x04>\x04?\x04>\x04;\x04=\x045\x04=\x048\x045\x00 \x003\x003\x00.\x003\x003\x00 \x00T\x00J\x00S\x00.\x00 \x00 \x04\x14\x04>\x04A\x04B\x04C\x04?\x04=\x04>\x00 \x003\x002\x002\x003'

I tried this:

  string v = Regex.Unescape(text);

and got a result like this:

♦→♦0♦@♦B♦0 *3301: 01.11.2022 14:10, ♦?♦>♦?♦>♦;♦=♦5♦=♦8♦5 33.33 TJS. ♦¶♦>♦A♦B♦C♦?♦=♦> 3223

Then I tried this:

  public static string Utf16ToUtf8(string utf16String)
  {
      // Get UTF16 bytes and convert UTF16 bytes to UTF8 bytes
      byte[] utf16Bytes = Encoding.Unicode.GetBytes(utf16String);
      byte[] utf8Bytes = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);

      // Return UTF8 bytes as ANSI string
      return Encoding.Default.GetString(utf8Bytes);
  }

That didn't work either.

I need a result like this:

Карта *3301: 01.11.2022 14:10, пополнение 33.33 TJS. Доступно 3223

CodePudding user response:

The code below decodes the text to what you want, but it would be much better to avoid getting into this situation in the first place. If the data is fundamentally text, store it as text in your log files without the extra "convert to UTF-16 then encode that binary data" aspect - that's just causing problems.

The code below "decodes" the text log data into a byte array by treating each \x escape sequence as a single byte (assuming \\ is used to encode backslashes) and treating any other character as a single byte - effectively ISO-8859-1.

It then converts the byte array to a string using big-endian UTF-16. The output is as desired:

Карта *3301: 01.11.2022 14:10, пополнение 33.33 TJS. Доступно 3223

The code is really inefficient - it's effectively a proof of concept to validate the text format you've got. Don't use it as-is; instead, use this as a starting point for improving your storage representation.

using System;
using System.Collections.Generic;
using System.Text;

class Program
{
    static void Main()
    {
        string logText = @"\x04\x1a\x040\x04@\x04B\x040\x00 \x00*\x003\x003\x000\x001\x00:\x00 \x000\x001\x00.\x001\x001\x00.\x002\x000\x002\x002\x00 \x001\x004\x00:\x001\x000\x00,\x00 \x04?\x04>\x04?\x04>\x04;\x04=\x045\x04=\x048\x045\x00 \x003\x003\x00.\x003\x003\x00 \x00T\x00J\x00S\x00.\x00 \x00 \x04\x14\x04>\x04A\x04B\x04C\x04?\x04=\x04>\x00 \x003\x002\x002\x003";

        byte[] utf16 = DecodeLogText(logText);
        string text = Encoding.BigEndianUnicode.GetString(utf16);
        Console.WriteLine(text);
    }

    static byte[] DecodeLogText(string logText)
    {
        List<byte> bytes = new List<byte>();
        for (int i = 0; i < logText.Length; i++)
        {
            if (logText[i] == '\\')
            {
                if (i == logText.Length - 1)
                {
                    throw new Exception("Trailing backslash");
                }
                switch (logText[i + 1])
                {
                    case 'x':
                        if (i >= logText.Length - 3)
                        {
                            throw new Exception("Not enough data for \\x escape sequence");
                        }
                        // This is horribly inefficient, but never mind.
                        bytes.Add(Convert.ToByte(logText.Substring(i + 2, 2), 16));
                        // Consume the x and hex
                        i += 3;
                        break;
                    case '\\':
                        bytes.Add((byte) '\\');
                        // Consume the extra backslash
                        i++;
                        break;
                    // TODO: Any other escape sequences?
                    default:
                        throw new Exception("Unknown escape sequence");
                }
            }
            else
            {
                bytes.Add((byte) logText[i]);
            }
        }
        return bytes.ToArray();
    }
}
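
The two steps described above (each \x escape or plain character becomes one byte, then the bytes are read as big-endian UTF-16) can also be sketched more compactly. This is only a sketch under assumptions: the class and method names are made up for illustration, it relies on Regex.Unescape, whose escape rules may differ from the log format for anything other than \xNN and \\, and it assumes every unescaped character fits in one byte (ISO-8859-1 replaces anything above U+00FF).

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
static class LogTextDecoder
{
    public static string Decode(string logText)
    {
        // Turn \xNN escapes into the characters U+0000..U+00FF.
        string unescaped = Regex.Unescape(logText);

        // Map each character to a single byte (ISO-8859-1).
        // Assumption: no character is above U+00FF.
        byte[] raw = Encoding.GetEncoding("ISO-8859-1").GetBytes(unescaped);

        // Read the byte pairs as big-endian UTF-16 code units.
        return Encoding.BigEndianUnicode.GetString(raw);
    }
}
```

Calling LogTextDecoder.Decode on the log text from the question should give the same string as the longer program above.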

CodePudding user response:

After Jon Skeet's code, this shorter version of mine also worked for me:

string reg = Regex.Unescape(text2);

byte[] ascii = Encoding.BigEndianUnicode.GetBytes(reg);
byte[] utf8 = Encoding.Convert(Encoding.BigEndianUnicode, Encoding.UTF8, ascii);

Console.WriteLine(Encoding.BigEndianUnicode.GetString(utf8));
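
Since the title asks for UTF-8: a .NET string is always UTF-16 in memory, so UTF-8 only exists as bytes, for example when writing a file or sending data over a network. Below is a minimal sketch of that last step, assuming the raw big-endian UTF-16 bytes have already been recovered from the log text. The class and method names are made up for illustration.

```csharp
using System.Text;

// Hypothetical helper, not part of either answer.
static class Utf8Export
{
    // Re-encodes big-endian UTF-16 bytes as UTF-8 bytes.
    public static byte[] ToUtf8Bytes(byte[] bigEndianUtf16)
    {
        return Encoding.Convert(Encoding.BigEndianUnicode, Encoding.UTF8, bigEndianUtf16);
    }

    // Decodes UTF-8 bytes back into a string, using the matching encoding.
    public static string FromUtf8Bytes(byte[] utf8)
    {
        return Encoding.UTF8.GetString(utf8);
    }
}
```

Bytes should be decoded with the same encoding that produced them. What Encoding.Default means depends on the runtime (the ANSI code page on .NET Framework, UTF-8 on .NET Core and later), so naming the encoding explicitly is usually the safer choice.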