Neither ASCII nor UTF8 can decode French characters, what should I do?

Time:12-08

I have the following function:

private void ReceivedData(byte[] data)
{
    string info = Encoding.ASCII.GetString(data);
    // ...
}

When I use this, the é character in the data is replaced by a question mark (?).

For your information, the data looks as follows in Visual Studio's Watch window (the character in question appears at data[27] and data[28]):

[Image: Visual Studio Watch window showing the contents of data]

For your information: when I type ALT 0233 on my computer, I see the mentioned é character.

When I replace the ASCII encoding with UTF8 (as suggested on some websites and in some answers here on the site), I get weird characters containing question marks (��):

private void ReceivedData(byte[] data)
{
    string info = Encoding.UTF8.GetString(data);
    // ...
}

Which encoding should I use to correctly decode French characters?

Thanks in advance

CodePudding user response:

This looks like Windows-1252 encoding (a code page that covers Western European Latin characters with diacritics):

using System;
using System.Text;

// On .NET Core / .NET 5+ you have to enable code page encodings (such as 1252) first
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Sample bytes: '_' (95), 'é' (233), 'é' (233), 'n' (110)
byte[] data = {
  95, 233, 233, 110
};

var result = Encoding.GetEncoding(1252).GetString(data);

Console.Write(result);

Output:

_één
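
To see why the two decoders from the question fail on byte 233, here is a minimal sketch that runs the same sample bytes through all three encodings. The class name DecodeDemo and the helper Decode are only for illustration; they are not from the question.

```csharp
using System;
using System.Text;

public static class DecodeDemo
{
    // Illustrative helper: decode the same bytes with whichever encoding is passed in.
    public static string Decode(Encoding encoding, byte[] data) => encoding.GetString(data);

    public static void Main()
    {
        // Needed on .NET Core / .NET 5+ before GetEncoding(1252) works.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] data = { 95, 233, 233, 110 }; // same sample bytes as above

        // ASCII only defines 0-127, so each byte 233 falls back to '?'.
        Console.WriteLine(Decode(Encoding.ASCII, data));              // _??n

        // In UTF-8, 233 (0xE9) starts a multi-byte sequence; the bytes that follow
        // are not valid continuation bytes, so é becomes U+FFFD (the � character).
        Console.WriteLine(Decode(Encoding.UTF8, data));

        // Windows-1252 maps the single byte 233 directly to é.
        Console.WriteLine(Decode(Encoding.GetEncoding(1252), data));  // _één
    }
}
```

So neither ASCII nor UTF-8 is broken here; the bytes were simply produced with a single-byte code page, and they have to be decoded with the same one.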

Edit: In the general case, when facing an unknown encoding, you can try querying all the available encodings and inspecting the results:

using System;
using System.Linq;
using System.Text;

...

// Enable code page encodings on .NET Core / .NET 5+
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

byte[] data = {
  95, 233, 233, 110
};

var report = string.Join(Environment.NewLine, Encoding
  .GetEncodings()
  .OrderBy(encoder => encoder.Name, StringComparer.OrdinalIgnoreCase)
  .Select(encoder => (name: encoder.Name, text: encoder.GetEncoding().GetString(data)))
  .Where(pair => pair.text.Contains('é')) // at least one é must be present
  .Select(pair => $"{pair.name,-30} : {pair.text}"));

Console.Write(report);

Output:

iso-8859-1                     : _één
iso-8859-13                    : _één
iso-8859-15                    : _één
iso-8859-2                     : _één
iso-8859-3                     : _één
iso-8859-4                     : _één
iso-8859-9                     : _één
windows-1250                   : _één
windows-1252                   : _één <- The most probable (IMHO) encoding
windows-1254                   : _één
windows-1256                   : _één
windows-1257                   : _één
windows-1258                   : _één
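
All the candidates above decode these four bytes identically, so the report alone cannot tell them apart. If the sender might use either UTF-8 or a legacy code page, one common heuristic is to try a strict UTF-8 decode first and fall back only when it fails. The sketch below assumes the fallback is Windows-1252; that choice, and the names FallbackDecoder and DecodeUtf8OrWindows1252, are assumptions for illustration, not something the data proves.

```csharp
using System;
using System.Text;

public static class FallbackDecoder
{
    // UTF-8 decoder that throws on invalid bytes instead of emitting U+FFFD.
    private static readonly Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Hypothetical helper: try strict UTF-8 first, otherwise assume Windows-1252.
    public static string DecodeUtf8OrWindows1252(byte[] data)
    {
        try
        {
            return StrictUtf8.GetString(data);
        }
        catch (DecoderFallbackException)
        {
            // Assumption: anything that is not valid UTF-8 is Windows-1252.
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
            return Encoding.GetEncoding(1252).GetString(data);
        }
    }

    public static void Main()
    {
        byte[] legacy = { 95, 233, 233, 110 };        // the single-byte sample from above
        byte[] utf8 = Encoding.UTF8.GetBytes("_één"); // the same text encoded as UTF-8

        Console.WriteLine(DecodeUtf8OrWindows1252(legacy)); // _één
        Console.WriteLine(DecodeUtf8OrWindows1252(utf8));   // _één
    }
}
```

This works reasonably well in practice because text with accented characters in a single-byte code page is rarely valid UTF-8 by accident, but it is still a guess; if you control the sender, agreeing on one encoding is the more reliable fix.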

CodePudding user response:

Encoding.Latin1.GetString(data);
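
Note that Encoding.Latin1 is only available from .NET 5 onwards, and that Latin-1 (ISO-8859-1) and Windows-1252 agree on é but differ for bytes 0x80 to 0x9F, which matters for French text containing œ or €. A minimal sketch of the difference, where the class name Latin1Vs1252 and the helper Decode are made up for illustration:

```csharp
using System;
using System.Text;

public static class Latin1Vs1252
{
    // Illustrative helper: decode a few bytes with the given encoding.
    public static string Decode(Encoding encoding, params byte[] bytes) => encoding.GetString(bytes);

    public static void Main()
    {
        // Needed on .NET Core / .NET 5+ before GetEncoding(1252) works.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding latin1 = Encoding.GetEncoding("iso-8859-1"); // same as Encoding.Latin1 on .NET 5+
        Encoding win1252 = Encoding.GetEncoding(1252);

        // Byte 0xE9 (233) is é in both encodings.
        Console.WriteLine(Decode(latin1, 0xE9) == Decode(win1252, 0xE9)); // True

        // Byte 0x9C is œ in Windows-1252, but the control character U+009C in Latin-1.
        Console.WriteLine((int)Decode(latin1, 0x9C)[0]); // 156
        Console.WriteLine(Decode(win1252, 0x9C));        // œ
    }
}
```

If the data only ever contains characters such as é, è or ç, both encodings give the same result; if it comes from a Windows application, Windows-1252 is usually the safer assumption.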