I have the following function:
private void ReceivedData(byte[] data)
{
string info = Encoding.ASCII.GetString(data);
When I use this, every é character in the data is replaced by a question mark (?).
For your information, in Visual Studio's Watch window the character in question is found at data[27] and data[28].
Also, when I type ALT 0233 on my computer, I see the é character.
When I replace the ASCII encoding with UTF8 (as suggested on some websites and in some answers here on the site), I get weird characters containing question marks (��):
private void ReceivedData(byte[] data)
{
string info = Encoding.UTF8.GetString(data);
Which encoding should I use to correctly decode French characters?
Thanks in advance
CodePudding user response:
Looks like the Windows-1252 encoding (which covers various Latin characters with diacritics):
// If you work with .NET Core, you have to register the code pages provider to make code page 1252 available
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
byte[] data = {
95, 233, 233, 110
};
var result = Encoding.GetEncoding(1252).GetString(data);
Console.Write(result);
Output:
_één
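To see why the two attempts in the question fail, here is a small sketch that decodes the same sample bytes three ways. It assumes .NET Core 3.0 or later with top-level statements; the `Decode` helper is a hypothetical name used only for illustration.

```csharp
using System;
using System.Text;

// Needed on .NET Core / .NET 5+ so that code page 1252 is available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Same sample bytes as above: '_', 0xE9, 0xE9, 'n'.
byte[] data = { 95, 233, 233, 110 };

// Hypothetical helper: decode the sample bytes with the given encoding.
string Decode(Encoding encoding) => encoding.GetString(data);

// ASCII is a 7-bit encoding, so every byte above 127 is replaced by '?'.
Console.WriteLine(Decode(Encoding.ASCII));

// In UTF-8, 0xE9 is a lead byte that must be followed by continuation bytes.
// Here it is not, so the decoder emits the replacement character U+FFFD.
Console.WriteLine(Decode(Encoding.UTF8));

// Windows-1252 is a single-byte encoding that maps 0xE9 directly to 'é'.
Console.WriteLine(Decode(Encoding.GetEncoding(1252)));
```

In other words, the bytes are neither ASCII nor UTF-8; they only make sense in a single-byte encoding where 233 stands for é.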
Edit: In the general case, when facing an unknown encoding, you can try querying all the available encodings and inspecting the results:
using System.Linq;
using System.Text;
...
// Enable code pages for .NET Core
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
byte[] data = {
95, 233, 233, 110
};
var report = string.Join(Environment.NewLine, Encoding
    .GetEncodings()
    .OrderBy(encoder => encoder.Name, StringComparer.OrdinalIgnoreCase)
    .Select(encoder => (name: encoder.Name, text: encoder.GetEncoding().GetString(data)))
    .Where(pair => pair.text.Contains('é')) // at least one é must be present
    .Select(pair => $"{pair.name,-30} : {pair.text}"));
Console.Write(report);
Output:
iso-8859-1                     : _één
iso-8859-13                    : _één
iso-8859-15                    : _één
iso-8859-2                     : _één
iso-8859-3                     : _één
iso-8859-4                     : _één
iso-8859-9                     : _één
windows-1250                   : _één
windows-1252                   : _één <- The most probable (IMHO) encoding
windows-1254                   : _één
windows-1256                   : _één
windows-1257                   : _één
windows-1258                   : _één
CodePudding user response:
Encoding.Latin1.GetString(data);
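Note that `Encoding.Latin1` (ISO-8859-1, available as a static property since .NET 5) and Windows-1252 agree on é but differ for bytes 0x80 to 0x9F. A minimal sketch of that difference, assuming .NET 5 or later; the sample bytes are made up for illustration and do not come from the question:

```csharp
using System;
using System.Text;

// Needed on .NET Core / .NET 5+ so that code page 1252 is available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Illustrative bytes: 0xE9 (é in both encodings) and 0x80 (where they differ).
byte[] sample = { 0xE9, 0x80 };

string latin1 = Encoding.Latin1.GetString(sample);
string win1252 = Encoding.GetEncoding(1252).GetString(sample);

// Both encodings decode 0xE9 to 'é' (U+00E9).
Console.WriteLine(latin1[0] == win1252[0]);

// Latin1 maps 0x80 to the control character U+0080,
// while Windows-1252 maps it to the euro sign U+20AC.
Console.WriteLine($"U+{(int)latin1[1]:X4} vs U+{(int)win1252[1]:X4}");
```

If the sender might also transmit characters such as € or curly quotes, which of the two encodings is correct depends on what the sending side actually uses.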