C# avoid bad encoding using HttpRequestMessage

Time:12-24

I'm getting a response from an API call. I use HttpRequestMessage to set up my GET request and then HttpClient to read the response and return it as a string. However, within the response I'm getting \u2019 instead of ', and when I convert this result into a file for Excel (using JsonConvert and CsvWriter), I'm getting â€™ instead of ' in my CSV. Am I missing something at the header level when requesting the API's response?

public static string GetResponse_CFRA(string oauth2_token, string apiKey, string uri)
{
    var httpRequestMessage = new HttpRequestMessage
    {
        Method = HttpMethod.Get,
        RequestUri = new Uri(uri),
        Headers = {
            { "Authorization", $"Bearer {oauth2_token}"},
            { "x-api-key", apiKey}
        }
    };

    // Get the response from the API
    using (var client = new HttpClient())
    {
        try
        {
            var response = client.SendAsync(httpRequestMessage).Result;
            HttpContent responseContent = response.Content;
            var responsedata = responseContent.ReadAsStringAsync();
            string data = responsedata.Result;
            return data;
        }
        catch
        {
            string sorry = "Please call the admin";
            return sorry;
        }
    }
}

CodePudding user response:

\u2019 (the Unicode character U+2019) is the right single quotation mark, i.e. the slightly curved version of '.
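The literal \u2019 in the raw response body is most likely just JSON's escape notation for that character, which a JSON parser turns back into the real character. Here is a minimal sketch of that decoding. It assumes .NET Core 3.0 or later and uses the built-in System.Text.Json as a stand-in for the JsonConvert call mentioned in the question; the class and method names are made up for illustration.

```csharp
using System;
using System.Text.Json;

public static class JsonEscapeDemo
{
    // Parses a JSON string literal (including its surrounding quotes)
    // and returns the decoded .NET string.
    public static string DecodeJsonString(string jsonLiteral)
    {
        return JsonSerializer.Deserialize<string>(jsonLiteral);
    }

    public static void Main()
    {
        // The C# literal below contains the six characters \u2019 verbatim,
        // exactly as they would appear in a raw HTTP response body.
        string raw = "\"it\\u2019s\"";
        string decoded = DecodeJsonString(raw);

        Console.WriteLine(decoded.Length);                // 4
        Console.WriteLine(((int)decoded[2]).ToString("X4")); // 2019
    }
}
```

If that holds, the escape in the raw response is harmless, and the visible problem only starts when the decoded text is written to the file.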

When encoded as UTF-8 (the encoding that .NET uses by default when writing files), it's represented by the byte sequence 0xE2 0x80 0x99.

However, if you take the bytes 0xE2 0x80 0x99 and interpret them not as UTF-8 but rather using the Windows-1252 code page (which is one of the default single-byte code pages on Windows, depending on your locale), 0xE2 maps to â, 0x80 to € and 0x99 to ™, which gives â€™.
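The misreading can be reproduced in a few lines. This is a sketch and not part of the original code: the class and method names are invented for illustration, and it assumes .NET Core or .NET 5+, where code page 1252 only becomes available after registering CodePagesEncodingProvider (on .NET Framework that registration should not be needed).

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Encodes text as UTF-8, then decodes those same bytes as Windows-1252,
    // which is what a reader assuming the wrong code page effectively does.
    public static string MisreadUtf8AsWindows1252(string text)
    {
        // Assumption: running on .NET Core / .NET 5+, so the code-pages
        // provider must be registered before asking for code page 1252.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding windows1252 = Encoding.GetEncoding(1252);

        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return windows1252.GetString(utf8Bytes);
    }

    public static void Main()
    {
        byte[] bytes = Encoding.UTF8.GetBytes("\u2019");
        Console.WriteLine(BitConverter.ToString(bytes)); // E2-80-99

        // The result is the three characters U+00E2, U+20AC, U+2122.
        // How they render depends on the console's own encoding.
        Console.WriteLine(MisreadUtf8AsWindows1252("\u2019"));
    }
}
```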

So your problem is that you've got a text file which uses UTF-8 to encode characters to bytes, but Excel is trying to read it using Windows-1252 instead, which maps different characters onto those bytes.

To fix this, you have three options: tell Excel to interpret the CSV file as UTF-8, add a UTF-8 byte order mark (BOM, which is the sequence 0xEF 0xBB 0xBF) to the start of the file, or change the encoding from UTF-8 to Windows-1252 when you save the file.

To write the string to a file using UTF-8 with BOM you'll need to manually specify the encoding as Encoding.UTF8, e.g.:

File.WriteAllText(path, contents, Encoding.UTF8);
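To check that the BOM really ends up in the file, something like the following sketch could be used. The class name, method names and temp file name are hypothetical. The second method shows one way to get a BOM-emitting StreamWriter, on the assumption that the CsvWriter from the question accepts a TextWriter (as CsvHelper's does).

```csharp
using System;
using System.IO;
using System.Text;

public static class CsvBomDemo
{
    // Writes contents as UTF-8 with a leading BOM and returns the bytes on disk.
    public static byte[] WriteWithBom(string path, string contents)
    {
        // Encoding.UTF8 emits the preamble EF BB BF. The overload without an
        // encoding argument writes UTF-8 without it.
        File.WriteAllText(path, contents, Encoding.UTF8);
        return File.ReadAllBytes(path);
    }

    // Opens a writer that emits the BOM, for code that streams rows
    // through a TextWriter instead of building one big string.
    public static StreamWriter OpenUtf8BomWriter(string path)
    {
        return new StreamWriter(path, false,
            new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));
    }

    public static void Main()
    {
        // Hypothetical file name in the temp directory.
        string path = Path.Combine(Path.GetTempPath(), "bom-demo.csv");
        byte[] bytes = WriteWithBom(path, "name\nO\u2019Brien\n");
        Console.WriteLine(BitConverter.ToString(bytes, 0, 3)); // EF-BB-BF
        File.Delete(path);
    }
}
```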