Should I delete blank values in UTF-16 encoding?

When I get the bytes of a string using Encoding.Unicode, the result contains blank (0) values.

When I run this code:

byte[] value = Encoding.Unicode.GetBytes("Hi");

It gives me the output:

72, 0, 105, 0

I know this is because UTF-16 stores each of these characters in 2 bytes and the 0 is just the second byte, but my question is: should I delete the 0s? As far as I know, they do not do anything, and my program has to loop through the array, so the 0s would only make it slower.

CodePudding user response:

No, you must not delete bytes from a text encoding, because then you end up with garbage that can no longer be considered a valid encoding of the text.
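To make the "garbage" concrete, here is a minimal sketch of what happens when the 0 bytes are removed from the UTF-16 bytes of "Hi" and the result is decoded again. The helper StripZeroBytes is hypothetical and exists only to illustrate the idea from the question; it is not part of .NET.

```csharp
using System;
using System.Linq;
using System.Text;

public static class StripZerosDemo
{
    // Hypothetical helper (not a .NET API): drops every 0 byte,
    // which is what the question proposes doing.
    public static byte[] StripZeroBytes(byte[] bytes) =>
        bytes.Where(b => b != 0).ToArray();

    public static void Main()
    {
        byte[] original = Encoding.Unicode.GetBytes("Hi");
        byte[] stripped = StripZeroBytes(original);

        Console.WriteLine(string.Join(",", original)); // 72,0,105,0
        Console.WriteLine(string.Join(",", stripped)); // 72,105

        // A UTF-16 decoder reads the two remaining bytes as ONE 16-bit code unit.
        string decoded = Encoding.Unicode.GetString(stripped);
        Console.WriteLine(decoded.Length);                   // 1
        Console.WriteLine(((int)decoded[0]).ToString("X4")); // 6948
        Console.WriteLine(decoded == "Hi");                  // False
    }
}
```

The two letters collapse into a single unrelated character (U+6948). For characters whose code units contain a 0 byte in a different position, or no 0 byte at all, stripping would also make it impossible to tell where one character ends and the next begins.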

If you have many ASCII characters and a few non-ASCII characters, you are probably better off with the UTF-8 encoding instead of UTF-16.

UTF-8 encodes to a single byte for ASCII chars and uses 2-4 bytes for non-ASCII chars.
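As a rough illustration of those sizes, the sketch below compares the byte counts of a few sample strings in both encodings. The sample strings are arbitrary choices for this example, written as escapes so the source file's own encoding does not matter.

```csharp
using System;
using System.Text;

public static class ByteCountDemo
{
    public static void Main()
    {
        // Arbitrary samples: ASCII text, o-umlaut (U+00F6), euro sign (U+20AC),
        // and an emoji outside the Basic Multilingual Plane (U+1F600).
        string[] samples = { "Hi", "\u00F6", "\u20AC", "\U0001F600" };

        foreach (string s in samples)
        {
            int utf8 = Encoding.UTF8.GetByteCount(s);
            int utf16 = Encoding.Unicode.GetByteCount(s);
            Console.WriteLine($"UTF-8: {utf8} bytes, UTF-16: {utf16} bytes");
        }
        // UTF-8 / UTF-16 byte counts per sample, in order:
        // 2 / 4, 2 / 2, 3 / 2, 4 / 4
    }
}
```

For mostly-ASCII text, UTF-8 gives the smaller array without any need to delete bytes by hand; for text dominated by characters such as the euro sign, UTF-16 can be the more compact of the two.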

Here's an illustrative example:

var text = "ö";
Console.WriteLine(string.Join(",", Encoding.Unicode.GetBytes(text))); // 246,0
Console.WriteLine(string.Join(",", Encoding.UTF8.GetBytes(text))); // 195,182

The text (a single character) is identical; only the encoding differs.
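A short round-trip sketch of the same point: both byte sequences decode back to the same string, provided the decoder matches the encoder that produced them.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    public static void Main()
    {
        string text = "\u00F6"; // the same o-umlaut character as above

        byte[] utf16 = Encoding.Unicode.GetBytes(text); // 246,0
        byte[] utf8 = Encoding.UTF8.GetBytes(text);     // 195,182

        // Matching decoder: the original text comes back unchanged.
        Console.WriteLine(Encoding.Unicode.GetString(utf16) == text); // True
        Console.WriteLine(Encoding.UTF8.GetString(utf8) == text);     // True

        // Mismatched decoder: the UTF-8 bytes read as UTF-16 give a different character.
        Console.WriteLine(Encoding.Unicode.GetString(utf8) == text);  // False
    }
}
```

This is why the bytes have to be kept intact and paired with the right encoding: the byte array only means something relative to the encoding that produced it.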