How to get unicode of characters from 55296 to 56319 in Excel

Time:07-02

I generated a list of characters in Excel from character codes 1 to 66535.

I am trying to get the code points back with the UNICODE function. However, Excel returns #VALUE! for character codes 55296 to 56319.

Please advise whether any other function can return the proper Unicode values.

Thank you.

CodePudding user response:

The range you are listing is a special range in Unicode: the high surrogates (55296 to 56319 is 0xD800-0xDBFF).

They do have Unicode code points, but you cannot have them on their own in text. Windows uses UTF-16 (historically UCS-2) as its internal encoding, so to represent code points above 65535 it uses two surrogates: one in the range 0xD800-0xDBFF (high surrogate) followed by one in the range 0xDC00-0xDFFF (low surrogate). By combining the two, you can reach every Unicode code point.
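To make the pairing concrete, here is a minimal Python sketch of the standard UTF-16 arithmetic. The function names `to_surrogates` and `from_surrogates` are made up for this illustration; they are not Excel or Windows functions.

```python
def to_surrogates(code_point):
    """Split a code point above 0xFFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points 0x10000..0x10FFFF use surrogate pairs")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Combine a high and a low surrogate back into one code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("first value is not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("second value is not a low surrogate")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1F600 (128512) is stored in UTF-16 as the pair 0xD83D, 0xDE00.
print([hex(u) for u in to_surrogates(0x1F600)])
print(hex(from_surrogates(0xD83D, 0xDE00)))
```

A single value from either range carries only half of the information, which is why it has no meaning on its own.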

So you should never have a single surrogate or a mismatched one, e.g. a high surrogate not followed by a low surrogate, or a low surrogate not preceded by a high surrogate.

So just skip these codes. Or better, use them correctly in pairs to get characters above 65535.
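As a rough sketch of both suggestions (in Python rather than Excel, and with the hypothetical helper names `is_surrogate` and `lone_surrogates`), one could skip the surrogate range when building the list and flag any unpaired surrogate in a sequence of UTF-16 code units:

```python
def is_surrogate(unit):
    """True for any value in the surrogate range 0xD800..0xDFFF."""
    return 0xD800 <= unit <= 0xDFFF


def lone_surrogates(units):
    """Return the indexes of surrogates that are not part of a high+low pair."""
    bad = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be followed directly by a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            bad.append(i)
        elif 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate reached here was not preceded by a high one.
            bad.append(i)
        i += 1
    return bad


# Code points 1..65535 that can stand alone as a single character code.
usable = [cp for cp in range(1, 65536) if not is_surrogate(cp)]
print(len(usable))

# 0xD83D 0xDE00 is a valid pair; 0xD800 and 0xDC00 below are unpaired.
print(lone_surrogates([0x41, 0xD83D, 0xDE00, 0xD800, 0x42, 0xDC00]))
```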

Note: you cannot get every Unicode character with a single code point. Many characters require combining several code points (Unicode has a whole category of "combining characters"). E.g. the zero with an oblique line is rendered with two Unicode code points: the normal zero and a variation selector. Precomposed accented characters are also very limited (often just one accent per character). And that is before getting to more complex scripts.
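A small Python illustration of this point, using the standard `unicodedata` module. The sample strings are chosen for this sketch, and the slashed zero assumes the variation sequence U+0030 U+FE00:

```python
import unicodedata

# "e" followed by COMBINING ACUTE ACCENT: two code points, one visible character.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))   # a precomposed U+00E9 exists

# "q" with an acute accent has no precomposed form, so NFC keeps two code points.
q_acute = unicodedata.normalize("NFC", "q\u0301")
print(len(q_acute))

# Zero followed by a variation selector: still two code points.
slashed_zero = "0\uFE00"
print([hex(ord(ch)) for ch in slashed_zero])

# A non-zero combining class marks U+0301 as a combining character.
print(unicodedata.combining("\u0301"))
```

So even after skipping surrogates, a one-code-per-cell list cannot cover every character a user might see on screen.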
