Two bytes hold at most 65,536 values, but there are more than 90,000 Chinese characters. How is this handled?

Time:02-23

I recently started learning assembly and learned that 2 bytes can represent 65,536 distinct values (0 to 65535). Only a few thousand Chinese characters are in common use, but there are more than 90,000 in total. How can all of them be represented?

CodePudding user response:

That is why there are multi-byte characters, character sets, and encodings. A character set defines which characters are included and how many; an encoding defines the rule for turning each character into bytes. Examples are GBK, Big5, and UTF-8.
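
To make the multi-byte idea concrete, here is a minimal C sketch of how UTF-8 spends one to four bytes per character. The function name utf8_encode is made up for this illustration and does not come from any library mentioned in this thread. Most common Chinese characters come out as 3 bytes, and the rarer ones beyond U+FFFF as 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes up to 4 bytes into out and returns the byte count,
 * or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* ASCII: 1 byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes: most common Chinese characters */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes: characters beyond U+FFFF */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* outside the Unicode code space */
}
```

For example, U+4E2D (中) encodes as the three bytes E4 B8 AD, while U+20000 needs the four bytes F0 A0 80 80.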

CodePudding user response:

1. Whichever character set you pick, you cannot expect it to contain every character, and even if it did, new characters would keep appearing. Unlike English, new words and expressions in Chinese often bring a new character with them. Early systems did have so-called character-creation programs (mainly for word processing, where the result was rarely exchanged), but now that information exchange is the main purpose, you cannot simply make up a character yourself. It has to go into a standard, and that is the job of an authoritative body.
2. The claim that Unicode is limited to 65,536 characters, or 2 bytes, is wrong. There are encodings that use three or four bytes per character, so 90,000 characters is not a problem in itself.

CodePudding user response:

The latest Unicode version, 13.0, already includes 143,859 characters. Unicode's ambition is to cover every script and symbol that humans have created, including oracle bone script, bronze inscriptions (jinwen), cuneiform, Maya script, and so on. I suspect even the paintings of prehistoric people will end up in there ~

CodePudding user response:

Quoting the reply on floor 3:
The latest Unicode version, 13.0, already includes 143,859 characters. Unicode's ambition is to cover every script and symbol that humans have created, including oracle bone script, bronze inscriptions (jinwen), cuneiform, Maya script, and so on. I suspect even the paintings of prehistoric people will end up in there ~
Awesome!

CodePudding user response:

Quoting the reply on floor 3:
The latest Unicode version, 13.0, already includes 143,859 characters. Unicode's ambition is to cover every script and symbol that humans have created, including oracle bone script, bronze inscriptions (jinwen), cuneiform, Maya script, and so on. I suspect even the paintings of prehistoric people will end up in there ~

I looked it up: it already includes more than 93,000 Chinese characters! Learned something new.

CodePudding user response:

Quoting the reply on floor 3:
The latest Unicode version, 13.0, already includes 143,859 characters. Unicode's ambition is to cover every script and symbol that humans have created, including oracle bone script, bronze inscriptions (jinwen), cuneiform, Maya script, and so on. I suspect even the paintings of prehistoric people will end up in there ~

Surely 143,859 characters is not enough? That may not even cover simplified and traditional Chinese, and once every other language's script is added it is definitely insufficient.

CodePudding user response:

Quoting qq_16774199's reply on floor 6:
Quote: quoting the reply on floor 3:
The latest Unicode version, 13.0, already includes 143,859 characters. Unicode's ambition is to cover every script and symbol that humans have created, including oracle bone script, bronze inscriptions (jinwen), cuneiform, Maya script, and so on. I suspect even the paintings of prehistoric people will end up in there ~

Surely 143,859 characters is not enough? That may not even cover simplified and traditional Chinese, and once every other language's script is added it is definitely insufficient.

That figure is only the number of characters included so far. The code space itself is more than large enough: UTF-16 can address over one million code points (0 to 10FFFF), and 31-bit UCS-4 (highest bit fixed at 0) can address more than 2.1 billion, which will not be filled before humanity is gone.
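
The sizes mentioned above are easy to check with a little arithmetic. The following is only a sketch in C, and the function names are invented for this illustration.

```c
#include <stdint.h>

/* Number of code points in the Unicode code space U+0000..U+10FFFF:
 * 17 planes of 65,536 code points each. */
uint32_t unicode_code_space_size(void)
{
    return 17u * 0x10000u;            /* 1,114,112 */
}

/* Code points reachable through UTF-16 surrogate pairs:
 * 1,024 high surrogates times 1,024 low surrogates = 2^20. */
uint32_t supplementary_code_points(void)
{
    return 0x400u * 0x400u;           /* 1,048,576 */
}

/* Values available to a 31-bit UCS-4 code unit (top bit fixed at 0). */
uint64_t ucs4_31bit_values(void)
{
    return (uint64_t)1 << 31;         /* 2,147,483,648 */
}

/* Share of the code space taken by n assigned characters, in percent. */
double percent_used(uint32_t n)
{
    return 100.0 * n / unicode_code_space_size();
}
```

With the 143,859 characters of Unicode 13.0, percent_used reports roughly 13% of the 0 to 10FFFF range, so most of the code space is still unassigned.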

CodePudding user response:

Windows fonts are still stuck at roughly the Unicode 3.x level, a little over 38,000 characters. According to Microsoft, adding a single character costs them more than $200, so they have no interest in adding more.

CodePudding user response:

FYI:
#pragma comment(lib, "user32")
#pragma comment(lib, "gdi32")
#include <conio.h>
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
extern "C" HWND WINAPI GetConsoleWindow();
// Hide the console text cursor.
void HideTheCursor() {
    CONSOLE_CURSOR_INFO cciCursor;
    HANDLE hStdOut = GetStdHandle(STD_OUTPUT_HANDLE);

    if (GetConsoleCursorInfo(hStdOut, &cciCursor)) {
        cciCursor.bVisible = FALSE;
        SetConsoleCursorInfo(hStdOut, &cciCursor);
    }
}
// Show the console text cursor again.
void ShowTheCursor() {
    CONSOLE_CURSOR_INFO cciCursor;
    HANDLE hStdOut = GetStdHandle(STD_OUTPUT_HANDLE);

    if (GetConsoleCursorInfo(hStdOut, &cciCursor)) {
        cciCursor.bVisible = TRUE;
        SetConsoleCursorInfo(hStdOut, &cciCursor);
    }
}
int main() {
    HWND hwnd;
    HDC hdc;
    HFONT hfont;
    wchar_t wc[2];

    system("color F0");
    system("cls");
    HideTheCursor();
    hwnd = GetConsoleWindow();
    hdc = GetDC(hwnd);
    // The font must contain glyphs beyond U+FFFF and be installed on the machine.
    hfont = CreateFont(48, 0, 0, 0, 0, 0, 0, 0, GB2312_CHARSET, 0, 0, 0, 0, "宋体-方正超大字符集");
    SelectObject(hdc, hfont);
    // One character stored as a UTF-16 surrogate pair (high, then low).
    wc[0] = 0xD854u;
    wc[1] = 0xDC00u;
    TextOutW(hdc, 10, 10, wc, 2);
    DeleteObject(hfont);
    ReleaseDC(hwnd, hdc);
    getch();
    system("color 07");
    system("cls");
    ShowTheCursor();
    return 0;
}
#if 0
A surrogate, or surrogate pair, is a pair of 16-bit Unicode code values that together represent a single character. The key point to remember is:
a surrogate pair is really a single character stored in 32 bits, so you can no longer assume that one 16-bit Unicode value maps to exactly one character.

Using surrogate pairs
The first value of the pair is the high surrogate, a 16-bit code value in the range U+D800 to U+DBFF.
The second value of the pair is the low surrogate, a value in the range U+DC00 to U+DFFF. By using surrogate pairs,
a 16-bit Unicode system can address more than one million additional characters (2^20) defined by the Unicode Standard.

Surrogate characters can be used in any string passed to the XmlTextWriter methods. However, the surrogate
characters must be valid in the XML being written. For example, the World Wide Web Consortium (W3C) recommendation does not allow surrogate characters in element or attribute names.
If the string contains an invalid surrogate pair, an exception is thrown.

In addition, you can use WriteSurrogateCharEntity to write the character entity that corresponds to a surrogate pair. The character entity is
written in hexadecimal format and is generated with the following formula:

(highChar - 0xD800) * 0x400 + (lowChar - 0xDC00) + 0x10000
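
As a rough illustration of the formula just quoted, the following C sketch combines a surrogate pair into a code point and splits a code point back into a pair. The function names are invented for this example and are not part of XmlTextWriter or the Windows API.

```c
#include <stdint.h>

/* Combine a high and a low surrogate into one code point using
 * (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000.
 * Returns 0 if the pair is invalid (a valid pair never yields 0). */
uint32_t surrogates_to_code_point(uint16_t high, uint16_t low)
{
    if (high < 0xD800 || high > 0xDBFF || low < 0xDC00 || low > 0xDFFF)
        return 0;
    return (uint32_t)(high - 0xD800) * 0x400
         + (uint32_t)(low - 0xDC00)
         + 0x10000;
}

/* Split a code point in U+10000..U+10FFFF into a surrogate pair.
 * Returns 1 on success, 0 if cp is outside that range. */
int code_point_to_surrogates(uint32_t cp, uint16_t *high, uint16_t *low)
{
    if (cp < 0x10000 || cp > 0x10FFFF)
        return 0;
    cp -= 0x10000;                               /* 20 bits remain */
    *high = (uint16_t)(0xD800 + (cp >> 10));     /* top 10 bits */
    *low  = (uint16_t)(0xDC00 + (cp & 0x3FF));   /* bottom 10 bits */
    return 1;
}
```

Applied to the pair drawn by the program earlier in this post (0xD854, 0xDC00), the formula gives U+25000, and the pair (0xD800, 0xDC00) gives U+10000, the first code point beyond the 16-bit range.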

If the string contains an invalid surrogate pair, an exception is thrown. The following example shows the WriteSurrogateCharEntity method with a surrogate pair as input.

// The following line writes &#x10000;
WriteSurrogateCharEntity('\uDC00', '\uD800');

The following sample generates a file containing a surrogate pair, loads it into the XmlReader, and saves the file under a new file name.
The original file and the new file are then loaded back into the application as XML Document Object Model (DOM) structures for comparison.

char lowChar, highChar;
char[] charArray = new char[10];
FileStream targetFile = new FileStream("SurrogatePair.xml",
    FileMode.Create, FileAccess.ReadWrite, FileShare.ReadWrite);

lowChar = Convert.ToChar(0xDC00);
highChar = Convert.ToChar(0xD800);
XmlTextWriter tw = new XmlTextWriter(targetFile, null);
tw.Formatting = Formatting.Indented;
tw.WriteStartElement("root");
tw.WriteStartAttribute("test", null);
tw.WriteSurrogateCharEntity(lowChar, highChar);
lowChar = Convert.ToChar(0xDC01);