What encoding does the NTFS file system use for file names? If names are written both by Windows and by a third-party Linux driver, will GBK and UTF-8 names end up coexisting on the same volume? Wouldn't the file listing then show garbled names?
CodePudding user response:
The NT kernel and the NTFS file system use UTF-16 as their character encoding (early versions of NT used UCS-2). Long file names on FAT32 are also stored in UTF-16, whereas short (8.3) file names are stored in a legacy ANSI/OEM code page.
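As a rough illustration of the UCS-2 versus UTF-16 distinction mentioned above, here is a minimal Python sketch. It assumes names are stored as little-endian 16-bit code units; the helper utf16_units and the sample names are made up for this example.

```python
def utf16_units(name):
    """Return the UTF-16 code units of `name` as a list of integers."""
    raw = name.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]

# Every character here is in the Basic Multilingual Plane: one unit per character.
bmp_name = "文件.txt"

# U+20000 lies outside the BMP, so UTF-16 needs a surrogate pair for it.
# Plain UCS-2 has no way to represent this character at all.
astral_name = "\U00020000.txt"

print([hex(u) for u in utf16_units(bmp_name)])
print([hex(u) for u in utf16_units(astral_name)])
```

The second name has five characters but six code units, which is why a limit counted in UTF-16 units is not the same as a limit counted in characters.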
CodePudding user response:
Quoting floor 1 (NTFS and FAT32 long file names use UTF-16):
Then are Linux file systems and the Linux kernel encoded in UTF-32? And when Linux accesses an NTFS partition, does it use UTF-16 or UTF-32?
CodePudding user response:
As far as I know, the Linux/Unix kernel uses a 4-byte character encoding internally (while the API layer uses UTF-8), but I am not sure whether that is UTF-32 or UCS-4. The two differ only in their valid range: UTF-32 uses the low 21 bits of each 32-bit unit, while UCS-4 uses the low 31 bits.
A mounted file system must be accessed using the encoding in which that file system stores its names; otherwise reads and writes go wrong. Usually such a volume is accessed through the NFS protocol or through native support for that file system.
CodePudding user response:
It seems that Linux file systems do not define a character encoding at all: file and directory names are stored directly as byte sequences, and how they are displayed depends on the current system configuration. That design is a bit scary.
CodePudding user response:
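To make the "names are just bytes" point concrete, here is a minimal Python sketch. It assumes a name that was written by a system using GBK; the sample name is made up. The surrogateescape error handler is the one Python's os.fsdecode uses on POSIX systems.

```python
# A file name as it might sit on disk: "中文.txt" written under a GBK locale.
raw_name = "中文.txt".encode("gbk")

# Strict UTF-8 decoding rejects these bytes.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With "surrogateescape", undecodable bytes become lone surrogates,
# so the original byte sequence survives a round trip unchanged.
as_text = raw_name.decode("utf-8", errors="surrogateescape")
round_trip = as_text.encode("utf-8", errors="surrogateescape")

print(strict_ok)
print(round_trip == raw_name)
```

The file system never complains about such a name; only the program that tries to show it as text has to pick an encoding.
CodePudding user response: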
Quoting floor 3 (UTF-32 uses the low 21 bits, UCS-4 the low 31 bits):
21 bits gives about 2 million code positions and 31 bits about 2 billion. Why so many? Are there really that many characters in the world?

Quoting floor 4 (Linux stores file names as raw byte sequences):
When a name is displayed, is the byte sequence interpreted as UTF-8 or as GBK? What are the benefits and drawbacks of doing it this way?
CodePudding user response:
Quoting floor 5 (ooolinux), on why the code space is so large:
Certainly there are not that many. The latest version at the moment, Unicode 12.1, defines only 137,929 characters. But Unicode is meant to be a unified character set whose ambition is to include every script and symbol that humans have ever created (oracle bone script, cuneiform, Maya glyphs, and so on), so the code space has to be large enough. 16 bits is not enough, and 24 bits is awkward for computers to access, so 32 bits is the only practical choice.
CodePudding user response:
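The numbers in the previous replies can be checked with a few lines of Python. This is only an arithmetic sketch; the constant names are made up, and the 137,929 figure is the one quoted above for Unicode 12.1.

```python
import sys

UTF32_BITS = 21   # bit width quoted in the thread for UTF-32
UCS4_BITS = 31    # bit width quoted in the thread for UCS-4

print(2 ** UTF32_BITS)   # 2097152, about 2 million
print(2 ** UCS4_BITS)    # 2147483648, about 2 billion

# Unicode itself stops at U+10FFFF: 17 planes of 65,536 code points each.
UNICODE_CODE_POINTS = 17 * 0x10000
print(UNICODE_CODE_POINTS)           # 1114112
print(sys.maxunicode == 0x10FFFF)    # Python 3.3+ uses the same upper limit

# Share of the code space occupied by the characters quoted for Unicode 12.1.
ASSIGNED_12_1 = 137_929
share_percent = round(100 * ASSIGNED_12_1 / UNICODE_CODE_POINTS, 1)
print(share_percent)
```

So even the 21-bit range is mostly empty today, which matches the point that the space is sized for future additions rather than for the current repertoire.
CodePudding user response: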
Quoting floor 5 (ooolinux), on how names are displayed and what the pros and cons are:
It is not a good design. After the locale is changed, the encoding of the file names already stored on disk does not change, so old file names may show up garbled under the new locale. This is probably a problem left over from the early design. If you have always used a Unicode (UTF-8) locale, there is no problem. The trouble comes when you mount a hard disk with a Linux partition and do not know the locale of the system that originally wrote it.
CodePudding user response:
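The garbling described above can be reproduced without any disk at all. The sketch below is a simulation only: it assumes that "writing under a locale" means encoding the name with that locale's codec and that "listing under a locale" means decoding the stored bytes with the other one. The sample name is made up.

```python
name = "中文.txt"

# Case 1: written under a GBK locale, listed under a UTF-8 locale.
gbk_bytes = name.encode("gbk")
shown_under_utf8 = gbk_bytes.decode("utf-8", errors="replace")

# Case 2: written under a UTF-8 locale, listed under a GBK locale.
utf8_bytes = name.encode("utf-8")
shown_under_gbk = utf8_bytes.decode("gbk", errors="replace")

# Case 3: both systems use a UTF-8 locale, so nothing is lost.
shown_same = utf8_bytes.decode("utf-8")

for label, shown in (("GBK -> UTF-8", shown_under_utf8),
                     ("UTF-8 -> GBK", shown_under_gbk),
                     ("UTF-8 -> UTF-8", shown_same)):
    print(label, repr(shown), shown == name)
```

Only the third case shows the original name. In the first two the bytes on disk are intact, and decoding them with the right codec would recover the name.
CodePudding user response: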
Quoting floor 6 (Unicode aims to cover every script ever created):
So, surprisingly, even the Kangxi Dictionary can be moved onto a computer.
CodePudding user response:
Quoting floor 7 (old names may be garbled after a locale change):
So, for example, if a hard disk with a Linux partition is taken from China to the United States and mounted on a Linux system there, and that system's locale differs from the locale of the system that wrote the disk, the file names on the disk may show up garbled?
CodePudding user response:
Yes. A Linux locale plays the same role as a Windows code page: running localectl set-locale LANG=zh_CN.UTF-8 on Linux is the counterpart of running chcp 936 on Windows (which selects GBK). Underneath, the characters are still stored in a multibyte character encoding.
CodePudding user response:
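A small Python sketch of the locale and code page comparison above. The helper codeset_of is hypothetical and only splits a POSIX-style locale name of the form language_territory.codeset; it does not query the running system. It relies on Python's codec registry, which treats cp936 as an alias of gbk.

```python
import codecs

def codeset_of(locale_name):
    """Return the codeset part of a name like 'zh_CN.UTF-8', or '' if absent."""
    # POSIX locale names look like language[_territory][.codeset][@modifier].
    without_modifier = locale_name.split("@", 1)[0]
    if "." not in without_modifier:
        return ""
    return without_modifier.split(".", 1)[1]

# Windows code page 936 maps to Python's GBK codec.
print(codecs.lookup("cp936").name)

# What matters for decoding file names is the codeset, not the language part.
for lang in ("zh_CN.UTF-8", "en_US.UTF-8", "zh_CN.GBK"):
    print(lang, "->", codecs.lookup(codeset_of(lang)).name)
```

Two locales with different languages but the same codeset resolve to the same codec here, so they would interpret the same stored bytes identically.
CodePudding user response: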
Quoting floor 10 (a Linux locale is the counterpart of a Windows code page):
If the Linux system that wrote the disk in China used zh_CN.UTF-8 and the Linux system in the United States uses en_US.UTF-8, which is also UTF-8, will the Chinese file names on the mounted disk display correctly?