How is a JavaScript string a set of elements of integer values?

From MDN:

JavaScript's String type is used to represent textual data. It is a set of "elements" of 16-bit unsigned integer values. Each element in the String occupies a position in the String. The first element is at index 0, the next at index 1, and so on. The length of a String is the number of elements in it. You can create strings using string literals or string objects.
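To make the MDN definition concrete, here is a minimal sketch (assuming a modern engine such as Node.js or a current browser) that copies every element of a string into a `Uint16Array`, a typed array whose entries are 16-bit unsigned integers, and then rebuilds the string from those numbers. The helper names `toUint16Array` and `fromUint16Array` are hypothetical, made up for this illustration.

```javascript
// Hypothetical helper: copy each element of a string into a Uint16Array.
function toUint16Array(str) {
  const units = new Uint16Array(str.length); // one slot per element
  for (let i = 0; i < str.length; i++) {
    units[i] = str.charCodeAt(i); // each element is an integer 0..65535
  }
  return units;
}

// Hypothetical helper: rebuild a string from the same integers.
// (Spreading is fine for short strings; very long ones can exceed argument limits.)
function fromUint16Array(units) {
  return String.fromCharCode(...units);
}

const text = "Hi!";
const units = toUint16Array(text);
console.log(text.length);                // 3
console.log(Array.from(units).join(", ")); // 72, 105, 33
console.log(fromUint16Array(units) === text); // true
```

The round trip works because the string and the typed array hold exactly the same sequence of 16-bit integers; the string simply interprets them as text.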

What does it mean when you say the JavaScript String type is a set of "elements" of 16-bit unsigned integer values? Why integer values?

Answer:

Each 16-bit unsigned integer value represents a character (more precisely, a UTF-16 code unit), and since a string is a sequence of these elements, you can access individual characters with [] notation, as you would with an array. Example:

const string = 'john doe';
console.log(string[3]); // prints 'n', the character at index 3 (indexes start at 0)
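Bracket notation and `charCodeAt` read the same element in two different forms. A small sketch, assuming any modern JavaScript engine, comparing the two (including what happens past the end of the string):

```javascript
const string = "john doe";

// Bracket notation returns the element as a one-character string...
console.log(string[3]); // n
// ...while charCodeAt returns the same element as its 16-bit integer value.
console.log(string.charCodeAt(3)); // 110

// Reading past the end behaves differently for the two forms.
console.log(string[100]);            // undefined
console.log(string.charCodeAt(100)); // NaN
```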

Answer:

It just means that a string is an "array-like" value: each character can be accessed much like an array element. Each of those elements is stored as a 16-bit UTF-16 value (a code unit).

// The following is one string literal:
let s = "ABCDEFG";

console.log(s);

// But it's also an array-like object in that it has a length and can be indexed
console.log("The length of the string is:", s.length);
console.log("The 3rd character is:", s[2]);

// And we can see that each character is stored as a separate UTF-16 value, i.e. an integer:
console.log(s.charCodeAt(2)); // 67
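Because a string has a `length` and integer-indexed elements, generic array methods can be borrowed with `.call()`. A minimal sketch, assuming any ES5+ engine, that maps every element to its integer value and shows that strings, unlike arrays, are not modified in place:

```javascript
const s = "ABCDEFG";

// Borrow Array.prototype methods: they only need length and indexes.
const codes = Array.prototype.map.call(s, (ch) => ch.charCodeAt(0));
console.log(codes.join(" ")); // 65 66 67 68 69 70 71

const hasD = Array.prototype.some.call(s, (ch) => ch === "D");
console.log(hasD); // true

// Strings are immutable: "changing" one means building a new string.
const replaced = s.slice(0, 2) + "x" + s.slice(3);
console.log(replaced); // ABxDEFG
console.log(s);        // ABCDEFG (unchanged)
```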

Answer:

As I understand it:

Unsigned means the values carry no sign (no + or -), so they are never negative.
16-bit means each element can hold one of 2^16 = 65,536 different values (0 to 65535).
A set of integers means that a string is represented by a sequence of integers (zero or more; the empty string has none).

Therefore, to represent a string, JavaScript uses a sequence of numbers, each of which is one of the 2^16 possible values: whole numbers only (no fractions) and no sign.
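You can watch the "unsigned 16-bit integer" rule being enforced with `String.fromCharCode`, which converts each argument to a 16-bit unsigned integer before storing it: the fractional part is dropped and the value wraps modulo 2^16. The helper `elementOf` below is a hypothetical name used only for this sketch.

```javascript
// Hypothetical helper: store n as a string element, then read the element back.
const elementOf = (n) => String.fromCharCode(n).charCodeAt(0);

console.log(elementOf(65));    // 65
console.log(elementOf(65.9));  // 65    (fraction dropped: integers only)
console.log(elementOf(-1));    // 65535 (no sign: wraps around)
console.log(elementOf(65536)); // 0     (2^16 wraps back to 0)
console.log(elementOf(65537)); // 1
console.log(2 ** 16);          // 65536 possible element values
```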

Note: to understand more, read about UTF-16.

Reference: https://www.ibm.com/docs/en/i/7.2?topic=unicode-utf-16

Answer:

In Unicode, each symbol has an associated number. For example, "A" is 65, "a" is 97, etc. These numbers are called code points. Depending on the encoding we’re using (UTF-32, UTF-16, UTF-8, ASCII etc.), we represent/encode these code points in different ways. The things we use to encode these code point numbers are called "code units", or as MDN calls them, "elements".
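To see that the same code point is stored differently depending on the encoding, this sketch compares UTF-8 (via the standard `TextEncoder`, which always produces UTF-8) with the UTF-16 code units that a JavaScript string uses. It assumes Node.js 11+ or a current browser, where `TextEncoder` is a global; `describe` is a hypothetical helper name.

```javascript
const encoder = new TextEncoder(); // always encodes to UTF-8

// Hypothetical helper: summarize how one code point is stored in UTF-8 vs UTF-16.
function describe(ch) {
  const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  const utf8Bytes = encoder.encode(ch).length;
  const utf16Units = ch.length; // length counts UTF-16 code units ("elements")
  return `U+${hex}: ${utf8Bytes} UTF-8 byte(s), ${utf16Units} UTF-16 code unit(s)`;
}

console.log(describe("A"));         // U+0041: 1 UTF-8 byte(s), 1 UTF-16 code unit(s)
console.log(describe("\u00E9"));    // U+00E9: 2 UTF-8 byte(s), 1 UTF-16 code unit(s)
console.log(describe("\u{1F600}")); // U+1F600: 4 UTF-8 byte(s), 2 UTF-16 code unit(s)
```

The code point number is the same in every case; only the code units used to encode it change.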

As we're using JavaScript, we're interested in the UTF-16 encoding of characters. This means that to represent a single code unit/"element", we use 16 bits (2 bytes). For "A", the "element" representation is:

0000000001000001 // (16 bits, hence 0 padding)
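The padded binary form above can be produced directly from a string. This sketch (hypothetical helper name `toBinaryUnits`, any ES2017+ engine for `padStart`) prints every element as a 16-bit binary number; a character whose code point does not fit in 16 bits comes out as two elements.

```javascript
// Hypothetical helper: each UTF-16 code unit as a zero-padded 16-bit binary string.
function toBinaryUnits(str) {
  const result = [];
  for (let i = 0; i < str.length; i++) {
    result.push(str.charCodeAt(i).toString(2).padStart(16, "0"));
  }
  return result;
}

console.log(toBinaryUnits("A").join(" "));
// 0000000001000001
console.log(toBinaryUnits("\u{1F600}").join(" "));
// 1101100000111101 1101111000000000 (two elements for one emoji)
```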

There are a lot of characters that we need to represent (think emojis and the Chinese, Japanese, and Korean scripts, each with their own code points), so a single 16-bit unit, which can hold only 65,536 distinct values, isn't enough to encode all of them. That's why some code points are encoded using two code units/elements (a so-called surrogate pair). For example,