I am learning C++ using the books listed here. In particular, I read here that:
If the value represented by a single hexadecimal escape sequence does not fit the range of values represented by the character type used in this string literal (char, char8_t (since C++20), char16_t, char32_t (since C++11), or wchar_t), the result is unspecified.
(emphasis mine)
This means that, in a system where char is signed, the result of '\xe4' will be unspecified. But here the person says that "it is implementation defined and not unspecified".
So, my question: Is the behavior of the statements below unspecified or implementation-defined? That is, is this an error in cppreference's documentation, or have I misunderstood it?
char arr[] = {'\xe4','\xbd','\xa0','\xe5','\xa5','\xbd','\0'}; //unspecified or implementation defined
char ch = '\xef'; //unspecified or implementation defined
CodePudding user response:
This can be either implementation-defined (as per C++17) or (probably) well-defined (as per C++23).
In C++17 (or earlier?), according to this Draft Standard:
5.13.3 Character literals [lex.ccon]
…
8 … The value of a character literal is implementation-defined if it falls outside of the implementation-defined range defined for char (for character literals with no prefix) or wchar_t (for character literals prefixed by L). …
However, from this Draft C++23 Standard (also §5.13.3, [lex.ccon]):
3.2.3 Otherwise, if the character-literal's encoding-prefix is absent or L, and v does not exceed the range of representable values of the corresponding unsigned type for the underlying type of the character-literal's type, then the value is the unique value of the character-literal's type T that is congruent to v modulo 2^N, where N is the width of T.
So, in your case, as long as the value of the escape sequence is representable by an unsigned char, there is neither undefined nor implementation-defined behaviour, as of C++23. However, if that value is outside the range of that unsigned equivalent, then the literal is ill-formed:
3.2.4 Otherwise, the character-literal is ill-formed.
Note: This C++20 Draft Standard has the same clause as the above-cited C++17 version (although it's paragraph 7, rather than 8).
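To make the C++23 rule quoted above concrete, here is a minimal sketch that models it for a signed character type of a given width. The helper name literal_value and its parameters are hypothetical, introduced only for illustration; it is not how any compiler implements the rule, and it assumes a width below 64 bits.

```cpp
#include <optional>

// Hypothetical helper (not from the standard): models clauses 3.2.3 / 3.2.4
// for a signed character type that is `width` bits wide (assumes width < 64).
// `v` is the numeric value written in the escape sequence.
constexpr std::optional<long long> literal_value(unsigned long long v,
                                                 unsigned width) {
    const unsigned long long modulus = 1ULL << width;  // 2^N
    if (v >= modulus) {
        return std::nullopt;  // exceeds the unsigned range: ill-formed (3.2.4)
    }
    const unsigned long long half = modulus >> 1;       // 2^(N-1)
    if (v < half) {
        return static_cast<long long>(v);  // already representable as signed
    }
    // The unique signed value congruent to v modulo 2^N.
    return static_cast<long long>(v) - static_cast<long long>(modulus);
}
```

For an 8-bit signed char, literal_value(0xe4, 8) yields 228 - 256 = -28, while literal_value(0x1ff, 8) yields no value, matching the ill-formed case.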
CodePudding user response:
This concern was resolved by the adoption of P2029R4 (Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals) for C++23 at the October 2021 WG21 virtual plenary. See that paper for links to the relevant core issues.
The behavior is now well-defined. C++ now requires two's complement representation for integer types (following the adoption of P1236 for C++20). When char is a signed type, the result is the (unsigned) value of the numeric escape converted to a signed type. Whether the result is negative depends on the range of values representable by char (char may be larger than 8 bits).
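As a rough illustration of that last point, the sketch below computes the value one would expect for '\xe4' on the current platform, depending on whether char is signed and on its width. The function name expected_xe4 is hypothetical. Under C++17 and earlier the comparison is only implementation-defined, so treat it as a check of what your compiler does rather than a guarantee.

```cpp
#include <climits>
#include <limits>

// Hypothetical helper: the value '\xe4' should have under the C++23 rule.
constexpr int expected_xe4() {
    // With a signed 8-bit char, 0xe4 (228) does not fit in [-128, 127],
    // so it wraps modulo 2^8 to 228 - 256.
    if (std::numeric_limits<char>::is_signed && CHAR_BIT == 8) {
        return 0xe4 - 256;
    }
    // With an unsigned char, or a char wider than 8 bits, 228 is
    // representable and the value is unchanged.
    return 0xe4;
}
```

Comparing static_cast<int>('\xe4') against expected_xe4() should then hold whether char is signed or unsigned on the target.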