Is char ch = '\xe4' unspecified or implementation defined

Time:09-18

I am learning C++ using the books listed here. In particular, I read here that:

If the value represented by a single hexadecimal escape sequence does not fit the range of values represented by the character type used in this string literal (char, char8_t (since C++20), char16_t, char32_t (since C++11), or wchar_t), the result is unspecified.

(emphasis mine)

This means that, in a system where char is signed, the result of '\xe4' will be unspecified. But here the person says that "it is implementation defined and not unspecified".

So, my question: Is the behavior of the statements below unspecified or implementation-defined? That is, is this an error in cppreference's documentation, or have I misunderstood it?

char arr[] = {'\xe4','\xbd','\xa0','\xe5','\xa5','\xbd','\0'}; //unspecified or implementation defined 
char ch = '\xef';                                              //unspecified or implementation defined

CodePudding user response:

This can be either implementation-defined (as per C++17) or (probably) well-defined (as per C++23).

In C++17 (or earlier?), according to this Draft Standard:

5.13.3 Character literals        [lex.ccon]


8     … The value of a character literal is implementation-defined if it falls outside of the implementation-defined range defined for char (for character literals with no prefix) or wchar_t (for character literals prefixed by L). …

However, from this Draft C++23 Standard (also §5.13.3, [lex.ccon]):

3.2.3     Otherwise, if the character-literal's encoding-prefix is absent or L, and v does not exceed the range of representable values of the corresponding unsigned type for the underlying type of the character-literal's type, then the value is the unique value of the character-literal's type T that is congruent to v modulo 2^N, where N is the width of T.

So, in your case, as long as the value of the escape sequence is representable by an unsigned char, there is neither undefined nor implementation-defined behaviour, as of C++23. However, if that value is outside the range of that unsigned equivalent, then the literal is ill-formed:

3.2.4     Otherwise, the character-literal is ill-formed.


Note: This C++20 Draft Standard has the same clause as the above-cited C++17 version (although it's paragraph 7, rather than 8).

CodePudding user response:

This concern was resolved by the adoption of P2029R4 (Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals) for C++23 at the October 2021 WG21 virtual plenary. See that paper for links to the relevant core issues.

The behavior is now well-defined. C++ now requires two’s complement representation for integer types (following the adoption of P1236 for C++20). When char is a signed type, the result is the (unsigned) value of the numeric escape converted to a signed type. Whether the result is negative depends on the range of values representable by char (char may be larger than 8 bits).
