Using UTF-8 string-literal prefixes portably between C++17 and C++20

Time:02-01

I have a codebase written in C++17 that makes heavy use of UTF-8 and of the u8 string literal, introduced in C++11, to indicate UTF-8 encoding. However, C++20 changes what the u8 literal produces: a u8 character literal goes from char to char8_t, and a u8 string literal goes from an array of const char (decaying to const char*) to an array of const char8_t (decaying to const char8_t*). A const char8_t* is not implicitly convertible to const char*.
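The change described above can be checked at compile time. This is only a minimal sketch: it assumes the compiler defines the `__cpp_char8_t` feature-test macro whenever `char8_t` is enabled, and `demo_literal_types` is a hypothetical name used for illustration.

```cpp
#include <cassert>
#include <type_traits>

#if defined(__cpp_char8_t)
// char8_t enabled (C++20 mode): u8 literals are arrays of const char8_t.
static_assert(std::is_same_v<decltype(u8"text"), const char8_t (&)[5]>);
// This is the breaking part: no implicit pointer conversion to const char*.
static_assert(!std::is_convertible_v<const char8_t*, const char*>);
#else
// char8_t not enabled (C++17 mode): u8 literals are arrays of const char.
static_assert(std::is_same_v<decltype(u8"text"), const char (&)[5]>);
#endif

// Hypothetical helper: code that deduces the pointer type works in both modes.
inline void demo_literal_types() {
    const auto* p = u8"text";  // const char* or const char8_t*, depending on mode
    assert(static_cast<unsigned char>(p[0]) == 0x74);  // 't' in either mode
}
```

Code that spells out `const char*` for a u8 literal is what stops compiling; code that deduces the type, as in the helper, is unaffected.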

I'd like this project to build and behave correctly in both C++17 and C++20 mode without breakage. What can be done to support this?


Currently, the project uses a char8 alias that uses the type-result of a u8 literal:

// Produces 'char8_t' in C++20, 'char' in anything earlier
using char8 = decltype(u8' ');

But there are a few problems with this approach:

  1. char is not guaranteed to be unsigned, which makes producing code units from numeric values non-portable (e.g. char8{129} is a narrowing error where char is signed, but works with char8_t).

  2. char8 is not distinct from char in C++17, which can break existing code and may cause errors.

  3. Continuing from point 2, it's not possible to overload on char and char8 in C++17 to handle different encodings, because they are the same type.
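The three points can be made concrete with a few compile-time checks. This is a sketch under the assumption that `__cpp_char8_t` is defined exactly when `char8_t` is enabled; `code_unit_value` and `demo_alias_problems` are hypothetical names introduced here.

```cpp
#include <cassert>
#include <type_traits>

// The alias from the question.
using char8 = decltype(u8' ');

#if defined(__cpp_char8_t)
static_assert(std::is_same_v<char8, char8_t>);
static_assert(std::is_unsigned_v<char8>);      // char8{129} is fine here
#else
// Points 2 and 3: the alias is just char, so f(char) and f(char8) would be
// the same function rather than two overloads.
static_assert(std::is_same_v<char8, char>);
#endif

// Point 1: the signedness of plain char is implementation-defined, so go
// through unsigned char when a numeric code-unit value is needed.
constexpr unsigned code_unit_value(char8 c) {
    return static_cast<unsigned char>(c);
}

inline void demo_alias_problems() {
    // "\u00E9" encodes to the UTF-8 bytes 0xC3 0xA9.
    assert(code_unit_value(u8"\u00E9"[0]) == 0xC3u);
    assert(code_unit_value(u8"\u00E9"[1]) == 0xA9u);
}
```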

What can be done to support both C++17 and C++20 mode while avoiding the type-difference problem?

CodePudding user response:

I would suggest simply declaring your own char8_t and u8string types in pre-C++20 versions as aliases for unsigned char and basic_string<unsigned char>. Then, anywhere you run into conversion problems, you can write wrapper functions to handle them appropriately in each version.
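A rough sketch of that suggestion follows. The namespace `compat` and the names `char8`, `as_char` and `as_char8` are hypothetical, and the `__cpp_char8_t` feature-test macro is assumed to tell the two modes apart. The string alias is left out on purpose: the standard does not require a `std::char_traits<unsigned char>` specialization, so `basic_string<unsigned char>` may need a custom traits class on some standard libraries.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

namespace compat {

#if defined(__cpp_char8_t)
using char8 = char8_t;        // the real type when available
#else
using char8 = unsigned char;  // stand-in: unsigned and distinct from char
#endif

static_assert(!std::is_same_v<char8, char>);
static_assert(std::is_unsigned_v<char8>);

// Wrapper: hand UTF-8 code units to APIs that expect const char*.
// Reading any object through a char pointer is allowed in both modes.
inline const char* as_char(const char8* s) noexcept {
    return reinterpret_cast<const char*>(s);
}
inline const char* as_char(const char* s) noexcept { return s; }

// Wrapper: turn a u8 literal into char8 units in either mode.
#if defined(__cpp_char8_t)
inline const char8* as_char8(const char8_t* s) noexcept { return s; }
#else
inline const char8* as_char8(const char* s) noexcept {
    return reinterpret_cast<const char8*>(s);
}
#endif

}  // namespace compat

inline void demo_wrappers() {
    const compat::char8* units = compat::as_char8(u8"h\u00E9");
    assert(units[1] == 0xC3);  // first UTF-8 byte of U+00E9
    assert(std::strcmp(compat::as_char(units), compat::as_char(u8"h\u00E9")) == 0);
}
```

Because `char8` is a distinct unsigned type in both modes, overloading on `const char*` versus `const compat::char8*` works, and only the two wrappers need to know which language version is in use.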
