I have a codebase written in C++17 that makes heavy use of UTF-8 and of the u8 string literal, introduced in C++11, to indicate UTF-8 encoding. However, C++20 changes what a u8 literal produces: instead of char (or const char* for string literals), it now produces char8_t (or const char8_t*), and const char8_t* is not implicitly convertible to const char*.
I'd like this project to build in both C++17 and C++20 modes without breakage. What can be done to support this?
Currently, the project uses a char8 alias defined as the type of a u8 character literal:
// Produces 'char8_t' in C++20, 'char' in C++17
using char8 = decltype(u8' ');
But there are a few problems with this approach:
1. char is not guaranteed to be unsigned, which makes producing UTF-8 code units from numeric values non-portable (e.g. char8{129} fails to compile where char is signed, but not with char8_t).
2. char8 is not distinct from char in C++17, which can break existing code and cause errors when the same code is built as C++20.
3. Continuing from point 2, it's not possible in C++17 to overload a function on char and char8 to handle different encodings, because they are not distinct types.
What can be done to support operating in both C++17 and C++20 mode, while avoiding the type-difference problem?
I would suggest simply declaring your own char8_t and u8string types in pre-C++20 versions as aliases for unsigned char and basic_string<unsigned char>. Then, anywhere you run into conversion problems, you can write wrapper functions to handle them appropriately in each version.