The title is a bit wordy; the code demonstrates the problem better:
#include <array>
#include <cstdlib>
#include <string_view>

// Equivalent to क़ (the three UTF-8 bytes E0 A5 98, written in octal)
constexpr auto arr = std::array<char, 3>{static_cast<char>(0340),
                                         static_cast<char>(0245),
                                         static_cast<char>(0230)};

int main()
{
    constexpr auto a = std::string_view{"क़"};
    constexpr auto b = std::string_view{arr.data(), arr.size()};
    static_assert(a.size() == 3);
    static_assert(b.size() == 3);
    static_assert(a[0] == b[0]);
    static_assert(a[1] == b[1]);
    static_assert(a[2] == b[2]);
    static_assert(a == b); // fails on MSVC only
    return EXIT_SUCCESS;
}
The last static_assert fails on MSVC, but is fine on GCC and Clang. At first I thought it might have been a Windows thing not supporting UTF-8 well, but the same comparison works fine at runtime:
// arr as defined above
int main()
{
    constexpr auto a = std::string_view{"क़"};
    constexpr auto b = std::string_view{arr.data(), arr.size()};
    return a == b ? EXIT_SUCCESS : EXIT_FAILURE;
}
Adding /utf-8 to the compiler args makes no difference. It does appear to be a Unicode/UTF-8 issue, because a plain ASCII string works:
// foo
constexpr auto arr = std::array<char, 3>{'f', 'o', 'o'};

int main()
{
    constexpr auto a = std::string_view{"foo"};
    constexpr auto b = std::string_view{arr.data(), arr.size()};
    static_assert(a == b);
    return EXIT_SUCCESS;
}
This feels like a compiler bug, but I'm no language lawyer so it could be that I'm doing something I'm not supposed to - can anybody see what?
Answer:
This is a compiler bug which Microsoft devs seem to already be aware of, see this bug report against the standard library.
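It seems that comparing a narrow string literal containing bytes outside the [0, 127] range against anything that is not a string literal currently fails at compile time, because the compiler built-in __builtin_memcmp has a bug. That matches what you see: every byte of क़ is above 127, while "foo" is pure ASCII.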
The issue is already a year old, but I couldn't find an update on it.
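In the meantime, one possible workaround is to avoid the built-in entirely and compare the bytes in a hand-written constexpr loop. The sketch below is untested on MSVC; it assumes the bug is confined to the __builtin_memcmp path, which the question's passing element-wise static_asserts suggest. The helper name equal_bytes is made up for illustration and is not part of any library, and the literal is spelled with hex escapes only so that the snippet does not depend on the source file encoding.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not a standard function): compares two views one
// char at a time, so no memcmp-style built-in is involved.
constexpr bool equal_bytes(std::string_view lhs, std::string_view rhs) noexcept
{
    if (lhs.size() != rhs.size()) {
        return false;
    }
    for (std::size_t i = 0; i < lhs.size(); ++i) {
        if (lhs[i] != rhs[i]) {
            return false;
        }
    }
    return true;
}

// Same three bytes as in the question (E0 A5 98 in hex).
constexpr auto arr = std::array<char, 3>{static_cast<char>(0340),
                                         static_cast<char>(0245),
                                         static_cast<char>(0230)};

// Compile-time check that goes through the loop instead of operator==.
static_assert(equal_bytes(std::string_view{"\xE0\xA5\x98"},
                          std::string_view{arr.data(), arr.size()}));
```

This trades the library's optimized comparison for a plain loop, which only matters for constant evaluation here; at runtime the original a == b already works.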