UTF-8 constexpr std::string_view from static constexpr std::array not valid on MSVC

Time:09-19

The title is a bit wordy; the code demonstrates the problem better:

#include <array>
#include <cstdlib>
#include <string_view>

// UTF-8 encoding of क़, written as octal bytes
constexpr auto arr = std::array<char, 3>{static_cast<char>(0340),
                                         static_cast<char>(0245),
                                         static_cast<char>(0230)};

int main()
{
    constexpr auto a = std::string_view{"क़"};
    constexpr auto b = std::string_view{arr.data(), arr.size()};

    static_assert(a.size() == 3);
    static_assert(b.size() == 3);
    static_assert(a[0] == b[0]);
    static_assert(a[1] == b[1]);
    static_assert(a[2] == b[2]);

    static_assert(a == b);

    return EXIT_SUCCESS;
}

The last static_assert fails on MSVC, but passes on GCC and Clang. At first I thought Windows might simply not be handling UTF-8 well, but the same comparison works fine at runtime:

int main()
{
    constexpr auto a = std::string_view{"क़"};
    constexpr auto b = std::string_view{arr.data(), arr.size()};

    return a == b ? EXIT_SUCCESS : EXIT_FAILURE;
}

Adding /utf-8 to the compiler args makes no difference. It does appear to be a Unicode/UTF-8 issue, because a plain ASCII string works:

// foo
constexpr auto arr = std::array<char, 3>{'f', 'o', 'o'};

int main()
{
    constexpr auto a = std::string_view{"foo"};
    constexpr auto b = std::string_view{arr.data(), arr.size()};

    static_assert(a == b);

    return EXIT_SUCCESS;
}

This feels like a compiler bug, but I'm no language lawyer so it could be that I'm doing something I'm not supposed to - can anybody see what?

CodePudding user response:

This is a compiler bug which Microsoft devs seem to already be aware of, see this bug report against the standard library.

It seems that comparing a narrow string literal containing bytes outside the [0, 127] range against anything that is not a string literal currently fails at compile time, because the compiler's built-in __builtin_memcmp has a bug.

The issue is already a year old, but I couldn't find an update on it.
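
Until that is fixed, one possible workaround is to compare the views element by element instead of using operator==. The sketch below assumes the bug is limited to the memcmp builtin, which fits the question: the per-index static_asserts there pass on MSVC. The helper name bytes_equal is made up for illustration, and the literal is written with octal escapes so the example does not depend on the source file's encoding.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of the standard library or the bug report):
// compares two views one char at a time, so constant evaluation does not go
// through std::string_view's operator== and the memcmp builtin behind it.
constexpr bool bytes_equal(std::string_view lhs, std::string_view rhs) noexcept
{
    if (lhs.size() != rhs.size())
        return false;
    for (std::size_t i = 0; i < lhs.size(); ++i)
        if (lhs[i] != rhs[i])
            return false;
    return true;
}

// Same three bytes as in the question, once as an array and once as a literal.
constexpr auto arr = std::array<char, 3>{static_cast<char>(0340),
                                         static_cast<char>(0245),
                                         static_cast<char>(0230)};
constexpr auto lit = std::string_view{"\340\245\230"};
constexpr auto view = std::string_view{arr.data(), arr.size()};

static_assert(bytes_equal(lit, view));
```

This needs C++17 or later. I have not confirmed the result against every MSVC version, so treat it as a stopgap rather than a guaranteed fix.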
