Home > Enterprise >  C switch statement with case values of reinterpret_cast(string)
C switch statement with case values of reinterpret_cast(string)

Time:09-18

I receive a string at run-time and I want to convert it to an integer. Rather than just reinterpret-casting the string and using this integer (the integers will be too sparse) I am trying to use a switch statement (see below). However, unfortunately it doesn't seem possible as I'm getting compiler errors about unexpected syntax.

Is it possible to achieve this? The reinterpret_casted string is known at compile time?

// I receive a string at run-time and convert to integer
const uint64_t val = *reinterpet_cast<uint64_t*>(str.c_str());

switch(val)
{
    case (*reinterpret_cast<uint64_t*>("16 byte string1")):   // This not the string, just an example
        return 1;
    case (*reinterpret_cast<uint64_t*>("16 byte string2")):
        return 2;
    // Another 100  similar cases
}

CodePudding user response:

The case operand must be a constant expression. reinterpret_cast can never be evaluated in a constant expression. Therefore this is not going to work and you will have to fall back to an if/if else/else chain or implement something more clever (e.g. a jump table) yourself. switch is not going to be usable for this.

That is of course as long as you intent to use reinterpret_cast. If you e.g. write a constexpr function that converts the first few characters to an uint64_t via arithmetic and bitwise operations instead of a reinterpret_cast, then you can use that function in a case operand.

Using this instead of reinterpret_cast would also get rid of *reinterpret_cast<uint64_t*>(str.c_str()) which by itself is an aliasing violation and therefore has undefined behavior per standard. Some compilers, either straight-up or when given specific compilation flags, will however define the behavior of such aliasing violations to what you intent it to be. (However, core language undefined behavior per standard, even if defined by the specific compiler, is not allowed in a constant expression. Also, even if aliasing is not an issue, alignment may still be.)

CodePudding user response:

First of all, sizeof(uint64_t) is 8 on machines with 8-bit char. You're only going to get the first 8 bytes of strings, not 16. IDK if you're mixing this up with an 8-byte integer printing as 16 hex digits, where each 4 bits of a number map to an 8-bit ASCII character, or some other mistake, but a 16-character string is 128 bits on a normal system, so you'd need x86 __m128i (SSE2 integer vector of 16 bytes) or GCC unsigned __int128.


If your string values are sparse in their first 8 bytes, a compiler will probably just make a chain of conditional branches, not a hash table of jumps. (Or a hash table of data. Transforming control-flow to data-lookup is something compilers can sometimes do for switch, but AFAIK current compilers like GCC and clang only use plain arrays when the switch cases are mostly contiguous and in a small range. So you'd still have a chain of branches because of your sparse 64-bit integers.)

Anyway, the optimal implementation in asm is probably a hash table, so you should just write your source code to do that instead of a switch, as Mark Ransom commented. Use std::unordered_map.

Another comment also suggested gperf, the GNU Perfect Hash function generator. Given a set of strings (like the keywords of a language), it generates code that detects them and rejects other tokens, with no false positives or negatives. That might be worth considering.


std::unordered_map can use string keys. The constants to match against would have to get hashed at compile time, or once at run-time during construction, and the incoming string would have to get hashed.

If the strings are unique in their first 8 bytes (not 16), you might use std::unordered_map<uint64_t, int> if that makes key faster to hash than arbitrary-length strings. And it means the full strings wouldn't have to get stored anywhere, just the prefixes. Run-time init of a std::unordered_map doesn't require compile-time constants, so you don't need a constexpr-compatible way to take the first 8 bytes of strings.


But *reinterpret_cast<uint64_t*>("literal") is strict-aliasing undefined behaviour unless you're compiling with clang/gcc -fno-strict-aliasing, or with MSVC. Even if the aliasing behaviour is defined, it's still not usable in a constexpr.

C 20 std::bit_cast is the go-to for type-punning data (and is constexpr), but you might need your data in a struct of 8 bytes, or an 8-byte array, because it checks that the two types have the same size. So you might need to do struct eightbyte {char str[8];} and manually truncate your strings for initializing it if you wanted to do something that was fully constexpr.

memcpy works well and gets fully optimized; not in a constexpr compatible way, but with optimization enabled does in fact become a constant. Not one you could use in a switch or as the size of an array (because it's not constexpr / consteval), but fine for optimization purposes.

#include <string.h>
#include <stdint.h>

// not usable in a  switch / case because this isn't constexpr compatible, 
// but usable for run-time init of std::unordered_map<uint64_t, int>
uint64_t str_as_u64(const char *p)
{
   uint64_t tmp;
   memcpy(&tmp, p, sizeof(tmp));
   return tmp;
}

uint64_t test(){
    return str_as_u64("hello world");
}

GCC (Godbolt) optimized the test function to returning an immediate constant, no actual loading / storing and no string data hanging around in memory anywhere:

# GCC12.2 -O3 for x86-64
test():
        movabsq $8031924123371070824, %rax   # 0x6F77206F6C6C6568
        ret

Further constant-propagation through hash functions might also happen; I didn't look at constructing a std::unordered_map.

  • Related