Hashing raw bytes in C++?

Time:02-12

I want to write a function that takes two values of types T and U, where sizeof(T) + sizeof(U) <= 8, and produces a uint64_t by reinterpreting their bytes one after the other. However, this does not seem to work. I am sure there is a quicker, more elegant (and correct) way to do it, but I have no clue. Any tips are greatly appreciated.

#include <cstdint>
#include <iostream>
#include <vector>

template <typename T, typename U>
constexpr auto hash8(T x, U y) {
  static_assert(sizeof(T) + sizeof(U) <= 8);

  uint64_t u = 0;
  uint64_t v = 0;
  auto px = (uint8_t*)&x;
  auto py = (uint8_t*)&y;
  for (auto i = 0; i < sizeof(T); ++i) {
    u |= (uint64_t)px[i];
    u <<= 8;
  }
  for (auto i = 0; i < sizeof(U); ++i) {
    v |= (uint64_t)py[i];
    v <<= 8;
  }

  return u << (sizeof(U) * 8) | v;
}

int main() {
  std::cout << hash8(131, 0) << '\n';
  std::cout << hash8(132, 0) << '\n';
  std::cout << hash8(500, 0) << '\n';
}

CodePudding user response:

I cannot help with the problem in your code due to lack of details, but I can propose a perhaps simpler solution.

Firstly, I recommend adding a check that the argument types have unique object representations. Unless that is satisfied, two equal values could consist of different bytes (for example, because of padding), and the hash would be meaningless.
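To illustrate what that check accepts and rejects, here is a minimal sketch. The names is_byte_hashable, Padded and Packed are made up for this example, and the results in the comments assume a mainstream ABI (GCC, Clang or MSVC on a typical desktop target), where int32_t is aligned to more than one byte and floating-point types are reported as not having unique representations.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper: true when equal values of T always have equal bytes.
template <typename T>
inline constexpr bool is_byte_hashable =
    std::has_unique_object_representations_v<T>;

// Hypothetical struct: on typical ABIs, padding bytes follow `tag`,
// and their contents are unspecified.
struct Padded {
    char tag;
    std::int32_t value;
};

// Hypothetical struct: two 16-bit members leave no room for padding.
struct Packed {
    std::int16_t a;
    std::int16_t b;
};

static_assert(is_byte_hashable<Packed>);   // safe to hash byte by byte
static_assert(!is_byte_hashable<Padded>);  // padding makes the bytes ambiguous
static_assert(!is_byte_hashable<float>);   // e.g. +0.0f and -0.0f compare equal
```

With the static_assert in place, passing a Padded or a float to the hash function becomes a compile-time error instead of a silently unstable hash.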

Secondly, std::memcpy might make this simpler:

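#include <cstdint>     // std::uint64_t
#include <cstring>     // std::memcpy
#include <memory>      // std::addressof
#include <type_traits> // std::has_unique_object_representations_v

template <typename T, typename U>
auto
hash8(T x, U y) noexcept {
    static_assert(sizeof x + sizeof y <= sizeof(std::uint64_t));
    static_assert(std::has_unique_object_representations_v<T>);
    static_assert(std::has_unique_object_representations_v<U>);
    std::uint64_t ret{};
    auto ptr = reinterpret_cast<unsigned char*>(&ret);
    // Copy the bytes of x, then the bytes of y directly after them.
    std::memcpy(ptr, std::addressof(x), sizeof x);
    ptr += sizeof x;
    std::memcpy(ptr, std::addressof(y), sizeof y);
    return ret;
}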

Next, we can generalise this to arbitrary number of arguments (so long as they fit), and different return types:

template <typename R = std::uint64_t, typename... Args>
auto
hash(Args... args) noexcept {
    static_assert((sizeof args + ...) <= sizeof(R));
    static_assert((std::has_unique_object_representations_v<Args> && ...));
    static_assert(std::has_unique_object_representations_v<R>);
    R ret{};
    auto ptr = reinterpret_cast<unsigned char*>(&ret);
    (
        (
            std::memcpy(ptr, std::addressof(args), sizeof args),
            ptr += sizeof args
        ), ...
    );
    return ret;
}

One caveat: a hash such as this is not the same across different systems, even if the sizes of the objects match, because the result depends on the byte order (endianness) and on how each type is represented.
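A small sketch of that caveat, restricted to two std::uint32_t inputs for simplicity. The function names copy_bytes and pack_portable are made up for this example. The second function is not the byte-copy technique at all but plain shift-based packing, shown only as one possible alternative if the value has to be identical on every platform.

```cpp
#include <cstdint>
#include <cstring>

// Byte-copy approach: the result depends on the platform's byte order.
inline std::uint64_t copy_bytes(std::uint32_t x, std::uint32_t y) {
    std::uint64_t ret = 0;
    auto* ptr = reinterpret_cast<unsigned char*>(&ret);
    std::memcpy(ptr, &x, sizeof x);
    std::memcpy(ptr + sizeof x, &y, sizeof y);
    return ret;
}

// Arithmetic packing (a different technique): x always ends up in the
// high 32 bits and y in the low 32 bits, whatever the byte order is.
constexpr std::uint64_t pack_portable(std::uint32_t x, std::uint32_t y) {
    return (static_cast<std::uint64_t>(x) << 32) | y;
}
```

On a little-endian machine copy_bytes(1, 0) is 1, while on a big-endian machine it is 0x100000000; pack_portable(1, 0) is 0x100000000 on both. Note that the arithmetic version only works for integer inputs, so it is not a drop-in replacement for the generic template.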

P.S. It's pointless to make your function constexpr, because it uses reinterpret casting (here in the form of C-style pointer casts), which isn't allowed in constant expressions.

CodePudding user response:

The easiest way is usually to do a memcpy:

#include <cstdint>
#include <cstring> // for memcpy

template <typename T, typename U>
auto hash8(T x, U y) {
  static_assert(sizeof(T) + sizeof(U) <= 8);

  uint64_t u = 0;
  char* u_ptr = reinterpret_cast<char*>(&u);
  std::memcpy(u_ptr, &x, sizeof x);
  std::memcpy(u_ptr + sizeof x, &y, sizeof y);
  return u;
}

Any decent compiler will inline the memcpy call to a few bit operations, if the size parameter is known at compile time (and reasonably small).

If you actually need a constexpr function you can try using std::bit_cast from C++20 (maybe difficult if either input parameter does not have a size of 1, 2, 4, or 8).
