byte-wise operation on multibyte native types idiomatically

Time:11-24

In C I would, without hesitation, write the following:

uint32_t value = 0xDEADBEEF;
uint8_t *pValue = (uint8_t *)&value;

for (size_t i = 0; i < sizeof(value); i++)
{
    pValue[i] ^= 0xAA;
}

But in C++17 I'm faced with two constraints from my code scanner:

  • Use "std::byte" for byte-oriented memory access.
  • Replace "reinterpret_cast" with a safer cast.

The latter kicks in when my C lizard brain inevitably tries to force a typecast. So how exactly do I idiomatically accomplish what I'm trying to do? In my googling I came across using memcpy to copy in and out of the data of a vector, but that feels goofy.

CodePudding user response:

reinterpret_cast visually declares that you are converting a pointer type to an unrelated (here aliasing-safe) pointer type, and guarantees that there is no runtime overhead. (By comparison, C-style casts can be rather difficult to spot in code, and some will have runtime costs.)

std::byte replaces char and its aliases to designate that you are not working with a printable character or an "integer", in the mathematical sense. It is an arbitrary collection-of-bits byte.

uint32_t value = 0xDEADBEEF;
std::byte *pValue = reinterpret_cast<std::byte*>(&value);

for (std::size_t i = 0; i < sizeof(value); i++)
{
    pValue[i] ^= std::byte{0xAA};
}
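
If the scanner keeps flagging the reinterpret_cast, the memcpy route mentioned in the question can be written without any cast at all in C++17. What follows is only a minimal sketch, not something from the answers here: the names xor_bytes_via_copy and copy_example are made up for illustration, and it assumes the type is trivially copyable. Compilers typically optimize the two small fixed-size copies away, but that is an expectation, not a guarantee.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: XOR every byte of `value` with `mask` by copying the
// object representation out, modifying it, and copying it back in.
template <typename T>
void xor_bytes_via_copy(T& value, std::byte mask)
{
    static_assert(std::is_trivially_copyable_v<T>,
                  "memcpy is only valid for trivially copyable types");

    std::array<std::byte, sizeof(T)> bytes;
    std::memcpy(bytes.data(), &value, sizeof(T));   // object -> bytes
    for (auto& b : bytes)
    {
        b ^= mask;
    }
    std::memcpy(&value, bytes.data(), sizeof(T));   // bytes -> object
}

// Usage sketch. Every byte gets the same mask, so the result does not
// depend on the machine's byte order.
inline std::uint32_t copy_example()
{
    std::uint32_t value = 0xDEADBEEF;
    xor_bytes_via_copy(value, std::byte{0xAA});
    assert(value == 0x74071445u);
    return value;
}
```

The trade-off is that the loop works on a copy, so this fits "transform the whole value" jobs better than code that needs a live view into the object.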

CodePudding user response:

As noted at https://en.cppreference.com/w/cpp/language/

Whenever an attempt is made to read or modify the stored value of an object of type DynamicType through a glvalue of type AliasedType, the behavior is undefined unless one of the following is true: ... AliasedType is std::byte (since C++17), char, or unsigned char: this permits examination of the object representation of any object as an array of bytes.

So reinterpret_cast is an eldritch abomination that won't give you any error no matter what pointer type you give it, even though only these 3 ways of naming bytes are actually allowed.

So a "safer" cast would be a function that specifically only gives you an array-of-bytes view of an arbitrary object, rather than exposing the reinterpret_cast and making the reviewer check that the type used is allowed and it's used in the idiomatic way.

Meanwhile, you need to write an old-fashioned counting for loop and index the pointer to get to each element. You can have the "safer" (more specific, more targeted) feature take care of that too, by providing a result that is a proper range of bytes, not just a pointer to the beginning.

The easiest way is to use std::span.
Just off the top of my head:

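#include <cstddef>  // std::byte
#include <span>     // std::span (C++20)

template <typename T>
auto raw_byte_view (T& original)
{
     constexpr auto sz = sizeof(T);
     // Cast the object itself (not its address) to a reference to an array of bytes.
     auto& arr = reinterpret_cast<std::byte(&)[sz]>(original);
     return std::span<std::byte,sz>(arr);
}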

I've not tried that, but it appears that to get the compile-time size version of the span, you have to pass in an array reference, not separate pointer and length arguments. You can cast to a reference to an array, and it includes the size as part of the type.

discussion

Returning span as a wrapper around the array reference is nicer than trying to deal with the array reference itself, because it's too easy to mishandle the array reference as it has to be kept as a reference. You could accomplish the same thing with a pointer to a std::array which unlike the C array works like any normal type. But we want reference semantics — the returned thing is a reference into an existing object, not a thing to be copied. So it's better to use a type that represents this indirection rather than stacking another pointer on top of a value-based type.

end discussion

 

Now you can write:

for (auto& x : raw_byte_view(value)) {
   x ^= std::byte{0xAA};
}