Binary values, bit-shifts and endianness

I am following a video tutorial on serialization. In the video, the author creates a std::vector<int8_t>, to which he writes the values of integral variables byte by byte.

So if we take an int32_t variable, it will take 4 elements of that vector. The way he is doing it is by shifting the binary value of this variable by a corresponding number of bits. I understand why this is done. If we take the value 76745, for example, the vector values will look like {0, 1, 43, -55}.

Here is the method with a little simplification from my side:

template <typename T>
void encode(std::vector<int8_t> *buffer, int16_t *iterator, T value)
{
    for (size_t i = 0; i < sizeof(T); ++i)
    {
        (*buffer)[(*iterator)++] = value >> ((sizeof(T) * 8 - 8) - i*8);
    }
}

Now, my concern is that he claims we need to do this in order to convert the number to little-endian. I don't think that's the case, because from what I understand, endianness is about how numbers are represented in memory. When we operate on a value with, e.g., bit shifts, we are operating on the value itself, so it has nothing to do with endianness, just like I read here.

He proceeds to give a piece of code to test, as he claims, whether the machine is little-endian:

bool IsLittleEndian() // weirdest
{
    int8_t a = 5;
    std::string res = std::bitset<8>(a).to_string();
    std::cout << res << '\n';
    return (res.back() == '1');
}

This again makes me doubtful, because from what I read, bitset does not depend on endianness. (Additionally, if the last bit of 5 is 1, then it should be big-endian, so I am sure a confusion occurred on his side.)

Can someone confirm whether my understanding is correct? I will gladly answer any clarifying questions.

His other content is of high quality, so I would really like to continue with the tutorial, but I need to have this figured out.

CodePudding user response：

Your understanding is correct: endianness has nothing to do with arithmetic operations or bitset::to_string(). Both work in exactly the same way regardless of the underlying implementation details, and there is no "bit endianness" at all, see here. You could run your code on a ternary Soviet computer for all that matters, and the behavior of the program would be the same.

What endianness does affect in C++ is implementation-defined stuff like memory layout and byte representation. One can exploit that and use std::memcpy or reinterpret_cast to efficiently (de)serialize integers. Be wary of the strict aliasing rule, though, if you want somewhat more portability in theory.
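To make the memcpy route concrete, here is a minimal sketch. The names write_native and read_native are made up for illustration and are not from the tutorial. It copies the object representation as-is, so the bytes land in whatever order the platform uses; reading back through memcpy into a real T, rather than through a casted pointer, is what avoids the aliasing concern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper: appends the raw bytes of value in native byte order.
template <typename T>
void write_native(std::vector<std::uint8_t>& buffer, T value)
{
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    const std::size_t old_size = buffer.size();
    buffer.resize(old_size + sizeof(T));
    std::memcpy(buffer.data() + old_size, &value, sizeof(T));
}

// Hypothetical helper: reads a T back from position, again in native byte order.
template <typename T>
T read_native(const std::vector<std::uint8_t>& buffer, std::size_t position)
{
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    assert(position + sizeof(T) <= buffer.size());
    T value;
    std::memcpy(&value, buffer.data() + position, sizeof(T));
    return value;
}
```

A buffer written this way only round-trips reliably on machines with the same byte order, which is exactly the limitation the shift-based approach avoids.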

One could say that the encode function translates an integer into an array of bytes in a little-endian way (less significant bytes at lower addresses), and that this happens regardless of the endianness of the underlying system. That may be useful when you want the same serialization protocol on different systems.

However, that's still not what the encode function does: it stores the most significant byte at the lowest address, and so on. So it's actually encoding the value in a big-endian way.
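For comparison, here is a rough sketch of both orders written with shifts only. The names encode_be and encode_le are mine, not the tutorial's, and I shift an unsigned copy of the value and push into a std::vector<std::uint8_t>, which differs slightly from the original signature.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical helper: most significant byte first (the order the
// question's encode actually produces).
template <typename T>
void encode_be(std::vector<std::uint8_t>& buffer, T value)
{
    static_assert(std::is_integral<T>::value, "integral types only");
    using U = typename std::make_unsigned<T>::type;
    const U u = static_cast<U>(value);  // shift an unsigned copy
    for (std::size_t i = 0; i < sizeof(T); ++i)
    {
        const std::size_t shift = (sizeof(T) - 1 - i) * 8;
        buffer.push_back(static_cast<std::uint8_t>((u >> shift) & 0xFF));
    }
}

// Hypothetical helper: least significant byte first (little-endian order).
template <typename T>
void encode_le(std::vector<std::uint8_t>& buffer, T value)
{
    static_assert(std::is_integral<T>::value, "integral types only");
    using U = typename std::make_unsigned<T>::type;
    const U u = static_cast<U>(value);
    for (std::size_t i = 0; i < sizeof(T); ++i)
    {
        buffer.push_back(static_cast<std::uint8_t>((u >> (i * 8)) & 0xFF));
    }
}
```

With the value 76745 (0x00012BC9) as an int32_t, encode_be appends 0x00 0x01 0x2B 0xC9, which matches the {0, 1, 43, -55} from the question when read as int8_t, and encode_le appends the same bytes reversed. Both results are the same on any host, whatever its native byte order.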

CodePudding user response：

The first part somewhat makes sense. The second part not so much.

If you were to simply std::copy the bytes of your value into buffer, i.e.

template <typename T>
void encode(std::vector<uint8_t>& buffer, size_t& position, T value)
{
    uint8_t* bytes = reinterpret_cast<uint8_t*>(&value);
    std::copy(bytes, bytes + sizeof(value), &buffer[position]);
    position += sizeof(T);
}

You would end up with buffer containing the bytes of value in whatever order the platform uses. That is, encode(buffer, n, 0x01020304) would result in buffer containing {0x04, 0x03, 0x02, 0x01} on little-endian systems and {0x01, 0x02, 0x03, 0x04} on big-endian systems.

Using bit shifts to extract each byte of the value side-steps this issue, since shift operations work on the value itself and not on its layout in memory. n & 0xFF will always give you the least significant byte of n, (n >> 8) & 0xFF the next byte up, and so on.
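The same idea works in the other direction. A minimal sketch, with byte_of and decode_be32 as made-up names and a fixed 32-bit width assumed for brevity:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: byte i of n, counting from the least significant (i = 0).
std::uint8_t byte_of(std::uint32_t n, unsigned i)
{
    return static_cast<std::uint8_t>((n >> (8 * i)) & 0xFF);
}

// Hypothetical helper: rebuilds a 32-bit value from 4 bytes that were
// stored most significant byte first.
std::uint32_t decode_be32(const std::vector<std::uint8_t>& buffer, std::size_t position)
{
    assert(position + 4 <= buffer.size());
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i)
    {
        value = (value << 8) | buffer[position + i];
    }
    return value;
}
```

For example, byte_of(0x01020304, 0) is 0x04 and byte_of(0x01020304, 3) is 0x01 on every platform, and decode_be32 turns the bytes {0x01, 0x02, 0x03, 0x04} back into 0x01020304.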

The second part makes no sense at all though. That IsLittleEndian function is nonsense. Endianness is byte order, not bit order. An int8_t, being only one byte, will never tell you anything about the platform's endianness.
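If you do want a runtime check, it has to look at the bytes of a multi-byte value. A minimal sketch (is_little_endian is just an illustrative name, and it ignores exotic mixed-endian layouts):

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical replacement for the tutorial's check: inspects which byte
// of a 4-byte value sits at the lowest address.
bool is_little_endian()
{
    const std::uint32_t probe = 0x01020304;
    unsigned char bytes[sizeof(probe)];
    std::memcpy(bytes, &probe, sizeof(probe));  // copy the object representation
    return bytes[0] == 0x04;  // least significant byte stored first
}
```

In C++20 you can skip the runtime probe and compare std::endian::native against std::endian::little from the <bit> header.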