Why is this data being flipped?

Below is an example of processing very similar to what I am working with. I understand the concept of endianness and have read through the suggested posts, but they don't seem to explain what is happening here.

I have an array of unsigned characters that I am packing with data. I was under the impression that memcpy was endianness-agnostic, so I would expect the left-most bit to stay the left-most bit. However, when I attempt to print the characters, each word comes out backwards. Why does this happen?

#include <iostream>
#include <cstring>
#include <array>

const unsigned int MAX_VALUE = 64ul;
typedef unsigned char DDS_Octet[MAX_VALUE];
int main()
{
    // create an array and populate it with printable 
    // characters
    DDS_Octet octet;
    for(int i = 0; i < MAX_VALUE; ++i)
        octet[i] = (i + 33);

    // This is an equivalent copy operation
    // to what is actually being used
    std::array<unsigned int, 16> arr;
    memcpy(
        arr.data(),
        octet,
        sizeof(octet));

    // Print the character contents of each
    // word left to right (MSB to LSB on little endian)
    for(auto i : arr)
        std::cout 
        << (char)(i >> 24) << "\t" 
        << (char)((i >> 16) & 0xFF) << "\t" 
        << (char)((i >> 8) & 0xFF) << "\t"
        << (char)(i & 0xFF) << "\n";
}

** output **

$   #   "   !
(   '   &   %
,   +   *   )
0   /   .   -
4   3   2   1
8   7   6   5
<   ;   :   9
@   ?   >   =
D   C   B   A
H   G   F   E
L   K   J   I
P   O   N   M
T   S   R   Q
X   W   V   U
\   [   Z   Y
`   _   ^   ]

CodePudding user response：

When you memcpy 4 chars into a 4-byte unsigned int they get stored in the same order they were in the original array. That is, the first char in the input array will be stored in the lowest address byte of the unsigned int, the second in the second lowest address byte, and so on.

x86 is little-endian. The lowest address byte of an unsigned int is the least significant byte.

The shift operators are endianness-independent though. They work on the logical value of an integer, not on its physical bytes. That means, for an unsigned int i on a little-endian platform, i & 0xFF gives the lowest address byte and (i >> 24) & 0xFF gives the highest address byte, while on a big-endian platform i & 0xFF gives the highest address byte and (i >> 24) & 0xFF gives the lowest address byte.

Taken together, these three facts explain why your data is reversed. '!' is the first char in your array, so when you memcpy that array into an array of unsigned int, '!' becomes the lowest address byte of the first unsigned int in the destination array. The lowest address byte is the least significant one on your little-endian platform, and so that is the byte you retrieve with i & 0xFF.
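
To make the three facts concrete, here is a minimal sketch that reads the same word back in two ways: in address order and in value order. The helper names (pack_word, bytes_in_memory_order, bytes_msb_first) are made up for this illustration and are not part of the question's code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: copies four bytes into a 32-bit word, just as the
// memcpy in the question does for each group of four chars.
inline std::uint32_t pack_word(const unsigned char* bytes) {
    assert(bytes != nullptr);
    std::uint32_t word = 0;
    std::memcpy(&word, bytes, sizeof word);
    return word;
}

// Reads the bytes back in address order (lowest address first).
// This always matches the input order, whatever the host endianness.
inline std::string bytes_in_memory_order(std::uint32_t word) {
    unsigned char raw[sizeof word];
    std::memcpy(raw, &word, sizeof word);
    return std::string(raw, raw + sizeof word);
}

// Reads the bytes by shifting, most significant byte first.
// This is the order the question's print loop uses.
inline std::string bytes_msb_first(std::uint32_t word) {
    std::string out;
    for (int shift = 24; shift >= 0; shift -= 8)
        out += static_cast<char>((word >> shift) & 0xFF);
    return out;
}
```

Calling these from main with the bytes '!', '"', '#', '$' should give !"#$ from bytes_in_memory_order on any platform, while bytes_msb_first gives $#"! on a little-endian machine such as x86.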

CodePudding user response：

On a little-endian machine, the value 0x12345678 is stored as 4 bytes: 0x78 0x56 0x34 0x12. But 0x12345678>>24 is still 0x12, because shifting operates on the value and has nothing to do with the 4 separate bytes.

If you have the 4 bytes 0x78 0x56 0x34 0x12 and interpret them as a 4-byte little-endian integer, you get 0x12345678. If you right-shift by 24 bits, you get the 4th byte: 0x12. If you right-shift by 16 bits and mask with 0xff, you get the 3rd byte: 0x34, because ((0x12345678 >> 16) & 0xff) == 0x34. And so on.

The memcpy has nothing to do with it.
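
The arithmetic above can be checked at compile time, which also shows that it does not depend on how the bytes sit in memory. This is only a sketch; byte_of is a name invented for the example.

```cpp
#include <cstdint>

// Hypothetical helper: extracts byte n of a 32-bit value by shifting,
// where n = 0 is the least significant byte and n = 3 the most significant.
constexpr std::uint32_t byte_of(std::uint32_t value, unsigned n) {
    return (value >> (8 * n)) & 0xFFu;
}

// These hold on every platform, because shifts act on the value.
static_assert(byte_of(0x12345678u, 3) == 0x12u, "most significant byte");
static_assert(byte_of(0x12345678u, 2) == 0x34u, "second most significant byte");
static_assert(byte_of(0x12345678u, 1) == 0x56u, "third most significant byte");
static_assert(byte_of(0x12345678u, 0) == 0x78u, "least significant byte");
```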

CodePudding user response：

Maybe this will make it easier to understand. Let's say we have this data defined:

uint32_t val = 0x01020304;
auto *pi = reinterpret_cast<unsigned char *>( &val );

The following code will produce the same result on big-endian and little-endian platforms:

std::cout << ( (val >> 24) & 0xFF ) << '\t'
          << ( (val >> 16) & 0xFF ) << '\t'
          << ( (val >> 8) & 0xFF ) << '\t'
          << ( (val >> 0) & 0xFF ) << '\n';

but this code will produce different output depending on the platform's endianness:

std::cout << static_cast<unsigned int>( pi[0] ) << '\t'
          << static_cast<unsigned int>( pi[1] ) << '\t'
          << static_cast<unsigned int>( pi[2] ) << '\t'
          << static_cast<unsigned int>( pi[3] ) << '\n';

It has nothing to do with memcpy(); it is about how ints are stored in memory and how the bit-shift operations work.
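
If the goal is for the first char to end up as the most significant byte on every platform, one option is to assemble the word with shifts instead of memcpy. The sketch below assumes that is what the asker wants; load_be32 and msb_to_lsb are hypothetical names, not functions from any answer.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: builds a 32-bit value with bytes[0] as the MOST
// significant byte. Only shifts are used, so the result is the same on
// big-endian and little-endian hosts.
inline std::uint32_t load_be32(const unsigned char* bytes) {
    assert(bytes != nullptr);
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8)  |
            static_cast<std::uint32_t>(bytes[3]);
}

// Extracts the four bytes from most to least significant, in the same
// order as the question's print loop.
inline std::string msb_to_lsb(std::uint32_t word) {
    std::string out;
    out += static_cast<char>((word >> 24) & 0xFF);
    out += static_cast<char>((word >> 16) & 0xFF);
    out += static_cast<char>((word >> 8) & 0xFF);
    out += static_cast<char>(word & 0xFF);
    return out;
}
```

With the bytes '!', '"', '#', '$' as input, load_be32 yields 0x21222324 and msb_to_lsb returns them in their original order. The trade-off is that the word's in-memory layout now differs between platforms, so this only suits code that reads the bytes back with shifts.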

CodePudding user response：

Your conclusion is off. memcpy indeed just copies bytes, no matter what endianness is used. Copying bytes into an int, though, is not independent of endianness, because the bytes will be interpreted differently depending on the endianness.

See here how you can get the same output without any copying. I set the bytes from first to last, then print them from first to last. Your printing reverses the order.

#include <iostream>
#include <cstring>
#include <array>
#include <bit>

int main()
{
    unsigned int a = 0;
    unsigned char* f = reinterpret_cast<unsigned char*>(&a);
    for (int i=0; i<sizeof(unsigned int); ++i, ++f){
        *f = i + 33;
    }
    
    f = reinterpret_cast<unsigned char*>(&a);
    for (int i=0; i<sizeof(unsigned int); ++i, ++f){
        std::cout << (char)*f;
    }
    std::cout << "\n";
    
    std::cout << (char)(a >> 24) 
        << (char)((a >> 16) & 0xFF) 
        << (char)((a >> 8) & 0xFF)
        << (char)(a & 0xFF) << "\n";
}

Output:

!"#$
$#"!