Home > Back-end >  Copying differently sized data from a union to a byte array
Copying differently sized data from a union to a byte array

Time:08-20

I have a function that takes in an enum and a uint16_t value. The enum represents the type, and each type may either be of length 1 or 2 bytes. The function then prepares a packet to be sent over the network, and the packet size may either be 2 or 3 bytes, depending on the length of the data. The data needs to be converted to network byte order before sending.

In order to handle this, I have the following code. The real code is more defensive and I tried to simplify as much as I can for the purposes of this question.

bool set_config(param_type type, uint16_t value)
{
    size_t len = get_param_len(type); // returns either 1 or 2 
    if(len == 1 && value > 255){
        return false; 
    }
    
    // serializing the data to a packet
    union{
        uint8_t byte;
        uint16_t ushort;
        uint8_t bytes[2];
    }data;
    if(len == 2){
        data.ushort = value;
        data.ushort = htons(data.ushort)
    }else{
        data.byte = value;
    }
    
    uint8_t packet[3] = {0};
    packet[0] = type;
    
    // does this make sense? Or is this undefined behaviour?
    memcpy(&packet[1], &data.bytes[0], len);
    
    return send_packet(packet, 1   len); // sends the packet which may be 2 or 3 bytes
}

Is it a good idea to copy the data to the packet in this manner? Internally in the union, how is the data stored? Is the byte parameter in the data union always the same as bytes[0]? Or is this compiler specific?

CodePudding user response:

does this make sense? Or is this undefined behaviour?

It's fine as far as UB is concerned, C allows type punning with unions (as opposed to C ). If it makes sense, that's another story, the code looks needlessly complex to me.

It might be possible to use _Generic to do compile-time type adjustments instead. I don't understand why you use uint16_t as your generic type - that's much more cumbersome than a uint8_t [2].

Also I wouldn't add error checks such as "did I fill the parameters with random garbage?" inside the function, always put such code on the caller side. Because if the caller knows for sure that they aren't passing random garbage, then such checks are just pointless bloat.

Is it a good idea to copy the data to the packet in this manner?

Not really. Unions and structs are never 100% portable because of padding/alignment. Also they use the same endianess as the rest of the program, so which byte of the ushort that corresponds to byte depends on endianess.

Is the byte parameter in the data union always the same as bytes[0]?

Yes. But portably, you have no idea if it's the same as the MS byte of the ushort or same as the LS byte.

Or is this compiler specific?

It is CPU-specific and therefore also compiler-specific. All compilers for a CPU with a certain endianess and alignment should behave the same, however.

You can probably simplify the whole function to something like this:

bool set_config (param_type type, const uint8_t value[2])
{
  size_t len = get_param_len(type);

  uint8_t packet[3] = { type, value[0], value[1] };
    
  if(len==2)
  {
    const bool little_endian = (*(uint8_t*) &(int){1}) == 1;
    if(little_endian)
    {
      packet[1] = value [1];
      packet[2] = value [0];
    }
  }

  return send_packet(packet, 1   len); // sends the packet which may be 2 or 3 bytes
}

Or in case the code will never get ported to big endian machines:

bool set_config (param_type type, const uint8_t value[2])
{
  size_t len = get_param_len(type);
  uint8_t packet[3] = { type, value[1], value[0] };
  return send_packet(packet, 1   len); // sends the packet which may be 2 or 3 bytes
}

CodePudding user response:

In short, the answer is no, its not a safe.

A simple change made to a compiler setting or environmental differences between platforms could result in the data being interpreted incorrectly by the receiving application.

As you know the the memory size of a union is equivalent to its largest member. In simplistic terms the union defined as:

union{
        uint8_t byte;
        uint16_t ushort;
        uint8_t bytes[2];
     }data;

Will take up a minimum of 2 bytes, but the order of byte packing and how the bytes are packed is determined by compiler settings such as whether multi-byte values are ordered by least or most significant byte first and also microprocessor architecture.

For example:

The two bytes for byte and ushort could be packed as follows:

byte      | byte|     |
byte      |     | byte|
ushort    | MSB | LSB |
ushort    | LSB | MSB |

As you can see the value for byte may be stored in the first or the second byte, similarly the data stored for ushort may appear to be reversed with the Most Significant Byte appearing first in one example and the Least Significant Byte appearing first in the other.

Each of the above examples may be determined by the compiler and its settings.

To make matter worse some microprocessors will rearrange bytes dependent on their architecture for example when looking at uint32_t.

Instead of uint32_t being stored as |byte0|byte1|byte2|byte3| it may be stored as |byte1|byte0|byte3|byte2|.

If your union changed to

union{
    uint8_t  byte;
    uint16_t ushort;
    sytuct{
        uint8_t  byte[3];
        }multiByte;
    };

The matters become even more complex as now you will most probably have data alignment issues. On a 16 bit processor the 3 bytes present within the multiByte structure will be placed upon a 16 bit boundary causing the union to take up 4 bytes and not 3 bytes of data by default.

So if you are reliant on compiler settings or architecture to ensure that your data packing is consistent then the project isn't supportable long term and it may not be portable without change.

Hence for safety it is best to be pedantic and process the data to ensure that it is in the correct order both for transmission and on reception.

  • Related