How to Determine C Byte Offsets

Time:09-28

Hi, I have a question from a book about byte offsets. I am teaching myself and am confused. The question asks:

For the structure declaration

struct
{
char *a; //1 byte
short b; //2 bytes
double c; //8 bytes
char d; //1 byte
float e; //4 bytes
char f; //1 byte
long long g; //8 bytes
void *h; //8 bytes 
} foo;

suppose it was compiled on a Windows and UNIX machine, where each primitive data type of K bytes must have an offset that is a multiple of K.

A. What are the byte offsets of all the fields in the structure?

B. What is the total size of the structure?

C. Rearrange the fields of the structure to minimize wasted space, and then show the byte offsets and total size for the rearranged structure.

I don't really understand byte offsets. I know we are supposed to take the size of each primitive type and lay the members out side by side. Since the largest type is double, would they all be placed at multiples of 8? Also, if I just add up the sizes of the primitive types for B, that would not be correct, would it? Help would be appreciated, thanks.

CodePudding user response:

This is an exercise in understanding alignment and struct padding.

On most high-end systems such as PCs, the CPU can efficiently handle 32- or 64-bit chunks of data in a single instruction. Most often this requires the data to be stored at an aligned address for the given type, that is, an address that is evenly divisible by the size of the type.

For example, a uint32_t may have an alignment requirement of 4 bytes, and instructions reading it will require the data to be stored at addresses that are multiples of 4 (in hex, addresses ending in 0, 4, 8 or C). Failing to allocate the data at such an address would lead to a misaligned access, which depending on the system could lead to slower code or even a hardware exception. C doesn't guarantee what will happen if you perform a misaligned access; it is so-called undefined behavior.

Now when it comes to structs in C, there is a requirement that the members must be allocated in the order specified by the programmer, starting with the first member at the lowest address. The compiler is not allowed to re-order them or allocate them as it pleases, as it would be allowed to do with plain local variables.

But in order to allocate members in the order specified and still satisfy the alignment requirements of each member's type for the given system, a compiler is allowed to insert so-called padding bytes between members and at the end of the struct, but never before the first member, which always sits at offset 0. It basically fills in extra dummy bytes to ensure that each member is aligned.

I hacked together a little program to display the layout of your struct. How it works isn't important; what matters is the output from executing it, in this case with gcc on x86_64 Linux:

     char* a  size: 8   theo offs:  0   real offs:  0
     short b  size: 2   theo offs:  8   real offs:  8
    double c  size: 8   theo offs: 10   real offs: 16
      char d  size: 1   theo offs: 18   real offs: 24
     float e  size: 4   theo offs: 19   real offs: 28
      char f  size: 1   theo offs: 23   real offs: 32
 long long g  size: 8   theo offs: 24   real offs: 40
     void* h  size: 8   theo offs: 32   real offs: 48
Theoretical size of foo: 40
Real size of foo: 56

Here you see which size each member got on the given system.

"Theo offs" means theoretical offset - the offset that this member would have in the struct on a system with no alignment requirements and no inserted padding.

"Real offs" means the real offset that the compiler actually gave this member in memory, based on alignment requirements.

Similarly, you can see the theoretical size the struct would have with no padding, just 100% data, as well as the real size it actually ended up with.

For example, the member c has an 8-byte alignment requirement, so it couldn't be placed at offset 10, because that isn't a multiple of 8. Instead the compiler moved it up to offset 16, inserting no fewer than 6 padding bytes to achieve this.

These padding bytes are wasted space. With more careful planning, we could have placed actual data in those 6 bytes instead, for example a member like d, which is just 1 byte and can go at any offset. Simply by swapping the order of c and d, we would see a pretty drastic change in the total struct size (from 56 to 48 in my example).

Re-ordering these members in the most efficient way is left as an exercise, though a hint is to always place the largest members first, if possible.

CodePudding user response:

What are the byte offsets of all the fields in the structure?

You can get the byte offsets with the offsetof macro. The offsets are implementation-defined, so expect them to differ between platforms.

What is the total size of the structure?

You can get the size with sizeof. In general, you should expect the size and offsets to be known only after compilation, and to be decided only by the compiler. The compiler can put as many padding bytes between members as it wants. All you know is that the compiler can't rearrange the members, and that the first member starts at offset 0.

How to Determine C Byte Offsets

Generally, compile a sample program that prints all the interesting byte offsets and the sizeof of the struct. Then execute the program and read the information from its output.

where each primitive data type of K bytes must have an offset that is a multiple of K.

Rearrange the fields of the structure to minimize wasted space, and then show the byte offsets and total size for the rearranged structure.

The C programming language is nowadays more of an abstraction over the underlying hardware. You would write serialization and deserialization routines that convert between a "minimal waste space" buffer and the abstraction that you can use in C:

    #include <stddef.h>  // size_t
    #include <string.h>  // memcpy

    // the abstract C representation of the data:
    struct foo_s { 
        char a;
        short b;
        double c;
        char d;
        float e;
        char f;
        long long g;
        void *h;
    };
    
    // Note - sizeof does not evaluate its operand, so the null pointer is never dereferenced.
    #define GET_MEMBER_SIZE(type, memb)  (sizeof(((type*)0)->memb))
    
    // The size of the "minimal waste space" buffer.
    // An enum constant, so that it can be used as an array size in standard C.
    enum { FOO_SIZE =
       GET_MEMBER_SIZE(struct foo_s, a) +
       GET_MEMBER_SIZE(struct foo_s, b) +
       GET_MEMBER_SIZE(struct foo_s, c) +
       GET_MEMBER_SIZE(struct foo_s, d) +
       GET_MEMBER_SIZE(struct foo_s, e) +
       GET_MEMBER_SIZE(struct foo_s, f) +
       GET_MEMBER_SIZE(struct foo_s, g) +
       GET_MEMBER_SIZE(struct foo_s, h)
    };
    
    // Small wrapper to do memcpy with sizeof and increment the buffer.
    #define BUFFER_WRITE(pnt, var)  do { \
          memcpy((pnt), &(var), sizeof(var)); \
          (pnt) += sizeof(var); \
       } while(0)

    // Convert C abstraction to "minimal waste space" buffer.
    void foo_serialize(char buf[FOO_SIZE], const struct foo_s *t) {
       BUFFER_WRITE(buf, t->a);
       BUFFER_WRITE(buf, t->b);
       BUFFER_WRITE(buf, t->c);
       BUFFER_WRITE(buf, t->d);
       BUFFER_WRITE(buf, t->e);
       BUFFER_WRITE(buf, t->f);
       BUFFER_WRITE(buf, t->g);
       BUFFER_WRITE(buf, t->h);
    }

    // Counterpart of BUFFER_WRITE: copy out of the buffer and advance it.
    #define BUFFER_READ(pnt, dest)  do { \
          memcpy(&(dest), (pnt), sizeof(dest)); \
          (pnt) += sizeof(dest); \
       } while(0)

    // Convert from "minimal waste space" to C abstraction.
    void foo_deserialize(struct foo_s *t, const char buf[FOO_SIZE]) {
       BUFFER_READ(buf, t->a);
       BUFFER_READ(buf, t->b);
       BUFFER_READ(buf, t->c);
       BUFFER_READ(buf, t->d);
       BUFFER_READ(buf, t->e);
       BUFFER_READ(buf, t->f);
       BUFFER_READ(buf, t->g);
       BUFFER_READ(buf, t->h);
    }

Alternatively, if you really want to write with buffers, you would write setters/getters for every member to provide C abstraction:

    char foo_get_a(const char buf[FOO_SIZE]) {
        char r;
        memcpy(&r, buf, sizeof(r));
        return r;
    }
    void foo_set_a(char buf[FOO_SIZE], char r) {
        memcpy(buf, &r, sizeof(r));
    }
    // b is stored right after a, so its offset is sizeof(char).
    short foo_get_b(const char buf[FOO_SIZE]) {
        short r;
        memcpy(&r, buf + sizeof(char), sizeof(r));
        return r;
    }
    void foo_set_b(char buf[FOO_SIZE], short r) {
        memcpy(buf + sizeof(char), &r, sizeof(r));
    }
    // etc. for each member

CodePudding user response:

Rearrange the fields of the structure to minimize wasted space

To address this one part, and to expand on the "always place the largest members first, if possible" maxim:

  • If pressed to write code that likely minimizes wasted space, consider ordering members by their specified minimum size, or at least by their likely minimum size.

  • There is a standard library type called max_align_t, "which is an object type whose alignment is the greatest fundamental alignment". The relative order of struct/union members whose sizes are a multiple of this alignment, or larger, likely makes no difference to the wasted space.

  • Function pointers may differ in size from object pointers. If so, function pointers are usually larger.

  • Integers and pointers have independent sizes. Avoid assuming too much about comparative sizes.

  • Arrays of objects must meet the alignment needs of the first element.

  • Common integer types ascend (do not decrease) in size: char, short, int, long, long long, intmax_t. Their relationship to intN_t is subject to limitations, but is not certain.

  • Floating point (FP) types ascend in size: float, double, long double.

  • Complex FP are certainly 2x their non-complex size.

  • bool may be the size of a char or int or possibly (and rarely) others.

  • size_t, ptrdiff_t, (u)intptr_t relate to pointer sizes yet have the potential to be wider/narrower than commonly seen.

  • time_t, struct tm, thrd_t, ... and other specialized types (aside from characters) tend to meet the max_align_t requirement, but are not specified as such.

  • Bit-field alignment/packing needs involve many considerations.
