C Struct-Padding in Arrays vs. for Variables

Time:01-30

While playing around with structure padding, I found something weird...

At first glance it seems off that a structure's size isn't the sum of its members' sizes, and that structures seem to be padded differently depending on whether they are inside an array or not:

Code

#include <stdio.h>

typedef struct {
    char c;
    double d;
    int i;
} test_struct;

int main() {
    printf("Size of Struct: %d\n", sizeof(test_struct));
    test_struct t1, t2;
    printf("Offset between Structs: %d\n", (long long) &t1 - (long long) &t2);
    test_struct arr[2];
    printf("Offset between Structs in Array: %d\n",  (long long) &arr[1] - (long long) &arr[0]);
}

Output

(64-bit system)

Size of Struct: 24
Offset between Structs: 32
Offset between Structs in Array: 24

Answer:

I describe the way a compiler typically lays out a structure here.

printf("Size of Struct: %d\n", sizeof(test_struct));

sizeof produces a result of type size_t. It should be printed with %zu, not %d. Once that is correct, this will print the number of bytes in the structure.

Note that sizeof is not a function: it is an operator, so an expression operand does not need parentheses. You can write sizeof t1, for example. If the operand is a type name, such as test_struct, it does need to be in parentheses, for reasons of C grammar: sizeof (test_struct).

test_struct t1, t2;
printf("Offset between Structs: %d\n", (long long) &t1 - (long long) &t2);

The compiler is free to place t1 and t2 where it wants in memory, subject to alignment rules and other considerations. They do not have to be adjacent to each other. Values of type long long should be printed with %lld, not %d.

In C implementations with flat address spaces, conversion of a pointer to an integer will usually produce the expected address, and so subtracting in such an implementation will produce the offset between the addresses. However, this is not guaranteed by the C standard and is not true in all C implementations.

printf("Offset between Structs in Array: %d\n", sizeof(arr) / sizeof(*arr));

Dividing the size of an array by the size of an element produces the number of elements in the array, not the size of an element. And again, size_t values should be printed with %zu, not %d.

Answer:

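The goal of structure padding and member alignment is to place every member at a "natural address" in memory.
A variable x is at a "natural address" if its address is a multiple of its size, i.e. (uintptr_t)&x % sizeof(x) == 0. Strictly speaking, what matters is the type's alignment requirement (_Alignof), which for primitive types is usually, but not always, equal to its size.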

Processors read memory in words: 32-bit systems often read memory in words of 32 bits (4 bytes), and 64-bit systems often read it in words of 64 bits (8 bytes).
To ensure that a variable can be read with as few word accesses as possible, the compiler aligns it.

This boosts performance, as it cuts down on word accesses by the CPU. However, it wastes some memory as padding.
Under extreme circumstances you might want to consider using #pragma pack.

Quick side note: on common platforms, a pointer is one word in size.


Size

sizeof(test_struct) evaluates to 24 (on the 64-bit system from the question) because the members within the struct get aligned like this:

struct {
    char c; // 1 byte
    char pad1[7]; // so d is at byte 8 from the beginning (multiple of 8, d's size)
    double d; // 8 bytes
    int i; // 4 bytes
    char pad2[4]; // so consecutive Structs also have d at multiple of 8 globally
};

Offset

The layout above only gives "naturally aligned" members if the struct itself is located at a multiple of 8. This applies generally: consecutive, "correctly" padded structs have all their members at "natural addresses" only if the first struct is at a "natural address" for its biggest member type.

C's biggest primitive type is long double. On typical x86 Linux systems it holds 80 bits (10 bytes) of data, but with padding it occupies 12 bytes on 32-bit and 16 bytes on 64-bit systems. From what we just learned, placing structures at addresses that are multiples of 16 guarantees that all members of such a structure are correctly aligned. The compiler in the question apparently places struct variables at addresses that are multiples of 16; this is an implementation choice, not something the C standard requires. A struct won't shrink in size, so the first 16-byte boundary after the first struct's 24 bytes lies another 8 bytes further on, for a total offset of 32 bytes between the two.

You might wonder whether this wastes additional memory, why this padding couldn't simply be added to the end of the struct as well, or whether #pragma pack(1) also prevents it.

In practice this is not a concern: the compiler is free to squeeze other, small enough variables into the gaps in front of structs, so there's no real downside anyway.


Offset in Array

When examining pad2, you might realize that its comment only holds true under the assumption that the struct itself is at a memory address which is a multiple of 8.
Again, speaking generally: at a multiple of its biggest member type's size.

Arrays of structs always contain objects of the same type. So by adding padding at the end to make the total size a multiple of the biggest member's size, we can be certain that every following struct will be aligned just like the first one in the sequence.

The first will be put at an address which is a multiple of 16, as discussed previously. Adding this padding therefore makes our structs both space- and time-efficient: we aren't forced onto multiples of 16, but can instead put structs right next to each other.

You might realize that an array of structs which don't contain long doubles can actually be a bit more memory-efficient than multiple separate variables of that struct, if you don't happen to have many small variables to fit into the gaps between your hypothetical struct variables. For most use cases this will probably be irrelevant, but grasping why it's true shows an understanding of structure padding.


I hope this elaboration on the topic helps.
If you're still confused, take a look at more explanations here or other great material on the topic here.
