Padding at the end of struct with variable size array seems wrong

Time:03-12

Consider these structs on a common 64-bit system:

struct V1 {         // size 1, alignment 1
    uint8_t size;   // offset 0, size 1, alignment 1
    uint8_t data[]; // offset 1, size 0, alignment 1
};

struct V2 {        // size 12, alignment 4
    char c;       // offset  0, size 1, alignment 1
    int length;   // offset  4, size 4, alignment 4
    char b;       // offset  8, size 1, alignment 1
    short blob[]; // offset 10, size 0, alignment 2
};
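
The sizes and offsets in the comments are what a typical x86-64 ABI produces; the C standard does not guarantee them. A minimal sketch for checking them on a given implementation might look like this. The helper names `v1_data_starts_at_end` and `v2_tail_bytes` are made up for illustration.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* uint8_t */

struct V1 {
    uint8_t size;
    uint8_t data[];
};

struct V2 {
    char c;
    int length;
    char b;
    short blob[];
};

/* Guards the subtraction in v2_tail_bytes against wrapping around. */
static_assert(offsetof(struct V2, blob) <= sizeof(struct V2),
              "blob is expected to start inside or at the end of struct V2");

/* Nonzero when data begins exactly where struct V1 ends, i.e. when V1
   has no trailing padding. */
int v1_data_starts_at_end(void)
{
    return offsetof(struct V1, data) == sizeof(struct V1);
}

/* Number of bytes between the start of blob and the end of struct V2.
   With the layout in the comments above this is 12 - 10 = 2. */
size_t v2_tail_bytes(void)
{
    return sizeof(struct V2) - offsetof(struct V2, blob);
}
```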

In the first case the data member sits right at the end of the struct and takes up no space. This causes the following oddness:

struct V1 blobs[2];
blobs[0].data == &blobs[1].size

Luckily the C standard §6.7.2.1, paragraph 3 says:

A structure or union shall not contain a member with incomplete or function type,… except that the last member of a structure with more than one named member may have incomplete array type; such a structure (and any union containing, possibly recursively, a member that is such a structure) shall not be a member of a structure or an element of an array.

So the above array is illegal and there is no problem with the addresses being the same.

What if I have code that, given a size, creates such structures in a contiguous block of memory that was pre-allocated? Would it be illegal for it to create instances with size == 0 because that would basically be an array of the struct?

Secondly, I have a problem with V2. The compiler adds extra padding at the end of V2 so the size is a multiple of the alignment. This is necessary for structs in an array so that the following structs remain properly aligned. But V2 must never be placed in an array, so I fail to see why there should be any padding at the end of V2.

In fact I would go so far as to say it is wrong to add padding there. It obfuscates calculating the size of the struct for a given length of blob because now the offset of blob has to be considered instead of the size of the struct.

align = _Alignof(struct V2);
needed_size = offsetof(struct V2, blob) + length;        // beware of overflow
needed_size = (needed_size + align - 1) & ~(align - 1);  // beware of overflow
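
Rounding a size up to a multiple of a power-of-two alignment is usually done by adding align - 1 and then clearing the low bits with the mask ~(align - 1). A hedged sketch of that step with the overflow check spelled out follows. The name `round_up_to_alignment` is hypothetical, and the code assumes that align is a nonzero power of two, which holds for the results of _Alignof.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>   /* SIZE_MAX */

/* Rounds size up to the next multiple of align and stores it in *out.
   Returns false, leaving *out untouched, if align is not a nonzero power
   of two or if the result would not fit in size_t. */
bool round_up_to_alignment(size_t size, size_t align, size_t *out)
{
    if (align == 0 || (align & (align - 1)) != 0) {
        return false;               /* not a power of two */
    }
    if (size > SIZE_MAX - (align - 1)) {
        return false;               /* size + align - 1 would wrap around */
    }
    *out = (size + align - 1) & ~(align - 1);
    return true;
}
```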

Is there something I'm missing why struct V2 must be padded?

CodePudding user response:

This answer addresses “Is there something I'm missing why struct V2 must be padded?”

If a compiler did not pad a structure type to be a multiple of its alignment requirement, then some structure types would violate this rule in C 2018 6.7.2.1 18:

… In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply…

To see this, consider this structure in an implementation where int is four bytes and has a four-byte alignment requirement:

struct s0
{
    int  i;
    char c;
};

This structure requires five bytes for its members, so it must be padded to eight bytes to satisfy the alignment requirements when used in an array. Next, we add a flexible array member:

struct s1
{
    int  i;
    char c;
    char a[];
};

This structure also requires five bytes for its inflexible members. None are required for the flexible array. If the compiler did not pad it to eight bytes, it would be shorter than struct s0, which violates the rule that its size must be either as if the flexible array member were omitted or that size plus more padding.
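
The relationship between the two sizes can be checked at compile time. This is a small sketch; the helper `s1_bytes_after_fam_offset` is hypothetical and only reports how many bytes of struct s1 lie at or beyond the offset of the flexible array member.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof, size_t */

struct s0 {
    int  i;
    char c;
};

struct s1 {
    int  i;
    char c;
    char a[];   /* flexible array member */
};

/* C 2018 6.7.2.1 18: the size is as if the flexible array member were
   omitted, possibly with more trailing padding, so s1 is never smaller. */
static_assert(sizeof(struct s1) >= sizeof(struct s0),
              "struct s1 must not be smaller than struct s0");

/* Bytes of struct s1 from the start of a to the end of the structure.
   With a four-byte int and eight-byte struct this is 8 - 5 = 3. */
size_t s1_bytes_after_fam_offset(void)
{
    return sizeof(struct s1) - offsetof(struct s1, a);
}
```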

This tells us why a conforming compiler is constrained to include the padding. However, it does not tell us the reason for the rule. I see none except that it would be more complicated to write rules into the C standard to allow less padding.

Some Discussion About Object Size

I see nothing in the C 2018 standard that explicitly says the size of an object must be a multiple of its alignment requirement. Obviously, the ability to put objects into an array depends on this. However, without such a requirement there would simply be some objects (besides a structure with a flexible array member) that could not be used in arrays; the inability to put objects into an array would not cause the requirement to come into existence.

Thus, it might be conforming for a C implementation to define struct s0 to be five bytes with an alignment requirement of four bytes, and then it could make struct s1 also five bytes with an alignment requirement of four bytes.

CodePudding user response:

“What if I have code that, given a size, creates such structures in a contiguous block of memory that was pre-allocated? Would it be illegal for it to create instances with size == 0 because that would basically be an array of the struct?”

As @EricPostpischil explained in comments, the constraint in question is not about the layout of objects in memory, but rather about the declared element type of an actual array. An object that is not declared as an array is not an array in the relevant sense, no matter how array-like it may seem, or how we think about it or use it. So no, the language spec does not forbid what you describe.
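
As an illustration of that scenario, the sketch below places several struct V2 objects back to back in one block obtained from malloc, advancing by a per-record stride rounded up to the structure's alignment. No array of struct V2 is ever declared. The names `v2_stride` and `v2_pack` are hypothetical, and the sketch assumes the lengths are non-negative and small enough that the arithmetic cannot overflow.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct V2 {
    char c;
    int length;     /* number of elements in blob */
    char b;
    short blob[];
};

/* Bytes from the start of one record to the start of the next: the minimum
   footprint of a record with n blob elements, rounded up so that the next
   record is properly aligned. Assumes n is small enough not to overflow. */
size_t v2_stride(size_t n)
{
    size_t align = _Alignof(struct V2);
    size_t bytes = offsetof(struct V2, blob) + n * sizeof(short);
    if (bytes < sizeof(struct V2)) {
        bytes = sizeof(struct V2);
    }
    return (bytes + align - 1) / align * align;
}

/* Packs count records with the given blob lengths into one allocation and
   zero-initializes them. Returns the block (the caller frees it), or NULL
   if the allocation fails. */
void *v2_pack(const int *lengths, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += v2_stride((size_t)lengths[i]);
    }
    unsigned char *block = malloc(total ? total : 1);
    if (block == NULL) {
        return NULL;
    }
    size_t off = 0;
    for (size_t i = 0; i < count; i++) {
        struct V2 *rec = (struct V2 *)(block + off);
        rec->c = 0;
        rec->b = 0;
        rec->length = lengths[i];
        memset(rec->blob, 0, (size_t)lengths[i] * sizeof(short));
        off += v2_stride((size_t)lengths[i]);
    }
    return block;
}
```

Records with length 0 end up exactly sizeof(struct V2) bytes apart, which is the array-like layout the question asks about.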

“The compiler adds extra padding at the end of V2 so the size is a multiple of the alignment. This is necessary for structs in an array so the following structs remain properly aligned. But V2 must never be placed in an array so I fail to see why there should be any padding at the end of V2.”

The C language specification permits implementations to pad structure layouts after any member, including the last, at their own discretion. Among the primary purposes is to allow structure members to be properly aligned, including, but not limited to, within arrays of structures, but use of padding in structure layouts is not contingent on there being an alignment-based justification.

“In fact I would go so far as to say it is wrong to add padding there.”

"Wrong" is a strong word. Especially in the context of a language-lawyer question, you should back it up with an argument based on the language specification. I don't think you can do that.

“It obfuscates calculating the size of the struct for a given length of blob because now the offset of blob has to be considered instead of the size of the struct.”

Not exactly true. If you want to compute the minimum possible size into which an instance of your structure can fit then yes, you need to take the offset of the FAM into account. However,

  1. That's not a function of there being padding, but rather of the offset of the FAM differing from the size of the structure. That can't happen without padding, but it doesn't have to happen with padding.

  2. If you are so space-constrained that you cannot accommodate the possibility of a few bytes of overallocation for the sake of clearer code, then dynamic allocation and FAMs probably are not a good idea in the first place. In particular, the allocator itself typically does not allocate with single-byte granularity.

  3. Substituting an offsetof expression for a sizeof expression is hardly obfuscatory. It might even be clearer, since then the name of the FAM actually appears in the size computation. Your particular example code is somewhat overcomplicated, however, by the unnecessary measure employed to make the allocation size a multiple of the structure's alignment requirement.

Although the size of a structure type that has a FAM does not include the size of the FAM itself, it does include any padding between the penultimate member and the FAM, and possibly more:

In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply.

(C17 6.7.2.1/18)

Thus, a pretty tight upper bound on the space needed for a structure of type struct S that has a flexible array member fam of type fam_t can be calculated as:

size_t bytes_needed = sizeof(struct S) + num_fam_elements * sizeof(fam_t);

That is in fact idiomatic, but if you prefer

size_t bytes_needed = offsetof(struct S, fam) + num_fam_elements * sizeof(fam_t);
if (bytes_needed < sizeof(struct S)) {
    bytes_needed = sizeof(struct S);
}

for the absolute minimum then I see nothing objectionable about that form.
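
The minimum-size computation can be wrapped in a small allocation helper. The sketch below reuses the placeholder names from the text (struct S, fam, fam_t) with made-up definitions, adds the hypothetical helpers `s_min_bytes` and `s_alloc`, and includes an overflow check that the one-line formulas leave out.

```c
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>   /* malloc */

typedef short fam_t;  /* assumed element type, chosen only for illustration */

struct S {
    size_t count;     /* number of elements in fam */
    char   tag;
    fam_t  fam[];
};

/* Minimum number of bytes that can hold a struct S with n elements in fam,
   or 0 if the computation would overflow size_t. */
size_t s_min_bytes(size_t n)
{
    if (n > (SIZE_MAX - offsetof(struct S, fam)) / sizeof(fam_t)) {
        return 0;
    }
    size_t bytes = offsetof(struct S, fam) + n * sizeof(fam_t);
    return bytes < sizeof(struct S) ? sizeof(struct S) : bytes;
}

/* Allocates a struct S with room for n elements in fam. Returns NULL on
   overflow or allocation failure; the caller frees the result. */
struct S *s_alloc(size_t n)
{
    size_t bytes = s_min_bytes(n);
    if (bytes == 0) {
        return NULL;
    }
    struct S *p = malloc(bytes);
    if (p != NULL) {
        p->count = n;
        p->tag = 0;
    }
    return p;
}
```

The result never exceeds the bound sizeof(struct S) + n * sizeof(fam_t), and it is smaller whenever there is padding after the start of fam and n is nonzero.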

“Is there something I'm missing why struct V2 must be padded?”

Undoubtedly so, as you observe your implementation to pad it, but the implementation does not owe you an explanation.

Nevertheless, your implementation most likely applies a combination of rules such as these:

  • the alignment requirement for a structure type is the same as the strictest alignment requirement of any of its members, and
  • the size of a structure type is an integer multiple of its alignment requirement.

Neither of those is a rule of the language itself, but they are fairly common in practice. In particular, they are part of the System V x86_64 ABI, and undoubtedly of other ABIs, too. Note that although those rules do serve the purpose of ensuring that structure members can be properly aligned inside an array of structures, they make no exception for structure types that are not allowed to be the element type of an array.
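
Whether a particular implementation follows those two rules for struct V2 can be probed directly. This is only a sketch; `v2_max_member_alignment` and `v2_follows_common_rules` are hypothetical names, and a false result would not indicate a non-conforming compiler, since the language does not mandate either rule.

```c
#include <stdbool.h>
#include <stddef.h>

struct V2 {
    char c;
    int length;
    char b;
    short blob[];
};

/* Strictest alignment requirement among the member types of struct V2. */
static size_t v2_max_member_alignment(void)
{
    size_t a = _Alignof(char);
    if (_Alignof(int) > a) {
        a = _Alignof(int);
    }
    if (_Alignof(short) > a) {
        a = _Alignof(short);
    }
    return a;
}

/* True when struct V2 has the alignment of its strictest member and a size
   that is an integer multiple of that alignment. */
bool v2_follows_common_rules(void)
{
    return _Alignof(struct V2) == v2_max_member_alignment()
        && sizeof(struct V2) % _Alignof(struct V2) == 0;
}
```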
