Linking behavior and padding bytes

Time:12-08

I have two source files I am linking:

src1.c:

#include <stdio.h>

typedef struct my_struct {
    char a;
    short b;
    int c;
} my_struct;

my_struct x;

void add1();

int main()
{
    x.a = 0;
    x.b = 0;
    x.c = 0;
    
    add1();
    
    printf("%x, %x, %x\n", x.a, x.b, x.c);
}

and src2.c:

#include <stdio.h>

typedef struct my_struct {
    int c;
    short b;
    char a;
} my_struct;

extern my_struct x;

void add1(){
    x.a  = 1;
    x.b  = 1;
    x.c  = 1;
}

The output is "1, 0, 10001" because of the differing type definitions and alignment. However, this relies on the second byte of the struct being 0x00 (which is a padding byte in the struct for src1.c).

Is this guaranteed behavior? Are padding bytes typically initialized to 0? Are there cases when they wouldn't be?

CodePudding user response:

The two struct types in each source file are not compatible with each other because the members aren't declared in the same order.

This is spelled out in section 6.2.7p1 of the C standard:

"Two types have compatible type if their types are the same. Additional rules for determining whether two types are compatible are described in 6.7.2 for type specifiers, in 6.7.3 for type qualifiers, and in 6.7.6 for declarators. Moreover, two structure, union, or enumerated types declared in separate translation units are compatible if their tags and members satisfy the following requirements: If one is declared with a tag, the other shall be declared with the same tag. If both are completed anywhere within their respective translation units, then the following additional requirements apply: there shall be a one-to-one correspondence between their members such that each pair of corresponding members are declared with compatible types; if one member of the pair is declared with an alignment specifier, the other is declared with an equivalent alignment specifier; and if one member of the pair is declared with a name, the other is declared with the same name. For two structures, corresponding members shall be declared in the same order. For two structures or unions, corresponding bit-fields shall have the same widths. For two enumerations, corresponding members shall have the same values."

This means that the declarations my_struct x; and extern my_struct x; have incompatible types, and declaring an identifier multiple times with incompatible types triggers undefined behavior, which loosely speaking means there are no guarantees at all about what your program will do.
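A conventional way to avoid the mismatch is to declare the struct once in a header that both source files include. The sketch below collapses that arrangement into a single listing; the header name my_struct.h is hypothetical, and the comments mark where each file would begin.

```c
/* ---- my_struct.h (hypothetical shared header) ---- */
typedef struct my_struct {
    char a;
    short b;
    int c;
} my_struct;

extern my_struct x;   /* declaration: the same type in every translation unit */
void add1(void);

/* ---- src1.c would #include "my_struct.h" and define the object ---- */
my_struct x;          /* static storage duration: members start at zero */

/* ---- src2.c would #include "my_struct.h" and define add1 ---- */
void add1(void)
{
    x.a = 1;
    x.b = 1;
    x.c = 1;
}
```

With a single definition of the type, every translation unit agrees on the member offsets, so the main from the question would print "1, 1, 1".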

Unrelated to this, as far as padding bytes go, structures declared at file scope will have padding bytes initialized to 0 if the struct is not explicitly initialized.

The proper way to examine padding bytes in a struct is to use an unsigned char * to point to the start of the struct and iterate through the individual bytes.
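A minimal sketch of that byte-wise inspection follows. The helper name count_nonzero_bytes and the object name zeroed are illustrative and do not come from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef struct my_struct {
    char a;
    short b;
    int c;
} my_struct;

/* File scope, no initializer: members and padding start out as zero bits. */
my_struct zeroed;

/* Print every byte of an object in hex and return how many are nonzero.
   Reading any object through unsigned char * is always permitted. */
size_t count_nonzero_bytes(const void *obj, size_t size)
{
    assert(obj != NULL);
    const unsigned char *p = obj;
    size_t nonzero = 0;

    for (size_t i = 0; i < size; i++) {
        printf("%02x ", (unsigned)p[i]);
        if (p[i] != 0)
            nonzero++;
    }
    printf("\n");
    return nonzero;
}
```

Calling count_nonzero_bytes(&zeroed, sizeof zeroed) before anything is stored in the object should report zero nonzero bytes. Once members have been assigned, the padding bytes it prints may hold any value.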

CodePudding user response:

Is this guaranteed behavior?

No. C 2018 6.5 7 specifies rules for when the behavior is defined if you try to access an object defined as one type using a different type. Accessing one type of structure, say A, with a different structure type, say B, has defined behavior only if:

  1. B is compatible with A,
  2. B is a qualified version of a type compatible with A, or
  3. one of the members of B (possibly nested) is of type A or a qualified version of A.

Option 3 clearly does not apply. Regarding 1 and 2, the rules for compatibility of structures in different translation units are in C 2018 6.2.7 1, and they require there be a one-to-one correspondence between structure members in their names, types, and alignments. The structures do not have that, so they do not satisfy the requirements.

This means the behavior of accessing the my_struct x object defined in one translation unit using the my_struct type defined in the other translation unit is not defined by the C standard. In other words, the C standard does not guarantee the behavior.

In C implementations without any cross-unit optimization or other information transfer outside of the ordinary linkage, we can reason that using the my_struct in src2.c must access the bytes of the my_struct defined in src1.c, because there is no other way to implement the behavior that the C standard does require (see footnote 1). This would still generally be bad programming practice: even when it is necessary to reinterpret the bytes of one type of my_struct as another type, there are ways to do so that are defined by the C standard.
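One defined way to reinterpret bytes is to copy the object representation with memcpy into an object of the other type. The sketch below assumes both layouts have the same size, which holds on common ABIs but is not guaranteed. The names layout_a, layout_b, and reinterpret_a_as_b are illustrative.

```c
#include <assert.h>   /* static_assert (C11) */
#include <string.h>   /* memcpy */

struct layout_a { char a; short b; int c; };   /* member order from src1.c */
struct layout_b { int c; short b; char a; };   /* member order from src2.c */

/* The copy only makes sense if both layouts occupy the same number of bytes. */
static_assert(sizeof(struct layout_a) == sizeof(struct layout_b),
              "layouts must have equal size");

/* Copy the bytes of *src into *dst. Unlike accessing one object through an
   incompatible struct type, a byte copy has defined behavior. */
void reinterpret_a_as_b(struct layout_b *dst, const struct layout_a *src)
{
    memcpy(dst, src, sizeof *dst);
}
```

The copy is well defined, but any member of the destination that overlaps a padding byte of the source receives an unspecified value, so the result is still only meaningful on a known layout.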

Are padding bytes typically initialized to 0?

Yes, for objects with static or thread storage duration, C 2018 6.7.9 says “… if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;…”

Are there cases when they wouldn't be?

There are cases where initialization will not set the padding bits to zero. If a structure object is defined with automatic storage duration (the default inside a function definition) and has no initializer, or is dynamically allocated with malloc, it will not be initialized at all. If you explicitly initialize a structure object, the C standard does not specify what the padding bytes will be set to; they will have unspecified values.

There are also cases where the padding bytes will not remain zero. Whenever you store a value in a structure or its members, the padding bytes may take on unspecified values, per C 2018 6.2.6.1 6. For example, in main, you have x.a = 0;. Since a is a char and is presumably followed by one byte of padding (due to the short that follows it), the compiler is allowed to implement this by clearing eight bits of a processor register and then issuing a 16-bit store to the structure. This will set the byte for the a member to zero and will set the padding byte to whatever happened to be in the other bits of the register.

Then reading that byte through the my_struct type in the other translation unit will get those other bits.
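Because padding can change whenever a member is stored, code that needs to compare two structures should not depend on their padding bytes. The following sketch contrasts a member-wise comparison with a byte-wise one; both function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct my_struct {
    char a;
    short b;
    int c;
} my_struct;

/* Member-wise comparison: padding bytes never take part. */
bool my_struct_equal(const my_struct *lhs, const my_struct *rhs)
{
    assert(lhs != NULL && rhs != NULL);
    return lhs->a == rhs->a && lhs->b == rhs->b && lhs->c == rhs->c;
}

/* Byte-wise comparison, shown for contrast: the result also depends on
   padding bytes, whose values the standard leaves unspecified. */
bool my_struct_bytes_equal(const my_struct *lhs, const my_struct *rhs)
{
    assert(lhs != NULL && rhs != NULL);
    return memcmp(lhs, rhs, sizeof *lhs) == 0;
}
```

Two objects whose members all match always compare equal with my_struct_equal, while my_struct_bytes_equal may report a difference that comes only from padding.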

Footnote

1 This speaks to the particular code in the question. In other circumstances, there can be further complications. For example, suppose translation unit U defines an object X with type A, and translation unit V attempts both to access X (directly by name) with an incompatible type B and to access X through a pointer of type “pointer to A”. The compiler in V is entitled to assume the two accesses refer to different objects, so it has no responsibility to coordinate them. For example, if V writes to X through the pointer and then reads X by name, the compiler has no obligation to actually read the bytes of X from memory; it could use a previously read value it is holding in a register, since it has no reason to believe the write through the pointer changed X.

CodePudding user response:

If you use the same compiler for both units, the padding rules should be the same, unless they change due to optimizations. Try disabling optimizations (the -O0 switch for GCC) to check.
