How to ensure certain struct layout across compilations?

The standard for C says nothing about the packing and padding of structs, because that is implementation-defined. If it is implementation-defined, then why is it safe, for example, to pass a struct to a DLL, if that DLL could have been compiled with a different compiler that uses a different method of struct padding? Is the struct padding method enforced by the OS's ABI (for example, will padding be the same on all Windows platforms)? Or is there a standard method of padding when compiling for PC (x64 or x86_64 systems) that is used by every modern compiler? If nothing guarantees the layout of variables, is it safe to assume that each basic type in C (char, all numeric types and pointers) must be aligned to an address that is a multiple of its size, so that padding inside a struct can be done by hand without performance problems or UB?
From what I checked, g++ compiles structs in such a way that it inserts the minimum amount of padding, just enough to ensure the alignment of the next variable.
Example:

#include <stdint.h> // for uint32_t

struct foo
{
  char a;
  // char _padding1[3]; <- inserted by compiler
  uint32_t b;
};

There are 3 bytes of padding after a because that is the minimum amount that gives a suitably aligned address for b. Can we take for granted that compilers will do it this way? Or, as I asked earlier, can we force this kind of padding by hand without UB or performance issues?

By hand I mean:

#pragma pack(1)
struct foo
{
  char a;
  char _padding1[3]; //<- manually adding padding bytes
  uint32_t b;
};
#pragma pack()

Just to be clear: I am asking about the behaviour of compilers only on PC platforms: Windows, Linux distros and maybe macOS.

Sorry if my question falls into the category "you are digging into this too much". I just couldn't find a satisfying answer on the internet: some people say that it is not guaranteed; others say that when compiling with different compilers on systems that use the same ABI, the same struct is guaranteed to have the same layout; and others show how to reduce struct padding on the assumption that compilers pack structs the way I described above (that is, with the minimum padding required to align variables).

CodePudding user response：

If it is implementation defined, then for example, why it is safe to pass struct to dll

Because the dll and the caller follow the same Application binary interface (ABI) that defines the layout.

By the way, DLLs are a language extension and not part of standard C++.

if this dll could have been compiled with different compiler, which could have different method for struct padding?

If the library and the dependent don't follow an intercompatible ABI, then they cannot work together.

Is the struct padding method enforced by the OS's ABI

Yes, class layout (structs are classes) is defined by the ABI.

For example padding will be the same on all Windows platforms

Not quite, since Windows on ARM has a different ABI for example. But within the same CPU architecture, the layout would be the same in Windows.

Or is there standard method for padding when compiling for PC (x64 or x86_64 systems) that is used in every modern compiler?

No, there is no universal class layout followed by every OS, even within the x86_64 architecture.

From what I checked, g++ compiles structs in such a way that it inserts the minimum amount of padding, just enough to ensure the alignment of the next variable.

All objects in C must be aligned as per the alignment requirement of the type of the object. This guarantee isn't compiler-specific. However, the alignment requirements of types, and even the sizes of types, vary across different ABIs.
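A minimal sketch in C11 of how one might inspect these alignment requirements on a given implementation. `_Alignof` and `offsetof` are standard; the struct mirrors the one from the question, and the helper names `u32_alignment` and `b_is_aligned` are made up for illustration. The reported alignment may differ between ABIs, but a member's offset should always be a multiple of its type's alignment (absent packing extensions):

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* uint32_t */

struct foo {
    char a;
    uint32_t b;
};

/* char always has the weakest possible alignment. */
static_assert(_Alignof(char) == 1, "char is 1-byte aligned");

/* Alignment the implementation requires for uint32_t (ABI-dependent). */
static size_t u32_alignment(void) {
    return _Alignof(uint32_t);
}

/* Returns 1 if member b sits at an offset that satisfies its alignment. */
static int b_is_aligned(void) {
    return offsetof(struct foo, b) % _Alignof(uint32_t) == 0;
}
```

On the common x86_64 ABIs one would expect `u32_alignment()` to return 4, but that is a property of those ABIs rather than of the language.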

Bonus info: Compilers have language extensions that remove this guarantee.
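For illustration, one such extension is `#pragma pack`, which GCC, Clang and MSVC all accept. A sketch, assuming one of those compilers and 8-bit bytes; the names `foo_packed` and `read_b` are invented for this example:

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* uint32_t */

#pragma pack(push, 1)   /* non-standard: cap member alignment at 1 byte */
struct foo_packed {
    char a;
    uint32_t b;         /* may now live at a misaligned address */
};
#pragma pack(pop)       /* restore the previous packing setting */

/* With packing in effect there is no padding between a and b. */
static_assert(offsetof(struct foo_packed, b) == 1, "b directly follows a");
static_assert(sizeof(struct foo_packed) == 5, "no padding anywhere");

/* Reading the member through the struct is fine: the compiler knows
   the member may be unaligned and emits a suitable load. */
static uint32_t read_b(const struct foo_packed *p) {
    return p->b;
}
```

Taking the address of `b` and dereferencing it as a plain `uint32_t *` is a different matter, since that pointer may be misaligned.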

There are 3 bytes of padding after a because that is the minimum amount that gives a suitably aligned address for b. Can we take for granted that compilers will do it this way?

In general, no. On some systems, alignof(std::uint32_t) == 1, in which case there would be no need for any padding.

Within a single ABI, you can take for granted that the layout is the same, but across multiple systems - which might not follow the same ABI - you cannot take it for granted.
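One way to make that reliance explicit is to let the build fail when the layout is not what the code expects. A sketch in C11, assuming an ABI where `uint32_t` is 4 bytes wide and 4-byte aligned (which is the case for the usual x86_64 Windows, Linux and macOS ABIs). `foo_explicit` is a hypothetical variant with the padding written out by hand and no packing pragma:

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* uint32_t */

struct foo {
    char a;
    uint32_t b;       /* compiler is expected to insert 3 padding bytes before b */
};

struct foo_explicit {
    char a;
    char padding1[3]; /* the same padding, spelled out manually */
    uint32_t b;
};

/* Assumptions about the target ABI; compilation stops if they are wrong. */
static_assert(offsetof(struct foo, b) == 4, "expected 3 bytes of padding after a");
static_assert(sizeof(struct foo) == 8, "expected no trailing padding");

/* The hand-padded struct should then have the identical layout. */
static_assert(offsetof(struct foo_explicit, b) == offsetof(struct foo, b),
              "manual padding should match the compiler's");
static_assert(sizeof(struct foo_explicit) == sizeof(struct foo),
              "sizes should match");
```

On a target with a different ABI these assertions turn a silent layout mismatch into a compile error, which is usually the more useful failure mode.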

When dealing with binary layout across systems (for example, when reading from a file or network), the standard-compliant way is to treat the data as an array of bytes¹, and to copy each sequence of bytes² from pre-determined offsets onto fixed-width³ fundamental objects (not classes whose layout may differ). In practice, you don't need to care about sign representation, although that used to be a problem historically.

If the optimiser does its job, there ideally shouldn't be any performance penalty if the layout of input data matches the native layout. In case it doesn't match, then there may be a cost (compared to a matching layout) that cannot be optimised away.

¹ This isn't sufficient when byte size differs across systems, but you don't need to worry about that since you care about x86_64 only.

² In order to support systems with varying byte endianness, you must interpret the bytes in order of their significance rather than memory order, but you don't need to worry about that since you care about x86_64 only.

³ I.e. not int, short, long etc., but rather std::int32_t etc.
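The byte-array approach described in this answer might be sketched as follows in C. Everything about the format here is an assumption made up for the example: the offsets, the 8-byte record size, the little-endian byte order, and the helper names `load_u32_le`, `store_u32_le`, `foo_decode` and `foo_encode`. The bytes are combined by significance (footnote ²), so the code does not depend on the host's endianness or on the layout of `struct foo`:

```c
#include <assert.h>   /* static_assert */
#include <stdint.h>   /* uint32_t */
#include <string.h>   /* memset */

/* Hypothetical record format: byte 0 holds a, bytes 1..3 are unused,
   bytes 4..7 hold b with the least significant byte first. */
enum { WIRE_OFF_A = 0, WIRE_OFF_B = 4, WIRE_SIZE = 8 };
static_assert(WIRE_OFF_B + 4 == WIRE_SIZE, "b must fit inside the record");

struct foo {
    char a;
    uint32_t b;
};

/* Assemble a 32-bit value from 4 bytes, least significant first. */
static uint32_t load_u32_le(const unsigned char *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Split a 32-bit value into 4 bytes, least significant first. */
static void store_u32_le(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v & 0xFFu);
    p[1] = (unsigned char)((v >> 8) & 0xFFu);
    p[2] = (unsigned char)((v >> 16) & 0xFFu);
    p[3] = (unsigned char)((v >> 24) & 0xFFu);
}

/* Read one record from a buffer of at least WIRE_SIZE bytes. */
static struct foo foo_decode(const unsigned char *buf) {
    struct foo f;
    f.a = (char)buf[WIRE_OFF_A];
    f.b = load_u32_le(buf + WIRE_OFF_B);
    return f;
}

/* Write one record into a buffer of at least WIRE_SIZE bytes. */
static void foo_encode(unsigned char *buf, const struct foo *f) {
    memset(buf, 0, WIRE_SIZE);            /* unused bytes are zeroed */
    buf[WIRE_OFF_A] = (unsigned char)f->a;
    store_u32_le(buf + WIRE_OFF_B, f->b);
}
```

When the chosen format happens to match the native layout, an optimising compiler can often reduce helpers like these to plain loads and stores, though that is not guaranteed.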

CodePudding user response：

The C and C++ Standards were written to describe existing languages. In situations where 99% of implementations would do things a certain way, and it was obvious that implementations should do things that way absent a compelling reason for doing otherwise, the Standards would generally leave open the possibility of implementations doing something unusual. Consider, for example, something like:

struct foo {int i; char a,b[4],c,d,e;}; // Assume sizeof (int) is 4
struct foo myFoo;

On most platforms, making struct foo a three-word type which contains all of the individual bytes packed together may be more efficient than doing anything else. On the other hand, on a platform that uses word-addressed storage but includes instructions to load or store bytes at a specified byte offset from a specified word address, word-aligning the start of b may allow a construct like myFoo.b[i] to be processed by directly using the value of i as an offset onto the word-aligned address of myFoo.b. The Standards were designed to allow people designing compilers for such platforms to weigh the pros and cons of following normal practice versus deviating from it to better fit the target architecture.
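For comparison, the "normal practice" layout can be written down as compile-time checks. A sketch in C11, assuming `sizeof (int)` is 4 (as in the comment in the example) and 8-bit bytes; on a word-addressed machine of the kind just described, some of these assertions could legitimately fail:

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof */

struct foo {int i; char a,b[4],c,d,e;};

static_assert(sizeof(int) == 4, "this sketch assumes a 4-byte int");

/* The char members follow i back to back, with no padding in between. */
static_assert(offsetof(struct foo, a) == 4, "a directly follows i");
static_assert(offsetof(struct foo, b) == 5, "b is not word-aligned");
static_assert(offsetof(struct foo, c) == 9, "c directly follows b[3]");
static_assert(offsetof(struct foo, e) == 11, "e is the last byte");

/* 4 + 1 + 4 + 3 = 12 bytes, i.e. three 4-byte words. */
static_assert(sizeof(struct foo) == 12, "three words, fully packed");
```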

Machines that use word addresses but allow byte-based loads and stores are of course exceptionally rare, and there is very little code, other than code deliberately written for such machines, for which compatibility with them would offer any added value whatsoever. The Committee wasn't willing to say that such machines should be viewed as archaic and not worth supporting, but that doesn't mean they didn't expect and intend that programs written for commonplace implementations could exploit aspects of behavior that were shared by all commonplace implementations, even if not by some obscure ones.