This union statement fails with a bus error

Time:11-17

This generates a bus error:

union { char a[10];
        int i;
      } u;
int *p = (int *) &u.a[1];
*p = 17;

Why would this generate an error? I mean, chars can hold the number 17.

CodePudding user response:

u.a[1] is not properly aligned for an int.

Commonly, the hardware that accesses memory gets many bits at once, such as 32 or 64 bits (four or eight eight-bit bytes). Using 32 as an example, when exchanging data between memory and the processor, bytes are moved in four-byte groups. So the processor would load bytes 1000, 1001, 1002, and 1003 from memory together, for example.

To accommodate this, the processor is designed so that four-byte integers are always located at addresses that are multiples of four. When the program wants to load an integer from address 1000, the processor fetches it from memory in a single transaction that gets bytes 1000, 1001, 1002, and 1003, and then delivers those four bytes to a register.

To get a single byte, the processor still has to get four bytes from memory, but it may put just the single byte requested in a register.

If the union u is at address 1000, then u.i starts at address 1000, and u.a starts at 1000, with u.a[0] at 1000, u.a[1] at 1001, u.a[2] at 1002, and u.a[3] at 1003. When you set p to &u.a[1], it points to address 1001. When you use *p, the program attempts to load an int from address 1001. Then the processor generates an exception, because 1001 is not a proper address for an int.

These are the essential details. There are variations in practice. Some processors may successfully load an int from 1001, but they will do it more slowly than an aligned load, because the processor has to get the four-byte word from memory at address 1000 and the four-byte word at address 1004 and then take three bytes from the first word and one byte from the second and put them together. On some systems, the processor still generates an exception, but the operating system handles it by doing the two loads and the merge instead of by delivering a signal to the process.

A rule in the C standard covering this is in C 2018 6.3.2.3 7:

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined…

That actually says the behavior is undefined even if the program just performs the conversion, (int *) &u.a[1], but an exception is often observed only when an attempt is made to use the resulting pointer to load from or store to memory.
