SizeOf set of enums, 32 bit vs 64 bit and memory alignment

Given an enum TEnum with 33 items (Ceil(33 / 8) = 5 bytes) and TEnumSet = set of TEnum, SizeOf(TEnumSet) gives a different result when running in 32-bit vs. 64-bit Windows:

32 bit: 5 bytes as per the calculation above
64 bit: 8 bytes

When the number of elements in the enum increases, the size changes in 32-bit (to 6 bytes, say), while in 64-bit it remains 8 bytes. It looks as if memory alignment in 64-bit rounds the size up to the nearest multiple of XX, but XX cannot be 8, since smaller enums do yield a set size of 2 or 4. Rounding up to a power of 2 is most likely not the rule either?

In any case, this causes a problem when reading a file into a packed record. The file was written as a buffer by a 32-bit program. Reading the same file in a 64-bit program fails because the packed record sizes don't match (the record contains this mismatching set, among other things).

I tried looking in the compiler options for some options related to memory alignment: there is an option for record memory alignment but it does not impact sets, and is already the same in both configurations.

Any explanation on why the set is taking more memory in 64-bit, and any potential solutions to be able to read the file into my packed record on a 64-bit platform?

Note that I have no control over the writing of the file: it is written using a 32-bit program to which I don't have access (so altering the writing is not an option).

CodePudding user response：

Here is my test program:

{$APPTYPE CONSOLE}

type
  TEnumSet16 = set of 0..16-1;
  TEnumSet17 = set of 0..17-1;
  TEnumSet24 = set of 0..24-1;
  TEnumSet25 = set of 0..25-1;
  TEnumSet32 = set of 0..32-1;
  TEnumSet33 = set of 0..33-1;
  TEnumSet64 = set of 0..64-1;
  TEnumSet65 = set of 0..65-1;

begin
  Writeln(16, ':', SizeOf(TEnumSet16));
  Writeln(17, ':', SizeOf(TEnumSet17));
  Writeln(24, ':', SizeOf(TEnumSet24));
  Writeln(25, ':', SizeOf(TEnumSet25));
  Writeln(32, ':', SizeOf(TEnumSet32));
  Writeln(33, ':', SizeOf(TEnumSet33));
  Writeln(64, ':', SizeOf(TEnumSet64));
  Writeln(65, ':', SizeOf(TEnumSet65));
end.

And the output (I am using XE7 but I expect that it is the same in all versions):

32 bit	64 bit
16:2	16:2
17:4	17:4
24:4	24:4
25:4	25:4
32:4	32:4
33:5	33:8
64:8	64:8
65:9	65:9

Leaving aside the 32 vs 64 bit difference, notice that although the 17 and 24 bit cases could theoretically fit in a 3 byte type, they are stored in a 4 byte type.

Why does the compiler choose to use a 4 byte type rather than a 3 byte type? It can only be that this allows for more efficient code. Operating on data that can be mapped directly onto CPU registers is more efficient than picking at the data byte by byte, or in this case by accessing two bytes in one operation, and then the third byte in another.

This then points to why anything between 33 and 64 bits is mapped to an 8 byte type under the 64 bit compiler. The 64 bit compiler has 64 bit registers, and the 32 bit compiler does not.
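To make this difference visible at build time, a conditional compilation check can compare the in-memory set size with the size expected in the file. This is only a minimal sketch: the constant ExpectedFileSetSize is a hypothetical name, and its value of 5 is taken from the 32 bit column of the table above.

```pascal
program SetSizeGuardSketch;

{$APPTYPE CONSOLE}

type
  // Same 33-element set as in the test program above.
  TEnumSet33 = set of 0..33-1;

const
  // Hypothetical constant: the number of bytes the 32 bit writer
  // used for the set (the 32 bit result in the table above).
  ExpectedFileSetSize = 5;

begin
  // Emits a compiler warning whenever the target platform stores the set
  // in a different number of bytes than the file format uses.
  {$IF SizeOf(TEnumSet33) <> ExpectedFileSetSize}
    {$MESSAGE WARN 'In-memory set size differs from the size used in the file'}
  {$IFEND}

  Writeln('SizeOf(TEnumSet33) = ', SizeOf(TEnumSet33));
  Writeln('SizeOf(Pointer)    = ', SizeOf(Pointer));
end.
```

Going by the table above, the warning should appear only when compiling for 64 bit.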

As for how to solve your problem, I can see two main approaches:

In your 64 bit program, read and write the record field by field. For the fields which are afflicted by this 32 vs 64 bit issue, you will have to introduce special code to read and write just the first 5 bytes of the field.
Change your record definition to replace the set with array [0..4] of Byte, and then introduce a property that maps the set type onto that 5 byte array.
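The second approach could look roughly like the sketch below. The record layout (Id, RawSet, Tail), the accessor names and the simulated file bytes are all made up for illustration, since the real record is not shown in the question. The sketch assumes that the first 5 bytes of the set have the same bit layout under both compilers, which is also what the first approach relies on.

```pascal
program PackedSetFieldSketch;

{$APPTYPE CONSOLE}

uses
  System.Classes;

type
  // Stand-in for the 33-element set from the question.
  TEnumSet = set of 0..33-1;

  // Hypothetical file record. Only the 5 byte set field mirrors the question.
  TFileRecord = packed record
    Id: Integer;
    RawSet: array [0..4] of Byte; // always 5 bytes, as written by the 32 bit program
    Tail: Word;
    function GetEnumSet: TEnumSet;
    procedure SetEnumSet(const Value: TEnumSet);
    property EnumSet: TEnumSet read GetEnumSet write SetEnumSet;
  end;

function TFileRecord.GetEnumSet: TEnumSet;
begin
  // Zero every byte first: under the 64 bit compiler the set is 8 bytes wide,
  // but only the first 5 come from the file.
  FillChar(Result, SizeOf(Result), 0);
  Move(RawSet, Result, SizeOf(RawSet));
end;

procedure TFileRecord.SetEnumSet(const Value: TEnumSet);
begin
  // Copy only the 5 bytes that the file format stores.
  Move(Value, RawSet, SizeOf(RawSet));
end;

const
  // Simulated file content: Id = 1, set = [0, 8, 32], Tail = 2 (little-endian).
  FileBytes: array [0..10] of Byte = (1, 0, 0, 0, 1, 1, 0, 0, 1, 2, 0);

var
  Stream: TMemoryStream;
  Rec: TFileRecord;
begin
  Stream := TMemoryStream.Create;
  try
    Stream.WriteBuffer(FileBytes, SizeOf(FileBytes));
    Stream.Position := 0;
    // The record is now 11 bytes on both platforms, so a single read works.
    Stream.ReadBuffer(Rec, SizeOf(Rec));
  finally
    Stream.Free;
  end;

  Writeln('SizeOf(TFileRecord): ', SizeOf(TFileRecord));
  Writeln('Id: ', Rec.Id, ', Tail: ', Rec.Tail);
  Writeln('Set equals [0, 8, 32]: ', Rec.EnumSet = [0, 8, 32]);
end.
```

Because the record no longer contains a set field, its size does not depend on the target platform, and the rest of the program can keep working with the EnumSet property as if it were the original field.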

CodePudding user response：

Relying on the memory size of a set leads to errors sooner or later. This becomes particularly clear when working with subrange types.

program Project1;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils;

type
  TBoolSet=set of boolean;
  TByteSet=set of byte;
  TSubEnum1=5..10;
  TSubSet1=set of TSubEnum1;
  TSubEnum2=205..210;
  TSubSet2=set of TSubEnum2;
var
  i, j: integer;
  a, a1: TByteSet;
  b, b1: TSubSet1;
begin
  try
    writeln('SizeOf(TBoolSet): ', SizeOf(TBoolSet)); //1
    writeln('SizeOf(TByteSet): ', SizeOf(TByteSet)); //32
    writeln('SizeOf(TSubSet1): ', SizeOf(TSubSet1)); //2
    writeln('SizeOf(TSubSet2): ', SizeOf(TSubSet2)); //2
  
    // Assignments between these set types are allowed.
    a := [6, 9];
    b := [6, 9];
    writeln('a = b ?: ', BoolToStr(a = b, true)); //true

    a1 := a + b; //OK
    b1 := a + b; //OK
    a  := [7, 200];
    b1 := a + b; //??? no exception, value 200 was lost!
    i  := 0;
    for j in b1 do
      i := succ(i);
    writeln('b1 Count: ', i);

    readln(i);
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.