Home > database >  How does a compiler store information about an array's size?
How does a compiler store information about an array's size?

Time:05-23

Recently I read on IsoCpp about how compiler known size of array created with new. The FAQ describes two ways of implementation, but so basically and without any internal information. I tried to find an implementation of these mechanisms in STL sources from Microsoft and GCC, but as I see, both of them just call the malloc internally. I tried to go deeper and found an implementation of the malloc function in GCC, but I couldn't figure out where the magic happens. Is it possible to find how this works, or it implemented in system runtime libraries?

CodePudding user response:

Here is where the compiler stores the size in the source code for GCC: https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/gcc/cp/init.c#L3319-L3325

And the equivalent place in the source code for Clang: https://github.com/llvm/llvm-project/blob/c11051a4001c7f89e8655f1776a75110a562a45e/clang/lib/CodeGen/ItaniumCXXABI.cpp#L2183-L2185

What the compilers do is store a "cookie" which is the number of elements allocated (the N in new T[N]) immediately before the pointer that new T[N] returns. This in turn means that a few extra bytes have to be allocated in the call to operator new[]. The compiler generates code to do this at runtime.

operator new[](std::size_t x) itself does no work: It simply allocates x bytes. The compiler makes new T[N] call operator new[](sizeof(T) * N cookie_size).

The compiler does not "know" the size (it's a run-time value), but it knows how to generate code to retrieve the size on a subsequent delete[] p.

CodePudding user response:

At least for GCC targeting x86_64, it is possible to investigate this question by looking at the assembly GCC generates for this simple program:

#include <iostream>

struct Foo
{
  int x, y;
  ~Foo() { std::cout << "Delete foo " << this << std::endl; }
};

Foo * create()
{
  return new Foo[8];
}

void destroy(Foo * p)
{
  delete[] p;
}

int main()
{
  destroy(create());
}

Using Compiler Explorer, we see this code generated for the create function:

create():
        sub     rsp, 8
        mov     edi, 72
        call    operator new[](unsigned long)
        mov     QWORD PTR [rax], 8
        add     rax, 8
        add     rsp, 8
        ret

It looks to me like the compiler is calling operator new[] to allocate 72 bytes of memory, which is 8 bytes more than is needed for the storage of the objects (8 * 8 = 64). Then it is storing the object count (8) at the beginning of this allocation, and adding 8 bytes to the pointer before returning it, so the pointer points to the first object.

This is one of the methods what was listed in the document you linked to:

Over-allocate the array and put n just to the left of the first Fred object.

I searched a little bit in the source code of libstdc to see if this was implmented by the standard library or the compiler, and I think it's actually implemented by the compiler itself, though I could be wrong.

  • Related