How does a compiler store information about an array's size?-CodePudding

Recently I read on IsoCpp about how compiler known size of array created with new. The FAQ describes two ways of implementation, but so basically and without any internal information. I tried to find an implementation of these mechanisms in STL sources from Microsoft and GCC, but as I see, both of them just call the malloc internally. I tried to go deeper and found an implementation of the malloc function in GCC, but I couldn't figure out where the magic happens. Is it possible to find how this works, or it implemented in system runtime libraries?

CodePudding user response：

Here is where the compiler stores the size in the source code for GCC: https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/gcc/cp/init.c#L3319-L3325

And the equivalent place in the source code for Clang: https://github.com/llvm/llvm-project/blob/c11051a4001c7f89e8655f1776a75110a562a45e/clang/lib/CodeGen/ItaniumCXXABI.cpp#L2183-L2185

What the compilers do is store a "cookie" which is the number of elements allocated (the N in new T[N]) immediately before the pointer that new T[N] returns. This in turn means that a few extra bytes have to be allocated in the call to operator new[]. The compiler generates code to do this at runtime.

operator new[](std::size_t x) itself does no work: It simply allocates x bytes. The compiler makes new T[N] call operator new[](sizeof(T) * N cookie_size).

The compiler does not "know" the size (it's a run-time value), but it knows how to generate code to retrieve the size on a subsequent delete[] p.

CodePudding user response：

At least for GCC targeting x86_64, it is possible to investigate this question by looking at the assembly GCC generates for this simple program:

#include <iostream>

struct Foo
{
  int x, y;
  ~Foo() { std::cout << "Delete foo " << this << std::endl; }
};

Foo * create()
{
  return new Foo[8];
}

void destroy(Foo * p)
{
  delete[] p;
}

int main()
{
  destroy(create());
}

Using Compiler Explorer, we see this code generated for the create function:

create():
        sub     rsp, 8
        mov     edi, 72
        call    operator new[](unsigned long)
        mov     QWORD PTR [rax], 8
        add     rax, 8
        add     rsp, 8
        ret

It looks to me like the compiler is calling operator new[] to allocate 72 bytes of memory, which is 8 bytes more than is needed for the storage of the objects (8 * 8 = 64). Then it is storing the object count (8) at the beginning of this allocation, and adding 8 bytes to the pointer before returning it, so the pointer points to the first object.

This is one of the methods what was listed in the document you linked to:

Over-allocate the array and put n just to the left of the first Fred object.

I searched a little bit in the source code of libstdc to see if this was implmented by the standard library or the compiler, and I think it's actually implemented by the compiler itself, though I could be wrong.