C++17 copy elision of heap allocated objects?

Time:10-30

I'm using C++17 and I have a rather large (two-dimensional) array of numbers I'm trying to initialize at namespace scope. This array is intended to be a pre-computed lookup table that's going to be used in many parts of my code. Its definition looks something like this:

using table_type = std::array<std::array<uint64_t, SOME_BIG_NUMBER>, SOME_OTHER_BIG_NUMBER>;

table_type MyTable = ...

Where the total size of the table is > 200,000 or so. Now, all of the values in the table could be known at compile time, so initially I went ahead and did something like this:

Attempt 1

// Header.h

constexpr table_type MyTable = []()
{
    table_type table{};

    // code to initialize table...

    return table;
}();

My initial fears were realized when MSVC refused to compile this code. The table is simply too large to be computed at compile time. I could have fiddled with the settings and increased the maximum allowed steps but I didn't really feel like getting into that.
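For illustration, here is the same pattern at a size small enough to be evaluated at compile time. The names small_table_type, SmallTable, kRows and kCols, and the r * c fill formula, are placeholders invented for this sketch and are not the real table.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical small dimensions, chosen so constant evaluation stays cheap.
constexpr std::size_t kRows = 4;
constexpr std::size_t kCols = 8;

using small_table_type = std::array<std::array<std::uint64_t, kCols>, kRows>;

// Same shape as Attempt 1: an immediately invoked lambda evaluated at compile time.
constexpr small_table_type SmallTable = []()
{
    small_table_type table{};

    for (std::size_t r = 0; r < kRows; ++r)
        for (std::size_t c = 0; c < kCols; ++c)
            table[r][c] = r * c;   // placeholder formula, not the real contents

    return table;
}();

// Only compiles if the table really was computed during constant evaluation.
static_assert(SmallTable[2][3] == 6, "SmallTable is a compile-time constant");
```

At this size the compilers I tried should have no trouble; the failure described above only shows up once the table gets large.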

Attempt 2

// Header.h

const extern table_type MyTable;

// Implementation.cpp

const table_type MyTable = []()
{
    table_type table{};
    
    // code to initialize table...
    
    return table;
}();

I had a feeling this also wouldn't work when I was coding it up, and I was right. MSVC warns that I'm using over 2MB of stack memory in the lambda function, and though it compiles, the executable crashes immediately on startup because the stack overflows.

Attempt 3

// Header.h

// MyTable is no longer const
extern table_type MyTable;

// Implementation.cpp

table_type MyTable{};

int init_table()
{
    // code to initialize table...
    // we initialize it by directly writing to it, e.g.
    // MyTable[0][0] = 5;
    // MyTable[0][1] = 4;
    // etc.

    return 0;
}

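// Names starting with an underscore and an uppercase letter are reserved,
// so the dummy variable uses an ordinary name.
const auto unused_init_result = init_table();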

This works. MSVC, GCC, and Clang will compile this code without complaint and the resulting executable runs as desired. Nevertheless, I found this method unsatisfying for several reasons. The first obvious issue is that I had to make my table non-const, which means that it can potentially be mutated from anywhere else in the code. I could just be careful not to do that, but if someone else ever comes along and uses my code (it's a static library) they might not be so careful. I'd like to avoid giving myself and others unneeded opportunities to shoot ourselves in the foot.

To top it all off, the initialization code is ugly. Rather than being able to use an immediately evaluated lambda, I instead have to declare a free function and then call it in another location. Furthermore, if I want this to happen automatically when the program launches, I need to have the init_table function return something and then store that result in a variable somewhere. C++ doesn't seem to allow something like init_table to be called at namespace scope unless the result is being stored somewhere. So now I have an initialization function that returns a useless value and I am required to save that value in a namespace-scoped variable. Ugly.

Attempt 4

// Header.h

const extern table_type MyTable;

// Implementation.cpp

const table_type MyTable = []() -> table_type
{
    auto table_p = std::make_unique<table_type>();
    
    // code to initialize table_p...
    
    return *table_p;
}();

The idea is to avoid blowing up the stack by allocating a temporary table on the heap. That table is initialized and then copied into MyTable. MSVC, GCC, and Clang accept this code with no warnings and the resulting executables run fine.

My question

The problem is I didn't expect this to work and I can't quite wrap my head around why it works. The initial table is allocated on the heap without issue, but I don't completely understand what's happening when it's returned from the lambda. I added the -> table_type explicit return type to the lambda to make sure that it didn't deduce the return type as table_type& as that would result in a dangling reference. But since it's returning the table by copy wouldn't a temporary r-value need to be made on the call stack when the lambda returns? And that should result in the same crash as attempt #2.
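For reference on the explicit return type mentioned above: a lambda without a trailing return type deduces its return type under the same rules as auto, which drop references, so returning *table_p would already yield a value. A reference only appears if the lambda is declared with -> decltype(auto). The sketch below checks both cases; Payload, by_value and by_reference are hypothetical names standing in for table_type and the initializer lambda.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

struct Payload { int value = 0; };   // hypothetical stand-in for table_type

// No trailing return type: deduced like `auto`, so the result is Payload by value.
const auto by_value = []()
{
    auto p = std::make_unique<Payload>();
    p->value = 7;
    return *p;
};

// decltype(auto) keeps the reference from operator*, which would dangle.
// This lambda is only inspected inside decltype and is never called.
const auto by_reference = []() -> decltype(auto)
{
    auto p = std::make_unique<Payload>();
    return *p;
};

static_assert(std::is_same_v<decltype(by_value()), Payload>,
              "plain deduction returns by value");
static_assert(std::is_same_v<decltype(by_reference()), Payload&>,
              "decltype(auto) would return a reference");
```

So the explicit -> table_type is harmless and documents the intent, but under these rules it should not be what prevents a dangling reference.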

I'm aware of RVO in C++ and that it was enhanced further with C++17. But in this case I'm attempting to return an object that's allocated on the heap and managed by a unique_ptr. I don't see how RVO could apply here because once the return statement is hit the destructor of the unique_ptr will free the memory used to store the table so there's no way that MyTable could then be initialized by simply copying the contents of the memory pointed to by table_p.

I understand that if I directly returned a stack allocated object by value the compiler can optimize away the call to the destructor of that object and effectively memcpy its contents into the new value it's being stored in. But in this case the object being returned by value is on the heap. It appears to me that what the compiler is doing is copying the contents of the memory pointed to by table_p into the memory used to store MyTable, and it is doing this before the destructor of table_p is run. Since destructor calls (as far as I'm aware) are considered part of the function body in which they are called, this would mean that MyTable is being fully initialized before the function that produces the value that initializes it actually exits. This sounds very strange to me.

I've been puzzling about this all day and I just can't figure out what's going on here. I also can't seem to find much online that's related to this specific scenario. My fear is that I'm relying on undefined/implementation defined behavior that the three major compilers just happen to work nicely with. Could another conforming compiler come along and produce an executable that blows up the stack here?

Answer:

MyTable is declared at namespace scope, so it has static storage duration and does not live on the stack. It is placed in the program's static data area.

The object that table_p points to is stored on the heap.

So what happens when the lambda returns? First, the object stored on the heap is copied directly into the object in static storage, without going through a large temporary on the stack. Then table_p is destroyed, and the heap object with it. This ordering is required by the language: the initialization of the returned value is sequenced before the destruction of the function's local variables.
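The order of events described above can be observed with a small tracing type. This is only a sketch: Tracer, events() and Target are hypothetical names, with Tracer standing in for table_type and Target for MyTable.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical event log. It is a function-local static so that it is
// constructed on first use, before Target finishes initializing.
inline std::vector<std::string>& events()
{
    static std::vector<std::string> log;
    return log;
}

struct Tracer   // hypothetical stand-in for table_type
{
    int value = 0;

    Tracer() { events().push_back("default-construct"); }
    Tracer(const Tracer& other) : value(other.value)
    {
        events().push_back("copy-construct");
    }
    ~Tracer() { events().push_back("destroy"); }
};

// Same shape as Attempt 4: build on the heap, return a copy by value.
const Tracer Target = []() -> Tracer
{
    auto p = std::make_unique<Tracer>();   // logs "default-construct"
    p->value = 42;
    return *p;   // logs "copy-construct", then p's destructor logs "destroy"
}();
```

Once Target has been initialized, the log should hold exactly three entries in this order: default-construct, copy-construct, destroy. There is a single copy, and it happens before the heap object is destroyed.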

The C++17 "guaranteed copy elision" feature ensures that when the lambda returns, a temporary object is not created unless it needs to be. The call expression is a prvalue, which means that it does not designate an object, but is a "recipe" that describes how to initialize an object. The compiler determines which object is the target of the prvalue (i.e., the memory location on which the "recipe" will be run in order to create and initialize an object). In this case, the target is MyTable itself, not a temporary. The return statement directly initializes that target object, not a temporary.
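One way to see that the prvalue initializes its target in place is to use a type that can be neither copied nor moved. The sketch below uses hypothetical names (Pinned, PinnedTarget) and assumes a compiler in C++17 mode or later; in C++14 the same code is rejected because a move constructor would be required.

```cpp
#include <cassert>

struct Pinned   // hypothetical type that can be neither copied nor moved
{
    int value;
    const Pinned* constructed_at;   // address the constructor ran on

    explicit Pinned(int v) : value(v), constructed_at(this) {}
    Pinned(const Pinned&) = delete;
    Pinned(Pinned&&) = delete;
};

// The call expression is a prvalue whose target is PinnedTarget itself,
// so the constructor runs directly on the namespace-scope object.
const Pinned PinnedTarget = []() -> Pinned
{
    return Pinned{7};
}();
```

Here the address recorded by the constructor should equal &PinnedTarget, since no intermediate object exists for it to have run on.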
