Moving functors into std::function's while avoiding copies

I am trying to move a functor into a lambda inside an object, like this:

#include <cassert>
#include <functional>
#include <iostream>

#include "boost/stacktrace.hpp"

#define fwd(o) std::forward<decltype(o)>(o)

struct CopyCounter {
  CopyCounter() noexcept = default;
  CopyCounter(const CopyCounter &) noexcept {
    std::cout << "Copied at " << boost::stacktrace::stacktrace() << std::endl; 
    counter++;
  }
  CopyCounter(CopyCounter &&) noexcept = default;

  CopyCounter &operator=(CopyCounter &&) noexcept = default;
  CopyCounter &operator=(const CopyCounter &) noexcept {
    std::cout << "Copied at " << boost::stacktrace::stacktrace() << std::endl;
    counter++;
    return *this;
  }

  inline static size_t counter = 0;
};

struct Argument : CopyCounter {};

struct Functor : CopyCounter {
  int operator()(Argument) { return 42; }
};

class Invoker {
  std::function<void()> invoke_;
  int result_;

public:
  template <class Functor, class... Args>
  Invoker(Functor&& f, Args&&... args) {
    invoke_ = [this, f = fwd(f), ...args = fwd(args)]() mutable { 
      result_ = f(fwd(args)...);
    };
  }
};

int main() {
  Functor f;
  Argument a;
  auto i = Invoker(std::move(f), std::move(a));
  assert(CopyCounter::counter == 0);
  return 0;
}

Somewhat surprisingly, the last assert fails with libc++, but not with libstdc++. The stack trace points to the two copies that are performed:

Copied at  0# CopyCounter at /usr/include/boost/stacktrace/stacktrace.hpp:?
 1# 0x00000000004C812E at ./src/csc_cpp/move_functors.cpp:38
 2# std::__1::__function::__value_func<void ()>::swap(std::__1::__function::__value_func<void ()>&) at /usr/lib/llvm-10/bin/../include/c++/v1/functional:?
 3# ~__value_func at /usr/lib/llvm-10/bin/../include/c++/v1/functional:1825
 4# __libc_start_main in /lib/x86_64-linux-gnu/libc.so.6
 5# _start in ./bin/./src/csc_cpp/move_functors

Copied at  0# CopyCounter at /usr/include/boost/stacktrace/stacktrace.hpp:?
 1# std::__1::__function::__value_func<void ()>::swap(std::__1::__function::__value_func<void ()>&) at /usr/lib/llvm-10/bin/../include/c++/v1/functional:?
 2# ~__value_func at /usr/lib/llvm-10/bin/../include/c++/v1/functional:1825
 3# __libc_start_main in /lib/x86_64-linux-gnu/libc.so.6
 4# _start in ./bin/./src/csc_cpp/move_functors

It seems like inside the library the functor and the argument get copied in swap, during the move-assignment of invoke_. There are two questions:

Why is this the desired behaviour and what can be the motivation behind this design solution?
What is a good way to update the code to get the same semantics as with libstdc++?

Answer:

A partial answer:

If you initialize invoke_ in the constructor's member initializer list:

class Invoker {
  std::function<void()> invoke_;
  int result_;

public:
  template <class Functor, class... Args>
  Invoker(Functor&& f, Args&&... args) :
    invoke_([this, f = fwd(f), ...args = fwd(args)]() mutable { 
      result_ = f(fwd(args)...);
    }) { }
};

the assertion does not fail. So I'm guessing the swap is between (fields of) the default-initialized invoke_ and a temporary std::function constructed from the lambda you're assigning to invoke_.
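
A self-contained sketch of this member-initializer version, for trying it without Boost. It is an adaptation, not the question's exact code: the stack traces are dropped from CopyCounter, the constructor takes a single argument instead of a pack (so no C++20 pack init-capture is needed), and run() and copies_with_member_init() are hypothetical helpers added for testing.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// The question's copy-counting base, minus the Boost stack traces.
struct CopyCounter {
  CopyCounter() noexcept = default;
  CopyCounter(const CopyCounter &) noexcept { ++counter; }
  CopyCounter(CopyCounter &&) noexcept = default;
  CopyCounter &operator=(const CopyCounter &) noexcept {
    ++counter;
    return *this;
  }
  CopyCounter &operator=(CopyCounter &&) noexcept = default;

  inline static std::size_t counter = 0;
};

struct Argument : CopyCounter {};

struct Functor : CopyCounter {
  int operator()(Argument) { return 42; }
};

class Invoker {
  std::function<void()> invoke_;
  int result_ = 0;

public:
  // Assumption: one argument instead of a parameter pack.
  template <class F, class A>
  Invoker(F &&f, A &&a)
      // invoke_ is constructed directly from the lambda: no temporary
      // std::function is created, so nothing has to be swapped.
      : invoke_([this, f = std::forward<F>(f), a = std::forward<A>(a)]() mutable {
          result_ = f(std::move(a));
        }) {}

  // The closure captures `this`, so a copied Invoker would point at the original.
  Invoker(const Invoker &) = delete;
  Invoker &operator=(const Invoker &) = delete;

  int run() {
    invoke_();
    return result_;
  }
};

// Hypothetical test helper: counts copies made by construction plus one call.
std::size_t copies_with_member_init() {
  CopyCounter::counter = 0;
  Functor f;
  Argument a;
  Invoker i(std::move(f), std::move(a));
  const int result = i.run();
  assert(result == 42);
  (void)result;  // avoid an unused-variable warning when NDEBUG is defined
  return CopyCounter::counter;
}
```

Here the closure is only ever move-constructed, so the counter is expected to stay at zero on both libstdc++ and libc++, in line with the observation above.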

Answer:

libstdc++ and libc++ use different small-object optimization strategies.

In libc++, the callable is stored locally if all of the following hold:

the callable fits into the local buffer;
the callable is nothrow copy constructible; and
the allocator is nothrow copy constructible.

(Note: allocator support was removed from the standard in C++17. However, practically speaking, it can't be removed from standard library implementations, at least not for a very long time, without breaking existing code.)

In libstdc++, the callable is stored locally if both of the following hold:

the callable fits into the local buffer;
the callable is of trivially copyable type.

(I am glossing over the issue of alignment, as it's not particularly relevant here.)

libstdc++'s choice of when to use the small-object optimization implies that wherever it does apply, the callable can be copied simply by copying the array that provides storage for it. But the Functor and Argument types in your code are not trivially copyable, so the lambda closure type is not, either. Storage for the lambda is therefore allocated out-of-line, and the owned callable is move-constructed from the lambda. It never needs to be copied or moved thereafter, because moving or swapping the std::function only exchanges pointers.

On the other hand, libc++ stores your callable in the small object buffer. During the assignment to invoke_, it must swap a std::function holding the closure object in its small object buffer with the default-constructed invoke_. That means the callable must be moved from one std::function instance's small object buffer to the other instance's small object buffer. libc++ accomplishes this move by copying the callable and then destroying the source of the copy.
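
The type-related conditions from both lists can be checked with standard type traits. This is a sketch under assumptions: make_closure() is a hypothetical helper that builds a closure shaped like the one in the question (a Functor and an Argument captured by value), CopyCounter is the question's type without Boost, and the buffer-size condition is left out because buffer sizes are implementation details.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// The question's copy-counting base, minus the Boost stack traces.
struct CopyCounter {
  CopyCounter() noexcept = default;
  CopyCounter(const CopyCounter &) noexcept { ++counter; }
  CopyCounter(CopyCounter &&) noexcept = default;
  CopyCounter &operator=(const CopyCounter &) noexcept {
    ++counter;
    return *this;
  }
  CopyCounter &operator=(CopyCounter &&) noexcept = default;

  inline static std::size_t counter = 0;
};

struct Argument : CopyCounter {};

struct Functor : CopyCounter {
  int operator()(Argument) { return 42; }
};

// Hypothetical helper: a closure shaped like the question's lambda.
inline auto make_closure() {
  return [f = Functor{}, a = Argument{}]() mutable { return f(std::move(a)); };
}

using Closure = decltype(make_closure());

// libstdc++ condition: only trivially copyable callables use the local buffer.
// CopyCounter's user-provided copy constructor makes the closure's copy
// constructor non-trivial, so libstdc++ stores this closure on the heap.
static_assert(!std::is_trivially_copyable_v<Closure>);

// libc++ condition (besides fitting in the buffer): nothrow copy construction.
// CopyCounter's copy constructor is noexcept, so the closure qualifies.
static_assert(std::is_nothrow_copy_constructible_v<Closure>);
```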

Why does libc++ not use the move constructor? Apparently, it's a bug. See LLVM bug 33125, which is about the move constructor rather than the swap function, but the same principles apply. The move constructor of std::function uses the callable's copy constructor, not its move constructor, when the source has stored the callable in the small object buffer. The reason is that the type-erasure interface libc++ uses to handle the callable without knowing its type is a class called __base, which has virtual functions for copying and destroying the callable but none for moving it. Apparently, adding move support would break ABI, so it can't be done until further notice.
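
The copy-then-destroy relocation can be illustrated with a deliberately simplified model. This is not libc++'s code: ErasedBase, ErasedImpl and SmallHolder are hypothetical names, every callable is assumed to fit a 64-byte local buffer, and invocation is omitted. The point is that when the type-erased interface offers only "copy into buffer" and "destroy", moving a callable from one buffer to another has to be a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// The question's copy-counting base, minus the Boost stack traces.
struct CopyCounter {
  CopyCounter() noexcept = default;
  CopyCounter(const CopyCounter &) noexcept { ++counter; }
  CopyCounter(CopyCounter &&) noexcept = default;
  CopyCounter &operator=(const CopyCounter &) noexcept { ++counter; return *this; }
  CopyCounter &operator=(CopyCounter &&) noexcept = default;
  inline static std::size_t counter = 0;
};
struct Argument : CopyCounter {};
struct Functor : CopyCounter {
  int operator()(Argument) { return 42; }
};

// Hypothetical interface modelled on the description of __base: it can copy
// and destroy the stored callable, but has no operation for moving it.
struct ErasedBase {
  virtual ErasedBase *clone_into(void *buf) const = 0;  // copy-construct into buf
  virtual void destroy() noexcept = 0;                   // destroy in place
protected:
  ~ErasedBase() = default;
};

template <class F>
struct ErasedImpl final : ErasedBase {
  F f;
  explicit ErasedImpl(F fn) : f(std::move(fn)) {}
  ErasedBase *clone_into(void *buf) const override { return ::new (buf) ErasedImpl(*this); }
  void destroy() noexcept override { this->~ErasedImpl(); }
};

class SmallHolder {
  alignas(std::max_align_t) unsigned char buf_[64];
  ErasedBase *p_ = nullptr;

  // Precondition: *this is empty. Relocates src's callable into our buffer by
  // copy-then-destroy, since ErasedBase offers no move.
  void take_from(SmallHolder &src) {
    p_ = src.p_->clone_into(buf_);
    src.p_->destroy();
    src.p_ = nullptr;
  }

public:
  SmallHolder() = default;
  template <class F>
  explicit SmallHolder(F f) {
    static_assert(sizeof(ErasedImpl<F>) <= sizeof(buf_), "callable too large for this model");
    static_assert(alignof(ErasedImpl<F>) <= alignof(std::max_align_t), "over-aligned callable");
    p_ = ::new (static_cast<void *>(buf_)) ErasedImpl<F>(std::move(f));
  }
  SmallHolder(const SmallHolder &) = delete;
  SmallHolder &operator=(const SmallHolder &) = delete;
  ~SmallHolder() { if (p_) p_->destroy(); }

  void swap(SmallHolder &other) {
    if (this == &other) return;
    if (p_ && other.p_) {  // both full: go through a temporary buffer
      SmallHolder tmp;
      tmp.take_from(*this);
      take_from(other);
      other.take_from(tmp);
    } else if (p_) {
      other.take_from(*this);
    } else if (other.p_) {
      take_from(other);
    }
  }
};

// Hypothetical test helper: swap a filled holder into an empty one, like the
// temporary-then-swap in the question's stack trace, and count the copies.
std::size_t copies_during_swap() {
  CopyCounter::counter = 0;
  SmallHolder filled([f = Functor{}, a = Argument{}]() mutable { return f(std::move(a)); });
  SmallHolder empty;
  empty.swap(filled);  // copy into empty's buffer, then destroy the source
  return CopyCounter::counter;
}
```

In this model the swap copies the closure once; since the closure holds two CopyCounter subobjects (the Functor and the Argument), the counter ends at 2, matching the two "Copied at" traces in the question.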

Note that this is only a bug in the sense that it has suboptimal performance. It is not a bug in the sense of failing to conform to the standard. Both libstdc++ and libc++ have valid implementation strategies. The standard does not say how many times the callable is allowed to be copied.

The best thing to do would be to update your code so that its correctness doesn't depend on the number of times the callable is copied. If you really can't bear the cost of the copy but need to build with libc++, there are some other strategies, such as:

padding the callable (by capturing a dummy object of sufficient size) so that it doesn't fit into the small object buffer;
making the callable's copy constructor noexcept(false) (by capturing an object with a noexcept(false) copy constructor) so that it won't go into the small object buffer; or
storing in the std::function a reference-semantic wrapper class that owns the actual callable through a std::shared_ptr and has an operator() that forwards to it.
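
The third strategy could look roughly like this. SharedCallable is a hypothetical name, the Invoker keeps the question's assignment in the constructor body (simplified to one argument), and CopyCounter is the question's type without Boost. Whatever std::function does internally when it relocates the wrapper, it only copies a std::shared_ptr, so the closure itself should not be copied on any implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>

// The question's copy-counting base, minus the Boost stack traces.
struct CopyCounter {
  CopyCounter() noexcept = default;
  CopyCounter(const CopyCounter &) noexcept { ++counter; }
  CopyCounter(CopyCounter &&) noexcept = default;
  CopyCounter &operator=(const CopyCounter &) noexcept { ++counter; return *this; }
  CopyCounter &operator=(CopyCounter &&) noexcept = default;
  inline static std::size_t counter = 0;
};
struct Argument : CopyCounter {};
struct Functor : CopyCounter {
  int operator()(Argument) { return 42; }
};

// Hypothetical reference-semantic wrapper: all copies share one heap-allocated F.
template <class F>
class SharedCallable {
  std::shared_ptr<F> impl_;

public:
  explicit SharedCallable(F f) : impl_(std::make_shared<F>(std::move(f))) {}

  // *impl_ yields a non-const F&, so mutable lambdas work through this const call.
  template <class... Args>
  decltype(auto) operator()(Args &&...args) const {
    return (*impl_)(std::forward<Args>(args)...);
  }
};

class Invoker {
  std::function<void()> invoke_;
  int result_ = 0;

public:
  template <class F, class A>
  Invoker(F &&f, A &&a) {
    // Assignment in the body, as in the question: any swap or copy inside
    // std::function now only copies the shared_ptr, never the closure.
    invoke_ = SharedCallable(
        [this, f = std::forward<F>(f), a = std::forward<A>(a)]() mutable {
          result_ = f(std::move(a));
        });
  }
  Invoker(const Invoker &) = delete;  // the closure captures `this`
  Invoker &operator=(const Invoker &) = delete;

  int run() {
    invoke_();
    return result_;
  }
};

// Hypothetical test helper: counts copies made by construction plus one call.
std::size_t copies_with_shared_wrapper() {
  CopyCounter::counter = 0;
  Functor f;
  Argument a;
  Invoker i(std::move(f), std::move(a));
  const int result = i.run();
  assert(result == 42);
  (void)result;  // avoid an unused-variable warning when NDEBUG is defined
  return CopyCounter::counter;
}
```

The trade-offs are an extra heap allocation, reference counting, and a change in semantics: copies of the std::function now share one closure (and its mutable state) instead of owning independent copies.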