Any C++ macro that can export all member variables of a struct for pybind11

Time:12-09

I have a simple structure like:

struct Config {
  bool option1;
  bool option2;
  int arg1;
};

Using pybind11, I have to export the member variables like:

py::class_<Config>(m, "Config")
    .def_readwrite("option1", &Config::option1)
    .def_readwrite("option2", &Config::option2)
    .def_readwrite("arg1", &Config::arg1);

Writing the above is fine when there are only a few structs, but it becomes tedious when I have a large number of simple structures.

Is there a convenience macro that I can write like:

PYBIND_EXPORT_STRUCT(Config1);
PYBIND_EXPORT_STRUCT(Config2);
...

and each scans and exports all the given struct's member variables?

Would it help if I wrote the structs in this form instead:

struct Config {
    ADD_PROPERTY(bool, option1);
    ADD_PROPERTY(bool, option2);
    ADD_PROPERTY(int, arg1);
};

My question involves two parts:

  1. To reflect a member variable back to its name string.
  2. To iterate through struct members.

I am aware of introspection to solve the first part, using typeid(arg1).name() to retrieve the name string.

For the second part, C++ does not support this directly. However, I am trying to figure it out from some answers here.

The rest of the question is how to fuse the above two parts to get a working implementation for my imagined PYBIND_EXPORT_STRUCT() function.

That said, I don't mind expressing my structs in a totally different representation (like using macros, or as tuples). Anything will do as long as I don't have to enumerate my struct members again when exporting them with pybind11, and I can still use the variables like config1.option1=true in C++ code.

CodePudding user response:

For number 2 you can try to check out my pod_reflection library:

// main.cpp

struct Config {
  bool option1;
  bool option2;
  int arg1;
};

#include <pod_reflection/pod_reflection.h>
#include <iostream>

int main()
{
  std::cout << "Config size: " << eld::pod_size<Config>() << std::endl;
  std::cout << std::boolalpha;
  Config conf{true, false, 815};
  eld::for_each(conf, [](const auto& i){ std::cout << i << std::endl; });
  return 0;
}

CMakeLists.txt:

cmake_minimum_required(VERSION 3.7.2 FATAL_ERROR)

project(pod_example)

add_subdirectory(pod_reflection)

add_executable(main main.cpp)
target_link_libraries(main eld::pod_reflection)

It can traverse the elements of a POD whose members are of fundamental types. The set of supported types can also be expanded with user-defined types through template<typename ... ArgsT> using extend_feed:

using my_feed = extend_feed<std::string, foo>;
eld::for_each<my_feed>(pod, callableVisitor);

You could possibly use eld::deduced& get<I, TupleFeed>(POD& pod) to populate py::class_<Config>. However, since the library can't possibly know the names of pod members, you would have to figure out a way to deduce them from I. Without a proper compile-time reflection it would be almost impossible to automate. Be advised, get uses reinterpret_cast to get a pointer to a member via an offset.

CodePudding user response:

1. How not to solve the problem

Neither of the approaches is viable or practical.

I am aware of introspection to solve the first part, using typeid(arg1).name() to retrieve the name string.

This is incorrect. C++ has RTTI, run-time type information, but it's very far from “reflection” in the sense C#, Java or Python have. In particular, the member function std::type_info::name() “Returns an implementation defined null-terminated character string containing the name of the type. No guarantees are given; in particular, the returned string can be identical for several types and change between invocations of the same program.” [highlights are mine -kkm] In fact, this program

#include <iostream>
#include <typeinfo>
struct Config { int option; };
int main() { std::cout << typeid(&Config::option).name() << "\n"; }

prints, if compiled with GCC 11 on Linux x64,

M6Configi

which is fully standard-compliant. Here goes down the drain your part #1. A type does not contain a member name, and it's called Runtime Type Information and not Runtime Name Information for a reason. You can even decode the printed string: M = pointer to member, 6 = next 6 characters name the struct type, Config = obvious, i = int. A pointer to a member of type Config, the member itself being of type int. But another compiler will encode (“mangle”, as it's called) the type differently.
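To make the point independent of any particular mangling scheme, here is a minimal sketch. The struct mirrors the question's Config, and the two helper function names are made up for illustration. It checks that two members of the same type are indistinguishable to typeid, because the member name is not part of the pointer-to-member type:

```cpp
#include <cassert>
#include <typeinfo>

struct Config {
  bool option1;
  bool option2;
  int arg1;
};

// Hypothetical helper: both expressions have the type `bool Config::*`,
// so typeid cannot tell option1 from option2.
inline bool same_typed_members_look_identical() {
  return typeid(&Config::option1) == typeid(&Config::option2);
}

// Hypothetical helper: a member of a different type (`int Config::*`)
// does produce a different type_info, but still no member name.
inline bool differently_typed_members_differ() {
  return !(typeid(&Config::option1) == typeid(&Config::arg1));
}
```

Whatever string name() returns for these, it is derived from the type alone, so it cannot be used to recover "option1" or "option2".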

Regarding part #2, take this CppCon video presentation (from an answer you are linking to) for what it really is: a demonstration that C++14 metaprogramming is powerful enough to extract information about a POD type. As you see, the presenter declares two functions for each type that you can possibly encounter on a member (int, volatile int, const int, const volatile int, short, ...). Let's stop here. All these types are different. In fact, when I changed the declaration of the lone structure member to volatile int option;, the program output a different typeid name: M6ConfigVi.

This is a demonstration of what the machinery is capable of, not what it should be used for. In practice, this is a good test for a compiler. I used to get compiler crashes with far more modest metaprogramming constructs. Besides, you probably won't like the compilation time of this whole kaboodle. Don't be surprised to sit and wait for 10 minutes until the compiler is done, in one of four ways: crash, internal error report, successful generation of incorrect code or, fingers crossed, successful generation of correct code. Also, you need a deep, and I mean really deep, understanding of metaprogramming: how the compiler selects among template overloads, what SFINAE is, and so on. Simply speaking, just don't.

2. How to solve the problem

There is a very traditional way to do what you are trying to do, relying on plain old C preprocessor macros. The core idea is: you write the definitions of your structs as invocations of function-like macros in a separate file, which does not itself contain any macro definitions. A second file, which becomes your included header, defines these macros to expand into normal C++ constructs. The third file, which creates the Python bindings, also includes the definition file, but defines the macros differently, so that they expand into pybind syntax. Let's do just that.

This is the struct definition file, structs.absdef. I would not even give it the traditional .h extension, not to confuse it with a real header file. The extension can be anything you want, but better make it unique in the project.

/* structs.absdef -- abstract definition of data structures */

#ifndef BEGIN_STRUCT_DEF
#error "This file should be included only from structs.h or pybind.cc"
#endif

BEGIN_STRUCT_DEF(Config)
  STRUCT_MEMBER(Config, bool, option1)
  STRUCT_MEMBER(Config, bool, option2)
  STRUCT_MEMBER(Config, int, arg1)
END_STRUCT_DEF()

 ... and many more structs ...

The #ifndef is just to stop compilation immediately; otherwise, you'll get a buttload of errors, probably not explaining what the real error was.

This file will be included into your normal C++ header, which defines all the structs in C++ syntax. This is the file you include as a normal header into your C++ files, where you want these structs to be normal structs.

/* structs.h -- C++ concrete definitions of data structures */

#ifndef MYPROJECT_STRUCTS__H
#define MYPROJECT_STRUCTS__H

#define BEGIN_STRUCT_DEF(stype)            struct stype {
#define STRUCT_MEMBER(stype, mtype, name)    mtype name;
#define END_STRUCT_DEF()                   };

#include "structs.absdef"

#undef BEGIN_STRUCT_DEF
#undef STRUCT_MEMBER
#undef END_STRUCT_DEF

#endif  // MYPROJECT_STRUCTS__H

Now, one of the C++ sources is special: it transforms the macro definitions into pybind syntax. I have no idea how pybind works; I'm blindly copying your example.

/* pybind.cc -- Generate pybind11 Python bindings */

#include "pybind11.h"
#include "other.h"
#include "stuff.h"

#include "structs.h"  /* You need "normal" definitions here */

#define BEGIN_STRUCT_DEF(stype)            py::class_<stype>(m, #stype)
#define STRUCT_MEMBER(stype, mtype, name)   .def_readwrite(#name, &stype::name)
#define END_STRUCT_DEF()                   ;

void create_pybind_bindings() {
  #include "structs.absdef"  /* second time */
}

#undef BEGIN_STRUCT_DEF
#undef STRUCT_MEMBER
#undef END_STRUCT_DEF
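The same idea can be tried out in a single translation unit before splitting it into three files. The sketch below is an illustration under assumptions, not part of the layout above: the names CONFIG_MEMBERS, DECLARE_MEMBER, PUSH_NAME and config_member_names are made up, and collecting the names into a vector merely stands in for emitting .def_readwrite calls, so that the example compiles without pybind11.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical list macro: each member is written exactly once.
// X is the macro that will be applied to every (type, name) pair.
#define CONFIG_MEMBERS(X) \
  X(bool, option1)        \
  X(bool, option2)        \
  X(int, arg1)

// Expansion 1: ordinary member declarations inside the struct.
#define DECLARE_MEMBER(mtype, name) mtype name;
struct Config {
  CONFIG_MEMBERS(DECLARE_MEMBER)
};
#undef DECLARE_MEMBER

// Expansion 2: the same list, expanded into statements that record each
// member name. A binding file would expand it into
// .def_readwrite(#name, &Config::name) instead.
inline std::vector<std::string> config_member_names() {
  std::vector<std::string> names;
#define PUSH_NAME(mtype, name) names.push_back(#name);
  CONFIG_MEMBERS(PUSH_NAME)
#undef PUSH_NAME
  return names;
}
```

Config remains a plain aggregate, so Config conf{true, false, 815}; and conf.option1 = true; work as usual. The separate .absdef file is the better fit once many structs share the same pair of expansions.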

Two points that you should pay attention to.

First, in a #define there must be no space between the name of a function-like macro and the opening parenthesis:

// Correct:
#define FOO(x) ((x) + 42)
// This
int j = FOO(1);
// expands into
int j = ((1) + 42);

// Incorrect:
#define BAR (x) ((x) + 42)
// This
int j = BAR(1);
// expands into
int j = (x) ((x) + 42)(1);

i.e. BAR is substituted as is, after = and before (1);. What your compiler will tell you is not what really happened, so be careful.

Second is the use of the preprocessor's stringizing operator #, which turns the function-like macro argument that follows it into a double-quoted string: #stype turns into "Config", just what you need.
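A minimal sketch of the stringizing operator on its own; the macro name STRINGIZE and the two function names are made up for illustration:

```cpp
#include <cassert>
#include <string>

// Hypothetical macro: # turns the spelling of the argument into a string literal.
#define STRINGIZE(token) #token

// The argument does not have to name anything that exists;
// only its spelling matters to the preprocessor.
inline std::string struct_name() { return STRINGIZE(Config); }
inline std::string member_name() { return STRINGIZE(option1); }
```

Here struct_name() returns "Config" and member_name() returns "option1", which is the same mechanism that BEGIN_STRUCT_DEF and STRUCT_MEMBER rely on to produce the Python-visible names.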

3. Bonus: a peek under the hood

Obviously, we don't have the files "pybind11.h", "other.h" and "stuff.h": they are just placeholder names, so I'll simply create empty ones. The 3 other files I have literally copied from this answer. When you compile pybind.cc, the C preprocessor is first invoked by the compiler driver. We'll invoke it alone and examine its output. The c++ -E <filename.cc> command tells the compiler to call the preprocessor, but instead of ingesting the resulting file, just print it to stdout and stop.

I'm condensing the output by collapsing runs of empty lines: the preprocessor strips comment lines and the directive lines it has processed, but still prints empty lines in their place to keep line numbers correct for diagnostics that later processing phases may emit. The extra lines starting with # serve the same purpose for the next passes: they simply establish the line number and file name being processed. Ignore them for good measure.

$ touch "pybind11.h" "other.h" "stuff.h"
$ ls *.{cc,h,absdef}
other.h  pybind.cc  pybind11.h  structs.absdef  structs.h  stuff.h
$ c++ -E pybind.cc
# 1 "pybind.cc"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "pybind.cc"

# 1 "pybind11.h" 1
# 4 "pybind.cc" 2
# 1 "other.h" 1
# 5 "pybind.cc" 2
# 1 "stuff.h" 1
# 6 "pybind.cc" 2

# 1 "structs.h" 1
# 10 "structs.h"
# 1 "structs.absdef" 1

struct Config {
  bool option1;
  bool option2;
  int arg1;
};
# 11 "structs.h" 2
# 8 "pybind.cc" 2

void create_pybind_bindings() {
# 1 "structs.absdef" 1

py::class_<Config>(m, "Config")
  .def_readwrite("option1", &Config::option1)
  .def_readwrite("option2", &Config::option2)
  .def_readwrite("arg1", &Config::arg1)
;
# 15 "pybind.cc" 2
}

4. Colophon, or A bit of smartassery and a bit of history

  • Not every new technology is better for everything simply because it's new.
  • The preprocessor is in fact slightly older than the C language itself. 49 years old, to be exact. C adopted the preprocessor used inside Bell Labs for other languages.