C++: Is there any bijective mapping between types and any other data type defined by the standard?

Time:11-05

I am working on a project that makes heavy use of static polymorphism. A particular use case that I am interested in would be made possible by static reflection, but we still don't have this in C++. The use case looks something like this: I have functions that read/write a data structure to/from a binary file:

template <typename data_t> void write_binary(const my_type_t<data_t>& obj)
{
    //write a binary file...
}
template <typename data_t> void read_binary(my_type_t<data_t>& obj)
{
    //read a binary file...
}

I would like to enforce that I can only read data from files that were output by the same type, e.g. my_type_t<std::string> can only read from binary files output by my_type_t<std::string>, etc. The way I want to do this is to add a small header to the binary file that identifies the specialization of data_t:

template <typename data_t> void write_binary(const my_type_t<data_t>& obj)
{
    //write header type_name(data_t)
    //write a binary file...
}
template <typename data_t> void read_binary(my_type_t<data_t>& obj)
{
    //read header
    //assert header == type_name(data_t)
    //read a binary file...
}

I am aware of the existence of typeid(data_t).name() and the various methods of demangling it, but I want something that is defined by the standard.

So my precise question is this: for any two types type1_t and type2_t, is there any C++ standard-defined mapping "F" whatsoever such that F(type1_t) == F(type2_t) always implies type1_t == type2_t, and type1_t == type2_t always implies F(type1_t) == F(type2_t), independent of compiler? That is, is there any bijective mapping between types and some kind of serializable value defined by the C++ standard?

EDIT There is a subtlety in this question that I did not initially emphasize: I do not want to serialize arbitrary types. The function body that writes my objects to files is already implemented. What I want (as stated in the question above) is simply a unique identifier for every type that is compiler independent. The role of the template specialization data_t of my_type_t<data_t> is not to affect what information is written/read, but rather how it is interpreted.

Just a couple of other thematic points:

  • Due to the nature of my project, I cannot know ahead of time what type data_t will be; I must allow it to feasibly be anything.
  • It is very much undesirable for me to have to place requirements on what types can be used for the template specialization, e.g. requiring people to implement some kind of "name" field for their types. This is because the final type data_t that ends up being used for the I/O is not tied to the interfaces that my users are exposed to.
  • While the details of how instances of types are stored in memory are indeed platform- and compiler-dependent, the names of the types themselves are ultimately properties only of the source code, not the compiler.

CodePudding user response:

The meaning of a type only makes sense within modules that use the same compiler. So you could just go with typeid(...).name() or typeid(...).hash_code(), which are standard functions, although the values they return are left to the implementation.
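As a rough illustration of what the standard does and does not promise here, the sketch below wraps typeid in three helpers. The helper names (raw_type_name, raw_type_hash, same_type_at_runtime) are made up for this example. Within one running program, equal types compare equal and yield the same name and hash; the string from name() is implementation-defined and the value of hash_code() is unspecified, so neither is suitable as a cross-compiler file header.

```cpp
#include <cstddef>
#include <string>
#include <typeinfo>

// Hypothetical helper: the implementation-defined name of T.
// Different compilers generally return different strings here.
template <typename T>
std::string raw_type_name()
{
    return typeid(T).name();
}

// Hypothetical helper: the hash of T. It is consistent within one
// execution of the program, but not guaranteed across runs or compilers.
template <typename T>
std::size_t raw_type_hash()
{
    return typeid(T).hash_code();
}

// Compares two types at run time. Note that typeid ignores top-level
// const/volatile and references, so const int and int compare equal.
template <typename A, typename B>
bool same_type_at_runtime()
{
    return typeid(A) == typeid(B);
}
```

Comparing raw_type_name<int>() against a string stored by a program built with another compiler is exactly the case these functions do not cover.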

If you have different compilers (and/or different platforms) involved, there is no possibility to reconstruct an arbitrary type from the other compiler. The memory layout could be completely different. Even an int could be different in size or have a different byte ordering. How would the information that it is an int help you anyway?
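To make the size and byte-order point concrete: one common way around it, for integers at least, is to choose a fixed width and byte order for the file and convert explicitly. The following is a minimal sketch assuming a 32-bit, least-significant-byte-first on-disk format; encode_u32_le and decode_u32_le are hypothetical names, not part of any library.

```cpp
#include <array>
#include <cstdint>

// Encode a 32-bit value as four bytes, least significant byte first,
// regardless of the host's native byte order.
std::array<unsigned char, 4> encode_u32_le(std::uint32_t value)
{
    return {
        static_cast<unsigned char>(value & 0xFFu),
        static_cast<unsigned char>((value >> 8) & 0xFFu),
        static_cast<unsigned char>((value >> 16) & 0xFFu),
        static_cast<unsigned char>((value >> 24) & 0xFFu),
    };
}

// Rebuild the value from the same four-byte layout.
std::uint32_t decode_u32_le(const std::array<unsigned char, 4>& bytes)
{
    return static_cast<std::uint32_t>(bytes[0])
         | (static_cast<std::uint32_t>(bytes[1]) << 8)
         | (static_cast<std::uint32_t>(bytes[2]) << 16)
         | (static_cast<std::uint32_t>(bytes[3]) << 24);
}
```

This only works because the code knows it is dealing with an integer of a chosen width, which is the kind of type-specific knowledge a type name alone does not supply.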

You should write your library in a way that the users have customization points to serialize and deserialize their types. You simply cannot write generic read_binary and write_binary functions in a portable way.

CodePudding user response:

No.

Nor does the problem seem to benefit from one. Serialization is not generically possible in C++, so there will be type-specific customization points for serializing and deserializing, whether you implement them or your users do. In other words, in:

template <typename data_t> void write_binary(const my_type_t<data_t>& obj)
{
    //write header type_name(data_t)
    //write a binary file...
}

The "write a binary file" step has to be specific to data_t: a std::string has to be written differently from an int. Each of those cases can prepend an identifying header if it wants to. The deserialization can check that header, and it can also check other invariants of the type.
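A minimal sketch of that idea, using plain overloads for int and std::string. Everything here is an assumption made for illustration: the function names (write_value, read_value, put_u32, get_u32, expect_tag), the one-byte tag values, and the on-disk layout. It also assumes int fits in 32 bits and a two's complement representation.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical one-byte headers; the values are arbitrary choices.
constexpr char tag_int = 'i';
constexpr char tag_string = 's';

// Write a 32-bit value, least significant byte first.
void put_u32(std::ostream& out, std::uint32_t v)
{
    for (int shift = 0; shift < 32; shift += 8)
        out.put(static_cast<char>((v >> shift) & 0xFFu));
}

// Read a 32-bit value written by put_u32.
std::uint32_t get_u32(std::istream& in)
{
    std::uint32_t v = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        const int byte = in.get();
        if (byte == std::char_traits<char>::eof())
            throw std::runtime_error("unexpected end of data");
        v |= static_cast<std::uint32_t>(byte & 0xFF) << shift;
    }
    return v;
}

// Consume the header byte and reject data written for another type.
void expect_tag(std::istream& in, char expected)
{
    const int got = in.get();
    if (got == std::char_traits<char>::eof() || static_cast<char>(got) != expected)
        throw std::runtime_error("type tag mismatch");
}

// int: header, then a 32-bit payload (assumes int fits in 32 bits).
void write_value(std::ostream& out, int value)
{
    out.put(tag_int);
    put_u32(out, static_cast<std::uint32_t>(value));
}

void read_value(std::istream& in, int& value)
{
    expect_tag(in, tag_int);
    value = static_cast<int>(static_cast<std::int32_t>(get_u32(in)));
}

// std::string: header, then a 32-bit length, then the raw bytes.
void write_value(std::ostream& out, const std::string& value)
{
    out.put(tag_string);
    put_u32(out, static_cast<std::uint32_t>(value.size()));
    out.write(value.data(), static_cast<std::streamsize>(value.size()));
}

void read_value(std::istream& in, std::string& value)
{
    expect_tag(in, tag_string);
    std::string result(get_u32(in), '\0');
    in.read(&result[0], static_cast<std::streamsize>(result.size()));
    if (!in)
        throw std::runtime_error("unexpected end of data");
    value = result;
}
```

Reading data written by the int overload into a std::string fails at the header check, which is the behavior the question asks for, without needing any standard-defined type name.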

requiring people to implement some kind of "name" field for their types

A customization point doesn't require a particular field. There are ways to allow customization of behavior non-intrusively such as template specialization (traits) and ADL (overloading).
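Both mechanisms can be sketched briefly. In the example below, type_tag, trait_tag, serial_tag, adl_tag, and the vendor namespace are all hypothetical names chosen for illustration; the tag strings are arbitrary as well. Neither approach adds a member to the type being tagged.

```cpp
#include <string>

// Trait approach: the primary template is declared but not defined,
// so asking for the tag of an unregistered type fails to compile.
template <typename T>
struct type_tag;

// Specializations can be added anywhere after the declaration,
// without touching the types themselves.
template <>
struct type_tag<int> {
    static std::string value() { return "int32"; }
};

template <>
struct type_tag<std::string> {
    static std::string value() { return "string"; }
};

template <typename T>
std::string trait_tag()
{
    return type_tag<T>::value();
}

// ADL approach: a free function in the same namespace as the type.
namespace vendor {

struct point {
    double x;
    double y;
};

inline std::string serial_tag(const point&) { return "vendor.point.v1"; }

} // namespace vendor

template <typename T>
std::string adl_tag(const T& v)
{
    // Unqualified call: the overload is found through the namespace
    // of the argument type at instantiation time.
    return serial_tag(v);
}
```

Because the tag is a string chosen by whoever registers the type, rather than the spelling of the type's name, it stays the same if the type is later renamed or moved to another namespace.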

the names of the types themselves are ultimately properties only of the source code

The types are a property of the source code. The names, and the spelling, are a choice of a particular formatting of the types. typeid(x).name() is one choice of formatting, which will differ on different compilers. A demangled name is another, which will differ on different platforms. Demangled names are not necessarily unique.

(Finally, using type names to identify the serialized value is cute but likely to yield surprises. For example, one would generally expect to be able to rename a class type without affecting serialized data. One would generally expect to move it to a new namespace, even with a typedef in the old location for minimal impact, without affecting serialized data.)
