Looking for a concept related to "either or" in a data structure (almost like std::variant-CodePudding

I'm modernizing the code that reads and writes to a custom binary file format now. I'm allowed to use C 17 and have already modernized large parts of the code base.

There are mainly two problems at hand.

binary selectors (my own name)
cased selectors (my own name as well)

For #1 it is as follows: Given that 1 bit is set in a binary string. You either (read/write) two completely different structs. For example, if bit 17 is set to true, it means bits 18 should be streamed with Struct1. But if bit 17 is false, bits 18 should be streamed with Struct2. Struct1 and Struct2 are completely different with minimal overlap.

For #2 it is basically the same but as follows: Given that x bits in the bit stream are set. You have a potential pool of completely different structs. The number of structs is allowed to be between [0, 2**x] (inclusive range). For instance, in one case you might have 3 bits and 5 structs. But in another case, you might have 3 bits and 8 structs. Again the overlap between the structs is minimal.

I'm currently using std::variant for this.

For case #1, it would be just two structs std::variant<Struct1, Struct2>

For case #2, it would be just a flat list of the structs again using std::variant.

The selector I use is naturally the index in the variant, but it needs to be remapped for a different bit pattern that actually needs to be written/read to/from the format.

Have any of you used or encountered some better strategies for these cases? Is there a generally known approach to solve this in a much better way?

CodePudding user response：

If you want your data to be bit-continuous you can't use std::variant You will need to use std::bitset or managing the memory completely manually to do that. But it isn't practical to do so because your structs will not be byte-aligned so you will need to do every read/write manually bit by bit. This will reduce speed greatly, so I only recommend this way if you want to save every bit of memory even at the cost of speed. And at this storage it will be hard to find the nth element you will need to iterate.

std::variant<T1,T2> will waste a bit of space because 1) it will always use enough space for storing the the bigger data, but using that over bit-manipulation will increase the speed and the readability of the code. (And will be easier to write)

CodePudding user response：

Is there a generally known approach to solve this in a much better way?

Nope, it's highly specific.

Have any of you used or encountered some better strategies for these cases?

The bit patterns should be encoded in the type, somehow. Almost all the (de)serialization can be generic so long as the required information is stored somewhere.

For example,

template <uint8_t BitPattern, typename T>
struct IdentifiedVariant;

// ...

using Field1 = std::variant< IdentifiedVariant<0x01, Field1a>,
                             IdentifiedVariant<0x02, Field1b> >;

I've absolutely used types like this in the past to automate all the boilerplate, but the details are extremely specific to the format and rarely reusable.

Note that even though you can't overlay your variant type on a buffer, there's no need for (de)serialization to proceed bit-by-bit. There's hardly any speed penalty so long as the data is already read into a buffer - if you really need to go full zero-copy, you can just have your FieldNx types keep a pointer into the buffer and decode fields on demand.