How to iterate over every bit of a type in C-CodePudding

I wanted to write the Digital Search Tree in C using templates. To do that given a type T and data of type T I have to iterate over bits of this data. Doing this on integers is easy, one can just shift the number to the right an appropriate number of positions and "&" the number with 1, like it was described for example here How to get nth bit values . The problem starts when one tries to do get i'th bit from the templated data. I wrote something like this

#include <iostream>

template<typename T>
bool getIthBit (T data, unsigned int bit) {
    return ((*(((char*)&data) (bit>>3)))>>(bit&7))&1;
}

int main() {
    uint32_t a = 16;
    for (int i = 0; i < 32; i  ) {
        std::cout << getIthBit (a, i);
    }
    std::cout << std::endl;
}

Which works, but I am not exactly sure if it is not undefined behavior. The problem with this is that to iterate over all bits of the data, one has to know how many of them are, which is hard for struct data types because of padding. For example here

#include <iostream>

struct s {
    uint32_t i;
    char c;
};

int main() {
    std::cout << sizeof (s) << std::endl; 
}

The actual data has 5 bytes, but the output of the program says it has 8. I don't know how to get the actual size of the data, or if it is at all possible. A question about this was asked here How to check the size of struct w/o padding? , but the answers are just "don't".

CodePudding user response：

It's easy to know know how many bits there are in a type. There's exactly CHAR_BIT * sizeof(T). sizeof(T) is the actual size of the type in bytes. But indeed, there isn't a general way within standard C to know which of those bits - that are part of the type - are padding.

I recommend not attempting to support types that have padding as keys of your DST.

Following trick might work for finding padding bits of trivially copyable classes:

Use std::memset to set all bits of the object to 0.
For each sub object with no sub objects of their own, set all bits to 1 using std::memset.
For each sub object with their own sub objects, perform the previous and this step recursively.
Check which bits stayed 0.

I'm not sure if there are any technical guarantees that the padding actually stays 0, so whether this works may be unspecified. Furthermore, there can be non-classes that have padding, and the described trick won't detect those. long double is typical example; I don't know if there are others. This probably won't detect unused bits of integers that underlie bitfields.

So, there are a lot of caveats, but it should work in your example case:

s sobj;
std::memset(&sobj, 0, sizeof sobj);
std::memset(&sobj.i, -1, sizeof sobj.i);
std::memset(&sobj.c, -1, sizeof sobj.c);

std::cout << "non-padding bits:\n";
unsigned long long ull;
std::memcpy(&ull, &sobj, sizeof sobj);
std::cout << std::bitset<sizeof sobj * CHAR_BIT>(ull) << std::endl;

CodePudding user response：

There's a Standard way to know if a type has unique representation or not. It is std::has_unique_object_representations, available since C 17.

So if an object has unique representations, it is safe to assume that every bit is significant.

There's no standard way to know if non-unique representation caused by padding bytes/bits like in struct { long long a; char b; }, or by equivalent representations¹. And no standard way to know padding bits/bytes offsets.

Note that "actual size" concept may be misleading, as padding can be in the middle, like in struct { char a; long long b; }

Internally compiler has to distinguish padding bits from value bits to implement C 20 atomic<T>::compare_exchange_*. MSVC does this by zeroing padding bits with __builtin_zero_non_value_bits. Other compiler may use other name, another approach, or not expose atomic<T>::compare_exchange_* internals to this level.

¹ like multiple NaN floating point values