Bit shifting a floating-point value

I am working on a project that converts real numbers to fixed point.

For example, the code below:

float ff = 8.125f;
std::cout << std::bitset<32>(ff) << std::endl; // print binary format

Output:

00000000000000000000000000001000

Only the integer part is printed out.

When I do:

float ff = 8.125f;
int   va = ff * (1 << 8); // 8-bit shift left

std::cout << std::bitset<32>(va) << std::endl; // print binary format

The output is:

00000000000000000000100000100000

Here the fractional part is printed too.

I don't understand why.

Answer:

"Only the integer part is printed out."

Well, of course, because that's all you've given the bitset: there is no std::bitset constructor that takes a float argument, so your ff is being converted to an unsigned long long, which will lose the fractional part (as any conversion of a floating-point type to an integral type will). Turning on compiler warnings will alert you to this; for example, for your first code snippet, the MSVC compiler shows:
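To make the implicit conversion visible, here is a minimal sketch that performs the same step explicitly. The helper name `bits_of_truncated` is hypothetical, and the sketch assumes the value is non-negative and fits in an unsigned long long (converting an out-of-range or negative float to an unsigned type is undefined behaviour).

```cpp
#include <bitset>
#include <string>

// Hypothetical helper: spells out what std::bitset<32>(ff) does implicitly.
// The float is converted to unsigned long long first, which discards the
// fractional part, and only that integer reaches the bitset constructor.
// Assumes value is non-negative and small enough to be representable.
std::string bits_of_truncated(float value) {
    unsigned long long truncated = static_cast<unsigned long long>(value);
    return std::bitset<32>(truncated).to_string();
}
```

Calling `bits_of_truncated(8.125f)` gives the same string as `bits_of_truncated(8.0f)`, which is why the first snippet in the question shows only the integer part.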

warning C4244: 'argument': conversion from 'float' to 'unsigned __int64', possible loss of data

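In your second example, you are multiplying the float value by 256 (that is, 1 << 8) and converting the result to an integer, 2080, which is what gets passed to the bitset. Because 8.125 * 256 is exactly 2080, nothing is lost in that conversion: the low 8 bits (00100000, or 32/256 = 0.125) hold the fraction, and the bits above them (1000) hold the integer part 8.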
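That scaling step is the usual way to build a fixed-point value, so it may help to see it wrapped up as a round trip. The following is only a sketch under stated assumptions: the names `kFractionBits`, `to_fixed`, `from_fixed` and `fixed_bits` are hypothetical, the format is assumed to have 8 fractional bits in a 32-bit integer, and the input is assumed to be small enough that the scaled value fits in that integer.

```cpp
#include <bitset>
#include <cstdint>
#include <string>

// Assumed format for this sketch: 8 fractional bits, as in the question.
constexpr int kFractionBits = 8;

// Scale by 2^8 so that 8 fractional bits move above the binary point,
// then convert to an integer; any remaining fraction is truncated.
std::int32_t to_fixed(float value) {
    return static_cast<std::int32_t>(value * (1 << kFractionBits));
}

// Undo the scaling to recover the (possibly truncated) real value.
float from_fixed(std::int32_t fixed) {
    return static_cast<float>(fixed) / (1 << kFractionBits);
}

// Binary string of the fixed-point value, matching the question's output.
std::string fixed_bits(float value) {
    return std::bitset<32>(static_cast<std::uint32_t>(to_fixed(value))).to_string();
}
```

With these helpers, `to_fixed(8.125f)` is 2080 and `from_fixed(2080)` returns 8.125f again. A value with more than 8 fractional bits would not survive the round trip exactly, since the extra bits are dropped.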

If you want the full bitwise representation of your float variable in the bitset, you need to first copy that data into an unsigned long long variable and pass that to the constructor:

    unsigned long long ffdata = 0;
    std::memcpy(&ffdata, &ff, sizeof(ff));
    std::cout << std::bitset<32>(ffdata) << std::endl;

Output:

01000001000000100000000000000000
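As a variation on the snippet above, the sketch below copies the bytes into a std::uint32_t instead of an unsigned long long. This is a swapped-in choice, not the code from the answer: with a destination of the same size as the float, the result does not depend on which end of a wider integer the bytes land in. The names `raw_bits`, `raw_bit_string`, `FloatFields` and `split_fields` are hypothetical, and the field layout assumes float is an IEEE 754 32-bit type, which is common but not guaranteed by the language.

```cpp
#include <bitset>
#include <cstdint>
#include <cstring>
#include <string>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Copy the object representation of the float into a same-sized integer.
std::uint32_t raw_bits(float value) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof(bits));
    return bits;
}

// Binary string of the raw representation.
std::string raw_bit_string(float value) {
    return std::bitset<32>(raw_bits(value)).to_string();
}

// Fields of the representation, assuming the IEEE 754 binary32 layout:
// 1 sign bit, 8 biased exponent bits, 23 mantissa bits.
struct FloatFields {
    std::uint32_t sign;
    std::uint32_t exponent;
    std::uint32_t mantissa;
};

FloatFields split_fields(float value) {
    const std::uint32_t bits = raw_bits(value);
    return FloatFields{bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}
```

For 8.125f, which is 1.000001 (binary) times 2^3, `split_fields` gives a sign of 0, a biased exponent of 130 (127 + 3) and a mantissa with only bit 17 set. That matches the output shown above when read as 0 | 10000010 | 00000100000000000000000.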