Multiply float by a number using bitwise operators

Time:07-16

I have this function that takes in the bits of a float (f) as a uint32_t. It should use bit operations to calculate f * 2048 and return the bits of the result as a uint32_t.

If the result is too large to be represented as a float, inf or -inf should be returned; and if f is 0, -0, inf, -inf, or NaN, it should be returned unchanged.

uint32_t float_2048(uint32_t f) {
    uint32_t a = (f << 1) ;

    int result = a << 10;

    return result;
}

This is what I have so far but if I give it the value '1' it returns 0 instead of 2048. How do I fix this?

Some example inputs and outputs:

./float_2048 1
2048
./float_2048 3.14159265
6433.98193
./float_2048 -2.718281828e-20
-5.56704133e-17
./float_2048 1e38
inf

CodePudding user response:

As mentioned in the comments, to multiply a floating-point number by a power of 2 (assuming, as is likely, that it is represented in IEEE-754 format), we can just add that power to the (binary) exponent part of the representation.
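To make that concrete, here is a minimal sketch for inspecting the exponent field. It assumes that float is an IEEE-754 binary32 value on your platform, and the helper names float_to_bits and biased_exponent are made up for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Copy the bytes of a float into a uint32_t.
   memcpy avoids the aliasing problems of a pointer cast.
   Assumes sizeof(float) == sizeof(uint32_t). */
static uint32_t float_to_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Return the 8-bit biased exponent field (bits 30..23). */
static uint32_t biased_exponent(uint32_t bits)
{
    return (bits >> 23) & 0xFFu;
}
```

With these, 1.0f has the bit pattern 0x3F800000 (biased exponent 127) and 2048.0f has 0x45000000 (biased exponent 138). The sign and fraction bits are identical; only the exponent differs, by exactly 11. This also shows why shifting the whole word left by 11 gives 0 for the input 1: all of the set bits of 0x3F800000 are pushed out of the top of the 32-bit value.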

For a single-precision (32-bit) float value, that exponent is stored in bits 30-23 and the following code shows how to extract those, add the required value (11, because 2048 = 2^11), then replace the exponent bits with that modified value.

uint32_t fmul2048(uint32_t f)
{
    #define EXPONENT 0x7F800000u
    #define SIGN_BIT 0x80000000u
    uint32_t expon = (f & EXPONENT) >> 23; // Get exponent value
    f &= ~EXPONENT;  // Remove old exponent
    expon += 11;     // Adding 11 to exponent multiplies by 2^11 (= 2048);
    if (expon > 254) return EXPONENT | (f & SIGN_BIT); // Too big: return +/- Inf
    f |= (expon << 23); // Insert modified exponent
    return f;
}

There will, no doubt, be some "bit trickery" that can be applied to make the code smaller and/or more efficient, but I have avoided doing so in order to keep the code clear. I have also included one error check (for a too-large exponent): if that test fails, the code returns the standard representation for +/- Infinity (all exponent bits set to 1, fraction bits cleared, and the original sign kept). The other special inputs from the question (0, -0, inf, -inf, and NaN) and subnormal values are not handled here; I leave that error-checking as an "exercise for the reader".
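For anyone who wants a starting point for that exercise, here is one possible sketch that adds the remaining checks. It is not the only way to do it; the function name fmul2048_checked and the mask names are made up for this illustration, and it assumes IEEE-754 binary32 as above.

```c
#include <stdint.h>

#define EXP_MASK  0x7F800000u  /* bits 30..23 */
#define FRAC_MASK 0x007FFFFFu  /* bits 22..0  */
#define SIGN_MASK 0x80000000u  /* bit 31      */

uint32_t fmul2048_checked(uint32_t f)
{
    uint32_t sign  = f & SIGN_MASK;
    uint32_t expon = (f & EXP_MASK) >> 23;
    uint32_t frac  = f & FRAC_MASK;

    if (expon == 255) return f;            /* Inf or NaN: unchanged */

    if (expon == 0) {
        if (frac == 0) return f;           /* +0 or -0: unchanged */

        /* Subnormal: there is no implicit leading 1, so each doubling
           is a left shift of the fraction. Stop as soon as a 1 reaches
           bit 23, at which point the value has become normal. */
        uint32_t remaining = 11;
        while (remaining > 0 && (frac & 0x00800000u) == 0) {
            frac <<= 1;
            remaining--;
        }
        if ((frac & 0x00800000u) == 0)
            return sign | frac;            /* result is still subnormal */

        /* Bit 23 is now the implicit 1 of a normal number with exponent
           field 1; any doublings left over go into the exponent. */
        expon = 1 + remaining;
        return sign | (expon << 23) | (frac & FRAC_MASK);
    }

    expon += 11;                           /* normal number: multiply by 2^11 */
    if (expon >= 255)
        return sign | EXP_MASK;            /* too big: +/- Inf */
    return sign | (expon << 23) | frac;
}
```

Passing 0x3F800000 (the bits of 1.0f) returns 0x45000000 (the bits of 2048.0f), while zeros, infinities and NaNs come back unchanged. To test it against the decimal examples in the question, convert between float and uint32_t with memcpy rather than a pointer cast.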
