Cannot separate the different parts of an IEEE 754 floating-point number

I am trying to separate the different parts of an IEEE 754 single-precision floating-point number using C bitwise operators. I plan to put the separated parts in a struct. My end goal is to write arithmetic operations using bitwise operators.

However, I have hit an issue: my results don't make any sense. I have been unable to find a solution on my own or on the internet. Any insight would be greatly appreciated.

The following are all the modules I've used.

    //test.c
    #include <stdio.h>
    #include "splicing.h"
    
    int main(void)
    {
    
        float a = 5, b = -3, c = 0.1;
        sploat A, B, C;
    
        printf("%f\n%x\n", a, *(unsigned int*) &a);
        printf("%f\n%x\n", b, *(unsigned int*) &b);
        printf("%f\n%x\n\n", c, *(unsigned int*) &c);
    
        splice(a, A);
        splice(b, B);
        splice(c, C);
    
        printf("%f\n%hhu %hhi %x\n\n", a, A.s, A.e, A.m);
        printf("%f\n%hhu %hhi %x\n\n", b, B.s, B.e, B.m);
        printf("%f\n%hhu %hhi %x\n\n", c, C.s, C.e, C.m);
    
        return 0;
    
    }
    
    
    
    /*
     * Expected results
     *
     * 5 = 0x40a00000
     *  exp =  2
     *  man = 0x200000 (explicit) 0xa00000 (spliced)
     *  sign = 0
     *
     * -3 = 0xc0400000
     *      exp =  1
     *      man = 0x400000 (explicit) 0xc00000 (spliced)
     *      sign = 1
     *
     * 0.1 = 0x3dcccccd
     *  exp = -4
     *  man = 0x4ccccd (explicit) 0xcccccd (spliced)
     *  sign = 0
     */

    //splicing.h
    typedef struct splicedflt{
        unsigned char s;        //sign
        signed char e;      //exponent
        unsigned int m;     //mantissa
    } sploat;   //short for spliced float


    //unfinished
    //Makes inserted sploat reflect inserted float. The problem child I need help with.
    int splice(float, sploat);

    //splicing.c
    int splice(float num, sploat strukt)
    {

        unsigned int raw = *(unsigned int*) &num;   //floats don't allow for bitmagic.

        strukt.s = raw >> 31;
        strukt.e = (raw << 1) >> 24;
        strukt.m = ((raw << 9) >> 9) | 0x1000000;

        return 0;

    }

The following is output from the program. I have no idea why this is not working.

    $ gcc test.c
    $ ./a.out
    5.000000
    40a00000
    -3.000000
    c0400000
    0.100000
    3dcccccd

    5.000000
    0 0 0

    -3.000000
    160 0 5588

    0.100000
    160 -20 7ffe
    $

CodePudding user response：

There are (as far as I can see) three issues in your code.

The first, very major issue is that you are passing your sploat structures to the splice function by value; that is, a copy of each structure is given to the function, and that copy is modified, so the original structures (in your main function) are left unchanged. To solve this, pass those structures 'by reference' (i.e., use pointers to the structures as the arguments).

With this fixed, your exponent fields will still be wrong, because the IEEE 754 format uses biased exponents. For single-precision (32-bit) floating-point data, you can correct this by subtracting the bias (127) from the stored value. This works for normal numbers; a stored exponent of 0 (zero and subnormals) or 255 (infinities and NaNs) has a special meaning.
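
To make the bias concrete, here is a minimal sketch that extracts only the exponent. The helper name unbiased_exponent is hypothetical (it is not part of the question's code), and the sketch assumes that float is a 32-bit IEEE 754 value and that the input is a normal number.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: unbiased exponent of a normal float.
 * Assumes float is a 32-bit IEEE 754 (binary32) value. */
int unbiased_exponent(float f)
{
    uint32_t raw;
    memcpy(&raw, &f, sizeof raw);             /* copy the bit pattern */
    int stored = (int)((raw >> 23) & 0xFFu);  /* the 8 exponent bits */
    return stored - 127;                      /* remove the bias */
}
```

For the question's inputs this gives 2 for 5.0f (stored value 129), 1 for -3.0f (stored value 128), and -4 for 0.1f (stored value 123), which matches the expected results listed in the question.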

You also have a potential violation of the strict aliasing rules in your unsigned int raw = *(unsigned int*) &num; line; use the memcpy function (declared in <string.h>) to avoid this.

Here's a modified version of your splice function:

    #include <string.h> // For memcpy

    int splice(float num, sploat* strukt) // Pass "strukt" as a pointer
    {
        unsigned int raw;
        memcpy(&raw, &num, sizeof(raw)); // Avoid strict aliasing violation
        strukt->s = raw >> 31;
        strukt->e = (signed char)((int)((raw << 1) >> 24) - 127); // Remove the BIAS, then narrow
        strukt->m = ((raw << 9) >> 9) | 0x800000; // Implicit leading 1 is bit 23 (0x800000), not 0x1000000
        return 0;
    }

And here's how that would be called in main:

    int main(void)
    {
        float a = 5, b = -3, c = 0.1f;
        sploat A, B, C;

        //...

        splice(a, &A); // Pass the ADDRESS of each structure...
        splice(b, &B);
        splice(c, &C);

        // ...

        return 0;
    }
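
As a cross-check against the expected results in the question, the following sketch extracts the three fields with masks rather than shift pairs. The name splice_masked is hypothetical, the struct is repeated only so the block is self-contained, and the code assumes a 32-bit IEEE 754 float holding a normal number. It places the implicit leading 1 at bit 23 (0x800000), which is what the question's expected "spliced" values such as 0xa00000 require.

```c
#include <stdint.h>
#include <string.h>

typedef struct splicedflt {
    unsigned char s;   /* sign */
    signed char   e;   /* unbiased exponent */
    unsigned int  m;   /* mantissa including the implicit leading 1 */
} sploat;

/* Hypothetical mask-based variant of splice().
 * Assumes IEEE 754 binary32 and a normal input (not 0, subnormal, inf, NaN). */
int splice_masked(float num, sploat *out)
{
    uint32_t raw;
    memcpy(&raw, &num, sizeof raw);          /* well-defined way to read the bits */

    out->s = (unsigned char)(raw >> 31);                       /* bit 31 */
    out->e = (signed char)((int)((raw >> 23) & 0xFFu) - 127);  /* bits 30..23, unbiased */
    out->m = (raw & 0x7FFFFFu) | 0x800000u;                    /* bits 22..0 plus implicit 1 */
    return 0;
}
```

Called with 5.0f, -3.0f, and 0.1f, this yields (s, e, m) of (0, 2, 0xa00000), (1, 1, 0xc00000), and (0, -4, 0xcccccd) respectively.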

CodePudding user response：

A call of the form splice(a, A); cannot change A because the call only passes the value of A to the function. Neither the address nor any other way of accessing A is passed to the function.
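
The effect can be reproduced with a much smaller example. The struct and function names below are hypothetical and exist only for this illustration.

```c
/* Hypothetical one-field struct used only for this illustration. */
typedef struct { int value; } box;

/* Receives a copy: the assignment is lost when the function returns. */
void set_by_value(box b)
{
    b.value = 42;
}

/* Receives the caller's address: the assignment is visible to the caller. */
void set_by_pointer(box *b)
{
    b->value = 42;
}
```

After box x = {0}; set_by_value(x); the member x.value is still 0, whereas set_by_pointer(&x); sets it to 42. In the question, A, B, and C are additionally never initialized, so the printed fields are indeterminate values rather than zeros.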

Change splice so that it takes a float argument and returns a sploat value:

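    sploat splice(float num)
    {
        sploat S;

        // Read the float's bits through a union (permitted in C)
        unsigned raw = (union { float f; unsigned u; }) {num} .u;

        S.s = raw >> 31;
        S.e = (int)((raw << 1) >> 24) - 127;   // Remove the exponent bias
        S.m = ((raw << 9) >> 9) | 0x800000;    // Implicit leading 1 is bit 23

        return S;
    }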

Change the calls to match:

    A = splice(a);
    B = splice(b);
    C = splice(c);
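
The compound-literal union used above is one of two common ways to read a float's bits in C; memcpy is the other. The sketch below puts them side by side. Both helper names are hypothetical, and both assume sizeof(float) == sizeof(unsigned) and an IEEE 754 float. Reading a union member other than the one last written is permitted in C but not in C++, so memcpy is the portable choice if the code may be compiled as C++.

```c
#include <string.h>

/* Hypothetical helper: reinterpret the bits through a union (C only). */
unsigned bits_via_union(float f)
{
    return (union { float f; unsigned u; }){ f }.u;
}

/* Hypothetical helper: copy the bits with memcpy (valid in C and C++). */
unsigned bits_via_memcpy(float f)
{
    unsigned u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Both return 0x40a00000 for 5.0f, the same bit pattern the question's program prints.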

CodePudding user response：

You need to pass a pointer to your struct. At the moment, your function does not modify the caller's struct, because strukt is passed by value and you only change the local copy.

You also have to avoid type punning through a pointer cast, as it breaks the strict aliasing rules. Use memcpy instead.

    #include <string.h> // for memcpy

    int splice(float num, sploat *strukt)
    {
        unsigned raw;
        memcpy(&raw, &num, sizeof(raw));

        // bit-shift logic left exactly as in the question
        strukt->s = raw >> 31;
        strukt->e = (raw << 1) >> 24;
        strukt->m = ((raw << 9) >> 9) | 0x1000000;
        return 0;
    }

    splice(a, &A);
    splice(b, &B);
    splice(c, &C);

P.S. I did not modify your bit-shift logic, as it is your homework, not mine.