Why is sscanf behaving like this when converting hex strings to number?

Time:11-09

I have written a piece of code that I am using to research the behavior of different libraries and functions. And doing so, I stumbled upon some strange behavior with sscanf.

I have a piece of code that reads an input into a buffer, then tries to put that value into a numeric variable.

When I call sscanf from main on the input buffer with the format specifier %x, I get a garbage value if the input string is shorter than the buffer. Say I enter 0xff: I get an arbitrary large number that differs on every run. But when I pass that buffer to a function, all calls to sscanf there result in 255 (0xff) as I expect, regardless of the mismatch between type and format specifier.

My question is, why does this happen in the function main but not in the function test?

This is the code:

#include <stdio.h>

int test(char *buf){
    unsigned short num;
    unsigned int num2;
    unsigned long long num3;
    sscanf(buf, "%x", &num);
    sscanf(buf, "%x", &num2);
    sscanf(buf, "%x", &num3);
    printf("%x", num);
    printf("%x", num2);
    printf("%x", num3);
    return 0;
}

void main(){
    char buf[16];
    unsigned long long num;
    printf("%s","Please enter the magic number:");
    fgets(buf, sizeof(buf),stdin);
    printf("%x\n", num);
    sscanf(buf, "%x", &num);
    test(&buf);

}



I expect the behavior to be consistent: either all calls should fail or all calls should succeed, but this is not the case.

I have tried to read the documentation and do experiments with different types, format specifiers, and so on. This behavior is present across all numeric types.

I have tried compiling on different platforms; gcc and Linux behave the same, as do Windows and msvc.

I also disassembled the binary to see if the call to sscanf differs between main() and test(), but that assembly is identical. It loads the pointer to the buffer into a register and pushes that register onto the stack, and calls sscanf.

Now just to be clear: this happens consistently. num in main is never equal to num, num2 or num3 in test, but num, num2 and num3 are always equal to each other. I would expect undefined behavior to be inconsistent, not this repeatable. Output on every run:

./main
Please enter the magic number: 0xff
0xaf23af23423 <--- different every time
0xff  <--- never different
0xff  <--- never different
0xff  <--- never different

My current reasoning is that in one instance sscanf is interpreting more bytes than in the other. It seems to keep evaluating the entire buffer and is affected by residual data in memory.

I know I can make it behave correctly by either filling the buffer, with that last byte being a new line or using the correct format specifier to match the pointer type. "%llx" for main in this case. So that is not what I am wondering; I have made that error on purpose.

I am wondering why using the wrong format specifier works in one case but not in the other consistently when the code runs.

CodePudding user response:

sscanf with %x should be used only with the address of an unsigned int. When an address of another object is passed, the behavior is not defined by the C standard.

With a pointer to a wider object, the additional bytes in the object may hold other values (possibly leftover from when the startup code prepared the process and called main). With a pointer to a narrower object, sscanf may write bytes outside of the object. With compiler optimization, a variety of additional behaviors are possible. These various possibilities may manifest as large numbers, corruption in data, program crashes, or other behaviors.

Additionally, printing with incorrect conversion specifiers is not defined by the C standard, and can cause errors in printf attempting to process the arguments passed to it.

Use %hx to scan into an unsigned short. Use %lx to scan into an unsigned long. Use %llx to scan into an unsigned long long. Also use those conversion specifiers when printing their corresponding types.

As for the question "why does this happen in the function main but not in the function test?":

One possibility is that the startup code used a little stack space while setting up the process, and this left some non-zero data in the bytes that were later used for num in main. The bytes lower on the stack held zero values, and these bytes were later used for num3 in test.

CodePudding user response:

The argument expression in this call

test(&buf);

has the type char ( * )[16] but the function expects an argument of the type char *

int test(char *buf){

There is no implicit conversion between these pointer types.

You need to call the function like

test( buf );

Also it seems there is a typo

printf("%s","Please enter the magic number:");
printf("%x\n", num);

The variable num is not initialized at the point where it is printed.

In this call

unsigned long long num;
//...
sscanf(buf, "%x", &num);

you are using the third argument of the type unsigned long long int * but the conversion specification "%x" expects an argument of the type unsigned int *. So the call has undefined behavior.

You need to write

sscanf(buf, "%llx", &num);

The same problem exists in the function test for the variable num, which has the type unsigned short

unsigned short num;
//...
sscanf(buf, "%x", &num);

You have to write

sscanf(buf, "%hx", &num);

You need to use the same length modifiers in calls of printf

printf("%hx", num);
printf("%x", num2);
printf("%llx", num3);

Here is a demonstration program.

#include <stdio.h>

int main( void )
{
    char buf[] = "0xff\n";
    unsigned short num;
    unsigned int num2;
    unsigned long long num3;

    sscanf( buf, "%hx", &num );
    sscanf( buf, "%x", &num2 );
    sscanf( buf, "%llx", &num3 );

    printf( "%hx\n", num );
    printf( "%x\n", num2 );
    printf( "%llx\n", num3 );
}

The program output is

ff
ff
ff