Why does scanf("%d") work with an 8-bit datatype casted to an int*, but printf does not?

Time:09-17

I'm trying to figure out whether scanf can determine the datatype for %d purely from the format specifier, by hiding the datatype behind a void* and casting it to what %d should mean (i.e., int*):

#include <stdio.h>
#include <stdint.h>

int main()
{
    int8_t a, b;
    void *v[2] = { &a, &b };
    
    sscanf("-111,9\n", "%d,%d", (int*)v[0], (int*)v[1]);
    printf("Works  : %d, %d\n", a, b);
    printf("Doesn't: %d, %d\n\n", *(int*)v[0], *(int*)v[1]);
    return 0;
}

Here is the output:

Works  : -111, 9
Doesn't: 0, 9

Questions:

  1. Why is it possible for scanf() to read into an 8-bit type when cast to an int*, as verified by the direct printf of a and b? Shouldn't it overrun?
  2. Conversely, why is printf() unable to print the 8-bit types that are dereferenced as *(int*)v[0]'s when scanf can read into them?
  3. Is there some compiler-time magic that tells scanf/printf what the datatype is since the format specifier is undoubtedly insufficient?

I know this code is probably errant, but I'm still curious about the details underlying the example.

Thanks for your help!

CodePudding user response:

TL;DR: Doing something that is undefined behavior is not guaranteed to cause a crash or a diagnostic. It might appear to work.

Why is it possible for scanf() to read into an 8-bit type when cast to an int*, as verified by the direct printf of a and b? Shouldn't it overrun?

It does overrun -- you're writing (probably) 4-byte values into 1-byte spaces, so you are clobbering the 3 bytes that follow each one. The thing is, out-of-range writes like this are undefined behavior, which might crash, or might not crash and appear to work. Most likely they just corrupt something somewhere that will cause some later code to mysteriously misbehave, which is likely what is happening here (the second printf call misbehaves because the data in the v array was corrupted by the sscanf call).

If I comment out the second printf line and compile this with gcc on linux, it gives me:

$ ./test
Works  : -111, 9
*** stack smashing detected ***: ./test terminated
Aborted (core dumped)

which is perfectly consistent -- the scanf with incorrect pointers causes undefined behavior that doesn't manifest until later code tries to do something (in this case, when main returns and tries to clean up its stack frame).

CodePudding user response:

The comments above answer the question; here is the summary:

  1. scanf doesn't know what the types of its arguments are. - @Dia
  2. sscanf("-111,9\n", "%d,%d", &a, &b); does give a compiler warning if you enable compiler warnings. - @RaymondChen
  3. It is the programmer's responsibility to make sure the format specifier matches the parameters. If not, undefined behavior ensues. - @GarrGodfrey
  4. "But how can scanf write a 32-bit int ("%d") to 8-bit types without smashing the stack?" To answer that question, you need to look at the assembly code. Determine how the stack is laid out, and see what's getting smashed. Because something is getting smashed, but it must not be anything important. Compile with -S to see the assembly, or use a debugger to examine the assembly. – @user3386109

CodePudding user response:

Why is it possible for scanf() to read into an 8-bit type when cast to an int*, as verified by the direct printf of a and b? Shouldn't it overrun?

But it did overrun, and the fact that a and b ended up containing sane values is not any kind of "verification"!

I modified your program like this:

int8_t x1 = 11;
int8_t x2 = 22;
int8_t x3 = 33;
int8_t a;
int8_t x4 = 44;
int8_t x5 = 55;
int8_t x6 = 66;
int8_t b;
int8_t x7 = 77;
int8_t x8 = 88;
int8_t x9 = 99;
void *v[2] = { &a, &b };

printf("before: %d %d %d %d %d %d %d %d %d\n", x1, x2, x3, x4, x5, x6, x7, x8, x9);
sscanf("-111,9\n", "%d,%d", (int*)v[0], (int*)v[1]);
printf("Works  : %d, %d\n", a, b);
printf(" after: %d %d %d %d %d %d %d %d %d\n", x1, x2, x3, x4, x5, x6, x7, x8, x9);

When I ran it, I got this output:

before: 11 22 33 44 55 66 77 88 99
Works  : -111, 9
 after: -1 -1 -1 0 0 0 77 88 99

Not surprisingly, most of the x's got smashed. (Now, there's no guarantee the compiler is going to lay these variables out consistently, so this isn't the only possible result, but it pretty clearly demonstrates that there's some smashing going on, as expected.)

CodePudding user response:

I'm trying to figure out if scanf can understand the datatype for %d purely based on the format specifier by hiding the datatype behind a void* and casting it to what %d should mean (ie, int*)

How about reading the specifications? Or at least the manual page? Although there is something to be said for experimentation, the best you can hope for from that alone is to learn about how your particular implementation does particular things. Relying on good documentation as the basis for your experiments puts you on much firmer footing.

The docs would support the conclusion that scanf relies exclusively on the field directives appearing in its format string to judge the types of the second and subsequent arguments. If you pass arguments of incorrect type then undefined behavior results. scanf expects the argument corresponding to a %d directive to be an int *, and as far as the language specifications go, a void * will not do, much less a pointer to some complete type other than int.

Now, it is fairly common for all object pointer types provided by a given C implementation to have the same size and representation, such that converting among them via cast does not affect the representation of the value. In such a case, your cast itself, though incorrect, probably does not cause a practical problem for scanf. (But it could, for reasons that might be obscure or mysterious. That's the nature of UB.)

However, the fact is that the pointed-to object is not an int, nor an object of type compatible with int, so if scanf attempts to access that object via the pointer then undefined behavior occurs for that reason as well. This is a violation of the "strict aliasing rule". Unlike the cast, this one is very likely to cause observable misbehavior in practice.

  1. Why is it possible for scanf() to read into an 8-bit type when cast to an int*, as verified by the direct printf of a and b? Shouldn't it overrun?

Who says it doesn't overrun? Your program has undefined behavior. That can manifest as doing what you expect, or as appearing to do so. In this particular case, I would be inclined to guess that the appearance of doing the right thing depends in part on the order in which you read the variables and on the fact that (I infer) you are using a little-endian computer, such as an Intel-based one, where the low-order byte of each int lands in the 8-bit object itself.

  2. Conversely, why is printf() unable to print the 8-bit types that are dereferenced as *(int*)v[0]'s when scanf can read into them?

You are again invoking undefined behavior by accessing a and b as if they were ints, when in fact they are not. Undefined behavior does not have to be consistent. Or it may be consistent with structures and behaviors that do not manifest at the C-language level. This is not something for which the language owes you an explanation: that's what "undefined" means.

  3. Is there some compiler-time magic that tells scanf/printf what the datatype is since the format specifier is undoubtedly insufficient?

The format string is all that the language specification requires there to be, and it is absolutely sufficient to tell scanf and printf what to expect. It is not sufficient to enable them to validate that the argument types are in fact what they were told to expect, but if they had such a capability then they would not need the format string to tell them about the types in the first place. It is the programmer's responsibility to provide arguments that match the format string, and this is not onerous because the programmer provides the format string themselves, too. In the scanf case, it is also the programmer's responsibility to provide valid pointer values that scanf can use without violating the strict aliasing rule. The language specifies what happens when you do that correctly; it does not promise any particular behavior when you do it incorrectly.
