Is `scanf("%d", ...)` as bad as `gets`?

Time:12-23

For many years, gets has been universally disparaged as an unsafe function. (The canonical SO question is "Why is the gets function so dangerous that it should not be used?") The gets function is so bad that it was removed from the language in the C11 standard. Supporters of gets (there are few if any) would argue that it is perfectly fine to use it if you know about the structure of the input.

Why do people who disparage gets and acknowledge the folly of relying on the structure of the input allow the usage of %d as a scanf conversion specifier? That's a sociological question, and the real question is: why is %d in a scanf format string unsafe?

CodePudding user response:

If the format string to scanf contains a raw %d conversion specifier ("raw" meaning "without a maximum field width"), the behavior is undefined if the input stream contains a string that is a valid representation of an integer that cannot fit in an int. E.g., the string 5294967296 cannot be represented in an int on a platform where sizeof(int) == 4. The C language only guarantees that an int be large enough to hold the range -32767 through 32767, so any input stream that contains the string 32768 could potentially lead to undefined behavior. This potential undefined behavior can be avoided by using %4d. Most modern platforms have a value of INT_MAX that is much larger than 32767, so realistically the width modifier on the conversion specifier can be larger than 4, but it ought to be determined (either at compile time or at run time) for the platform, and it must be present in the format string.
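One way to determine the width at run time, as a rough sketch only: count the decimal digits of INT_MAX and use one fewer as the field width, so that any digit string short enough to be accepted is guaranteed to fit. The helper names safe_int_width and scan_int_bounded are invented for this illustration, and sscanf is used instead of scanf so the behavior is easy to check; the conversion rules are the same.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: number of decimal digits in INT_MAX, minus one.
   Any digit string that short is guaranteed to fit in an int. */
int safe_int_width(void)
{
    int digits = 0;
    for (int v = INT_MAX; v > 0; v /= 10)
        digits++;
    return digits - 1;
}

/* Hypothetical helper: parse one int from `text` with a bounded field width.
   Returns the sscanf result: 1 on success, 0 on a matching failure,
   EOF if the input ends before any conversion. */
int scan_int_bounded(const char *text, int *out)
{
    char fmt[16];
    /* Builds e.g. "%9d" on a platform where INT_MAX is 2147483647. */
    snprintf(fmt, sizeof fmt, "%%%dd", safe_int_width());
    return sscanf(text, fmt, out);
}
```

The trade-offs: a leading sign counts toward the width, values with as many digits as INT_MAX are only read partially, and because the format string is built at run time the compiler cannot check it against the arguments.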

If you don't add a width modifier, you might as well just use gets to read a line into a buffer and use sscanf to parse the values. This will (perhaps) make the error more obvious to the reader.

CodePudding user response:

No, scanf("%d", …) is not as bad as gets.

gets is as bad as it gets because it is not possible to use it safely, in virtually any environment. Buffer overflow is likely, cannot be prevented, and is quite likely to lead to arbitrarily bad consequences.

The worst thing that can happen with scanf("%d", …), on the other hand, is integer overflow. While this is theoretically also undefined behavior, in practice it virtually always results in either (a) quiet wraparound, (b) overflow to INT_MAX or INT_MIN, or (c) a runtime exception which may terminate the calling program.

It is extremely difficult to imagine a scenario under which an attacker could exploit a program using scanf("%d", …). Exploits involving gets, on the other hand, are commonplace.

(Although not the question asked, it's true that scanf("%s", …) is precisely as dangerous as gets. It's a fair question why the former isn't always disparaged as strenuously as the latter.)

CodePudding user response:

As is well known, the former gets() offers no control over or detection of buffer overflow, which leads to UB. It could have, had it taken a size parameter.

In addition to @William Pursel's good answer concerning int range:

scanf("%d", ...): Input not limited to one line.

gets() reads 1 line. "%d" in scanf() first consumes leading white-space, which may include several lines.

scanf("%d", ...): does not read the whole line.

Unlike gets(), scanf("%d", ...) leaves any input after the input for the int. This often includes a '\n'. Not reading the entire line often sows the seed for subsequent problems.

Depending on goals, scanf("%d", ...) does not complain about trailing non-numeric text.
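A common way to sidestep these points, sketched here under the assumption that exactly one integer per line is expected, is to read the whole line with fgets() and convert it with strtol(), which reports out-of-range input through errno rather than invoking undefined behavior. The function name read_int_line, the 64-byte line limit, and the return codes are choices made for this illustration only.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one line from `in` and parse it as a single int.
   Returns 1 on success, 0 if the line is not exactly one in-range int,
   and EOF on end-of-file or input error. */
int read_int_line(FILE *in, int *out)
{
    char line[64];
    if (fgets(line, sizeof line, in) == NULL)
        return EOF;

    /* Buffer filled without reaching '\n': discard the rest of the line
       so the next call starts on a fresh line, then reject the input. */
    size_t len = strlen(line);
    if (len == sizeof line - 1 && line[len - 1] != '\n') {
        int c;
        while ((c = getc(in)) != '\n' && c != EOF)
            ;
        return 0;
    }

    char *end;
    errno = 0;
    long v = strtol(line, &end, 10);
    if (end == line)
        return 0;                       /* no digits at all */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                       /* does not fit in an int */
    while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
        end++;                          /* allow trailing white-space */
    if (*end != '\0')
        return 0;                       /* trailing non-numeric text */

    *out = (int)v;
    return 1;
}
```

With this shape, a line such as "12abc" is rejected as a whole instead of yielding 12 and leaving "abc" behind for the next read.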


C lacks a robust way to read a line. IMO, fgets(), gets_s(), scanf(anything), and the extension getline() all lack some functionality.

I'd campaign for an int scan_line(size_t sz, char *buf /*, size_t *length_read*/) that always reads a line, always forms a string in buf, and returns EOF (end-of-file, input error), 1 on success, and 0 when sz is too small.
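A rough sketch of how such a function might look, not a definitive implementation. Several details are assumptions that the description above leaves open: the '\n' is consumed but not stored, a final line without '\n' still counts as a line, and sz == 0 is reported as "too small". The variant scan_line_from, with an explicit stream parameter, is an addition made here so the behavior can be tested.

```c
#include <stdio.h>

/* Hypothetical variant with an explicit stream parameter.
   Always consumes one full line and, when sz > 0, always leaves a string
   in buf. Returns EOF on end-of-file or input error with nothing read,
   0 when sz is too small for the line, and 1 otherwise. */
int scan_line_from(FILE *in, size_t sz, char *buf)
{
    size_t n = 0;       /* characters stored in buf */
    size_t seen = 0;    /* characters read from the line, excluding '\n' */
    int c;

    while ((c = getc(in)) != EOF && c != '\n') {
        if (n + 1 < sz)
            buf[n++] = (char)c;
        seen++;         /* keep reading so the whole line is consumed */
    }
    if (sz > 0)
        buf[n] = '\0';
    if (c == EOF && seen == 0)
        return EOF;
    return (seen < sz) ? 1 : 0;   /* line plus terminator must fit in sz */
}

/* The interface as described in the text, reading from stdin. */
int scan_line(size_t sz, char *buf)
{
    return scan_line_from(stdin, sz, buf);
}
```

Because the rest of an over-long line is discarded rather than left in the stream, the next call always starts at the beginning of a line.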


Alternatively (and more debatable) *scanf() could be improved:

  • Add ability to pass in size for "%s" and friends. This is sorely needed.

  • Defined behavior on int overflow.

  • Something like "%#\n" to scan in white-space, but not '\n'. Does not contribute to the return value.

  • Something like "%\n" to scan in 1 '\n'. Contributes to the return value. May use a leading space "% \n" to allow optional leading non-'\n' white-space.

  • Offer *scanfln(), which always reads just 1 line.

CodePudding user response:

gets doesn't have any means to prevent buffer overflow errors.

For scanf("%d", &x); there is no way to cause a buffer overflow (provided the argument type matches the format string).

Now in case of

char s[5];
scanf("%s", s); 

There is a danger of buffer overflow (when the user types a word with more than 4 characters), but it is easy to fix this code to protect against buffer overflow:

char s[5];
scanf("%4s", s); 

This version cannot overflow the buffer.
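A minimal sketch of the bounded version, using sscanf() on a fixed string so the effect is easy to observe; scanf() applies the same rule to stdin. The width is one less than the array size because the %s conversion stores a terminating null after the characters it reads. The function name first_word is made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper: copy the first white-space-delimited word of `text`
   into a 5-byte buffer, storing at most 4 characters plus the terminator.
   Returns the sscanf result: 1 if a word was stored, EOF if none was found. */
int first_word(const char *text, char word[5])
{
    word[0] = '\0';
    return sscanf(text, "%4s", word);
}
```

Given the input "overflowing", the buffer ends up holding "over"; with scanf() the remaining characters would stay in the stream for the next read.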

Note that scanf is really bug-prone, so to prevent common mistakes, treat warnings related to format strings as errors.

Basically, with gets there was no way to protect against invalid (too long) user input. Also, there is no way to fix it without breaking binary or source compatibility.
In the case of scanf, a more advanced format string can protect you from buffer overflow, and this can be enforced by static analysis tools.
