why does this lead to core dump?

Time:09-28

#include <ctype.h>
#include <stdio.h>

int atoi(char *s);

int main()
{
    printf("%d\n", atoi("123"));
}



int atoi(char *s)
{
    int i;

    while (isspace(*s))
        s++;

    int sign = (*s == '-') ? -1 : 1;

    /* same mistake for passing pointer to isdigit, but will not cause CORE DUMP */ 
    // isdigit(s), s++; // this will not lead to core dump
    // return -1;
    /* */

    /* I know s is a pointer, but I don't quite understand why code above will not and code below will */
    if (!isdigit(s))
        s++;
    return -1;
    /* code here will cause CORE DUMP instead of a compile-time error */

    for (i = 0; *s && isdigit(s); s++)
        i = i * 10 + (*s - '0');

    return i * sign;
}

I got "Segmentation fault (core dumped)" after I accidentally left out the * operator before s. Why does "if (!isdigit(s))" lead to a core dump while "isdigit(s), s++;" does not?

CodePudding user response:

From the cppreference page for isdigit:

The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF.

From the POSIX specification of isdigit:

The c argument is an int, the value of which the application shall ensure is a character representable as an unsigned char or equal to the value of the macro EOF. If the argument has any other value, the behavior is undefined.

https://godbolt.org/z/PEnc8cW6T


Undefined behaviour means the program may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.

CodePudding user response:

You are invoking undefined behavior. isdigit() is supposed to receive an int argument, but you pass in a pointer (xref: Language / Conversions / Other operands / Pointers, ¶6).

Furthermore, there is a constraint that the argument to isdigit() be representable as an unsigned char or equal to EOF. (xref: Library / Character handling <ctype.h>, ¶1).

As a guess, the isdigit() function may be performing some kind of table lookup, and the input value may cause the function to access a pointer value beyond the table.

CodePudding user response:

All answers so far have failed to point out the actual problem, which is that implicit pointer-to-integer conversions are not allowed during assignment in C. Details here: "Pointer from integer/integer from pointer without a cast" issues

Specifically C17 6.5.2.2/7

If the expression that denotes the called function has a type that does include a prototype, the arguments are implicitly converted, as if by assignment, to the types of the corresponding parameters

Where "as if by assignment" sends us to check the rules of assignment 6.5.16.1, which are quoted in the above link. So isdigit(s) is equivalent to something like this:

char* s;
...
int param_to_isdigit = s; // constraint violation of 6.5.16.1

Here the compiler must issue a diagnostic message. If you didn't spot it, or if your tool chain reports it as a warning rather than an error, see "What compiler options are recommended for beginners learning C?" and configure the compiler to reject code like this, so that you don't spend time troubleshooting bugs the compiler has already spotted for you.


Furthermore, the ctype.h functions require that the passed integer must be representable as unsigned char, but that's another story. C17 7.4 Character handling <ctype.h>:

In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF
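Putting both points together, here is a minimal sketch of how the function from the question could be written so that every <ctype.h> call receives a dereferenced character cast to unsigned char. The name my_atoi is hypothetical (chosen to avoid clashing with the standard atoi), the sign handling goes slightly beyond the original code, and overflow is deliberately not handled.

```c
#include <ctype.h>

/* my_atoi is a hypothetical name; it is not part of the question's code. */
int my_atoi(const char *s)
{
    int value = 0;
    int sign = 1;

    /* Dereference s and cast to unsigned char before each <ctype.h> call. */
    while (isspace((unsigned char)*s))
        s++;

    /* Consume an optional sign character (an addition to the original). */
    if (*s == '-' || *s == '+') {
        if (*s == '-')
            sign = -1;
        s++;
    }

    /* Accumulate digits; the loop stops at the terminating '\0' as well. */
    while (isdigit((unsigned char)*s)) {
        value = value * 10 + (*s - '0');
        s++;
    }

    return sign * value;   /* no overflow checking in this sketch */
}
```

With this version, my_atoi("123") yields 123, and a string with no leading digits yields 0.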

CodePudding user response:

Why no segfault from isdigit(s), s++;?

First of all, undefined behavior can manifest itself in a lot of ways, including the program working as intended. That's what undefined means.

But that line is not equivalent to your if statement. It executes isdigit(s), throws away the result, increments s, and also throws away the result of that operation.

However, isdigit does not have side effects, so it's quite probable that the compiler simply removes the call to that function and replaces this line with an unconditional s++. That would explain why it does not segfault, but you would have to study the generated assembly to be sure.

You can read about the comma operator in "What does the comma operator , do?"
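As a small illustration of the comma operator's evaluation order, the sketch below uses two hypothetical helpers, bump and comma_value, which are not part of the question. The left operand is evaluated and its value discarded, then the right operand is evaluated and becomes the value of the whole expression.

```c
/* Hypothetical helper: has a visible side effect so its call can be observed. */
int bump(int *counter)
{
    (*counter)++;
    return 42;              /* this value is discarded by the comma operator */
}

/* Returns the value of the comma expression, i.e. the right operand. */
int comma_value(int *counter, int *x)
{
    return (bump(counter), (*x)++);
}
```

Calling comma_value with x equal to 5 returns 5 (the value of the post-increment), leaves x at 6, and increments the counter once. Unlike bump, isdigit has no observable side effect, which is why a compiler is free to drop a call whose result is discarded.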

CodePudding user response:

I wasn't able to repeat this behaviour in MacOS/Darwin, but I was able to in Debian Linux.

To investigate a bit further, I wrote the following program:

#include <ctype.h>
#include <stdio.h>

int main()
{
    printf("isalnum('a'):  %d\n", isalnum('a'));
    printf("isalpha('a'):  %d\n", isalpha('a'));
    printf("iscntrl('\\n'): %d\n", iscntrl('\n'));
    printf("isdigit('1'):  %d\n", isdigit('1'));
    printf("isgraph('a'):  %d\n", isgraph('a'));
    printf("islower('a'):  %d\n", islower('a'));
    printf("isprint('a'):  %d\n", isprint('a'));
    printf("ispunct('.'):  %d\n", ispunct('.'));
    printf("isspace(' '):  %d\n", isspace(' '));
    printf("isupper('A'):  %d\n", isupper('A'));
    printf("isxdigit('a'): %d\n", isxdigit('a'));
    printf("isdigit(0x7fffffff): %d\n", isdigit(0x7fffffff));
    return 0;
}

In MacOS, this just prints out 1 for every result except the last one, implying that these functions are simply returning the result of a logical comparison.

The results are a bit different in Linux:

isalnum('a'):  8
isalpha('a'):  1024
iscntrl('\n'): 2
isdigit('1'):  2048
isgraph('a'):  32768
islower('a'):  512
isprint('a'):  16384
ispunct('.'):  4
isspace(' '):  8192
isupper('A'):  256
isxdigit('a'): 4096
Segmentation fault

This suggests to me that the library used in Linux is fetching a value from a lookup table indexed by the argument and masking it with a bit pattern corresponding to the function called. For example, '1' (ASCII 49) is an alphanumeric character, a digit, a graphic character, a printable character and a hex digit, so entry 49 in this lookup table is probably equal to 8 + 2048 + 32768 + 16384 + 4096, which is 55304.

The documentation for these functions does mention that the argument must have either the value of an unsigned char (0-255) or EOF (-1), so any value outside this range is causing this table to be read out of bounds, resulting in a segmentation error.
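To make the guess concrete, here is a minimal sketch of a table-driven classifier. Everything in it (the sk_ names, the flag values, the 256-entry table) is hypothetical and only illustrates the idea; the real library's table layout and bit values differ.

```c
/* Hypothetical flag bits; the real library uses its own values. */
enum { SK_DIGIT = 1 << 0, SK_SPACE = 1 << 1 };

#define SK_TABLE_SIZE 256

/* One entry of flag bits per unsigned char value. */
static unsigned short sk_table[SK_TABLE_SIZE];

/* Fills in the few entries this sketch needs. */
void sk_init(void)
{
    for (int c = '0'; c <= '9'; c++)
        sk_table[c] |= SK_DIGIT;
    sk_table[' '] |= SK_SPACE;
    sk_table['\n'] |= SK_SPACE;
}

/* Unchecked lookup: index by the argument, mask by the function's flag.
   Only valid for 0 <= c < SK_TABLE_SIZE; anything else reads outside
   the table, which is how a crash can arise. */
int sk_isdigit(int c)
{
    return sk_table[c] & SK_DIGIT;
}

/* Reports whether an argument would index inside the table. */
int sk_index_in_bounds(long c)
{
    return c >= 0 && c < SK_TABLE_SIZE;
}
```

After sk_init(), sk_isdigit('1') is nonzero, while sk_index_in_bounds(0x7fffffff) is 0. A pointer converted to int, as in the question, will almost never land inside such a table either.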

Here I'm calling isdigit() with an argument of the correct type, so there is no pointer conversion involved. Per the documentation quoted above, the out-of-range value is still formally undefined behaviour, but I really think the library functions should be hardened against this sort of problem.
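Until then, a caller can do the hardening itself. The following is a minimal sketch of a range-checked wrapper; checked_isdigit is a hypothetical name, not a standard function, and treating out-of-range input as "not a digit" is one possible policy among several.

```c
#include <ctype.h>
#include <limits.h>
#include <stdio.h>   /* for EOF */

/* Hypothetical wrapper: rejects values outside the domain that
   isdigit accepts instead of passing them through. */
int checked_isdigit(int c)
{
    if (c != EOF && (c < 0 || c > UCHAR_MAX))
        return 0;               /* out of range: report "not a digit" */
    return isdigit(c) != 0;     /* normalise the result to 0 or 1 */
}
```

With this wrapper, checked_isdigit(0x7fffffff) returns 0 instead of indexing past the end of a table. It does not help with the original question's bug, because passing a pointer is still a constraint violation that the compiler should diagnose.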
