Why doesn't this code seg fault? Does gcc turn it into a string literal?

Time:12-13

#include <stdio.h>

void print(char* c) {
    printf("%s\n", c); //Uses %s to print a string
}

int main() {
    char a = 'd';
    print(&a);
    return 0;
}

How does printf know to stop printing the next character after printing 'd' when there is not a null terminating character at the end? When I ran it, it just printed 'd' and ended. Is this normal behaviour?

CodePudding user response:

A one-character string needs a char array of at least two elements: one for the character and one for the null terminator.

In your code, you invoke undefined behaviour, because printf reads past the single char. You were simply lucky that the next byte in memory was zero.

How does printf know to stop printing the next character after printing 'd' when there is not a null terminating character at the end?

It does not know. It was your lucky day. What will happen is undefined.

#include <stdio.h>

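void printchar(char c) {
    printf("%c\n", c);  // %c prints a single char; no terminator needed
}

void printstring(char *s) {
    printf("%s\n", s);  // %s reads until it finds a '\0'
}

int main() {
    char a = 'd';
    printchar(a);

    char b[2] = {'b', 0};  // the character plus a null terminator
    printstring(b);
    return 0;
}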

CodePudding user response:

Undefined behavior is undefined. In practice, though, your particular example is all but certain not to segfault.

It would only segfault if printf, trying to find a terminating \0, didn't hit any and ended up in an unmapped or access-protected page.

When you give it a char on the stack, printf searches through the currently used portion of the stack (stacks grow downwards on most architectures, so the used portion lies at higher addresses than your char). Since you're in main, the stack is not very deep: it contains only main's frame and whatever the OS and your libc put there before it, but that is practically enough to provide plenty of zeros. For example, Unixes put argv on there (terminated by a NULL pointer, i.e. zero bytes), the pointer array behind char **environ; (also NULL-terminated), and all the strings those arrays point to (each of them '\0'-terminated). You could also have zeros in the calling code's frame in addition to that.

And even if you are not on a Unix (and can't rely on the nul bytes from environ and argv), the likelihood of a nul byte in the used portion of the stack is very high.

CodePudding user response:

Code that is evidently wrong is wrong, and this code invokes undefined behavior.

That said, the specific case you posted has a good chance of printing correctly on many systems, purely by coincidence: char is on most systems the smallest integer type available, while the preferred alignment follows the processor's native word size, which is typically larger than a char. This implies padding after the char, and the padding bytes often happen to be 0, although nothing guarantees that.

Now put together a char variable followed by zero padding and you get a null-terminated string.

The code is wrong, the probability that it works is high...

CodePudding user response:

I cannot speak for your machine, but on my machine the C code you provided results in this machine code fragment (found by compiling the program and invoking objdump -D -j .text on it):

    117b:   c6 45 f7 64             movb   $0x64,-0x9(%rbp)
    117f:   48 8d 45 f7             lea    -0x9(%rbp),%rax
    1183:   48 89 c7                mov    %rax,%rdi
    1186:   e8 be ff ff ff          call   1149 <print>

(Keep in mind that passing different options to your compiler or using a different compiler altogether might result in different machine code.)

The code stores a byte (movb) with hex value 0x64 on the stack. 0x64 is the ASCII value for the "d" character. Afterwards, it loads the address of that byte on the stack (lea) into the rax register and copies it to rdi, which is the register used to pass the first argument to functions on x86-64 Linux, which I use. This way, during the print call in the next instruction, the first argument is a pointer to your character on the stack.

Using GDB one can inspect the contents of the stack memory prior to and after executing the particular movb.

Before:

00:0000│ rsp 0x7fffffffe0b0 ◂— 0x0
01:0008│     0x7fffffffe0b8 ◂— 0x4ef8437da34e1300
02:0010│ rbp 0x7fffffffe0c0 ◂— 0x1

After:

00:0000│ rsp 0x7fffffffe0b0 ◂— 0x6400000000000000
01:0008│     0x7fffffffe0b8 ◂— 0x4ef8437da34e1300
02:0010│ rbp 0x7fffffffe0c0 ◂— 0x1

(I have a GDB extension called pwndbg installed, so your output might look different.)

Essentially, the stack slot that the "d" is written to is all zero before the write. Because x86-64 is little-endian, the 0x64 is the highest-addressed byte of that slot, at rbp-0x9 = 0x7fffffffe0b7. The very next byte, at 0x7fffffffe0b8, is the lowest byte of the following slot (0x4ef8437da34e1300), and that byte is 0x00. This null byte right after 0x64 creates an "impromptu" printable string.

As others have said, this is not behavior you can rely on; it is merely a coincidence. You should not write code this way in programs that are actually supposed to work :) but this is already emphasized enough in the other answers. Additionally, because this is undefined behavior, it is entirely possible that on your machine the program works for different reasons. To be sure of the answer, I recommend you debug your code and inspect what the memory contents look like in your case. To do that you should:

  1. Compile your program
  2. Disassemble it, for example using objdump -D -j .text <your program name>
  3. Find the assembly responsible for passing "d" to your print function. Hint: it will be somewhere in the main function
  4. If needed, inspect what exactly is happening with the memory/registers using GDB