Why it's wrong to define a string like that in C (with regards to memory)?

Time:06-27

The address of local variables is from high to low. Since one way to define a string in C is to define a char pointer, I defined a char pointer and initialized it to &a-4, so it should point to 4 bytes below the int a in memory. Then I used scanf() with "%4s" to restrict it to reading at most 4 characters. If I scan a string, the 4 characters should be stored at the addresses &a-4, &a-3, &a-2 and &a-1 respectively.

int main()
{
    int a = 5;
    char *string =  &a - 4;
    
    scanf("%4s", string);
    printf("%s", string);
    
    return 0;
}


But the program does not behave the way I expected. What is the reason?

CodePudding user response:

Problem #1

Say an int takes 4 bytes. And say a is at address 0x1000, then the bytes of the integer are at 0x1000, 0x1001, 0x1002 and 0x1003.

I don't know why you'd think they are at 0x0FFC, 0x0FFD, 0x0FFE and 0x0FFF, but that's completely wrong.

Problem #2

Next is the issue that you misunderstand how pointer subtraction and addition works.

Subtracting an integer amount from a pointer produces a pointer that points that many elements earlier. So if &a - 4 were legal, it would produce a pointer 4 ints earlier, which is 16 bytes earlier given our 4-byte int assumption. (And it's not actually legal, since it would produce a pointer outside of the object to which the pointer originally pointed.)

So what you would need to do is cast it to a char *.

Pointer to 1st byte: ( (char *)&a ) + 0
Pointer to 2nd byte: ( (char *)&a ) + 1
Pointer to 3rd byte: ( (char *)&a ) + 2
Pointer to 4th byte: ( (char *)&a ) + 3

So, in your case, you'd simply want

char *string = (char *)&a;

Problem #3

This makes absolutely no sense:

printf("%s", string);

Despite the name, string doesn't point to a string. Not only is it not NUL-terminated, it's quite likely the representation of the integer includes NUL characters.

CodePudding user response:

"The address of local variables is from high to low."

The C standard guarantees no such thing. It may or may not be true for your compiler platform.

char *string =  &a - 4;

Since &a has type int*, subtracting 4 from it moves the pointer back by sizeof(int) * 4 bytes, not by 4 bytes.

CodePudding user response:

"The address of local variables is from high to low."

Are you sure about that?

"Since one way to define a string in C is to define a char pointer"

Are you sure about that?

"I defined a char pointer and initialized it to &a-4, so it should point to 4 bytes below the int a in memory."

Are you sure about that?


So, taking each of these in turn...

C makes absolutely no guarantees about how variables are laid out in memory beyond the following:

  • Array elements are laid out contiguously and successive elements have increasing addresses;

  • struct members are laid out in the order they are declared and have increasing addresses; however, there may be unused padding bytes between members.

That’s it. Beyond that, variable layout, byte order, etc. are all functions of the implementation. What works on your system may not work on mine.

In C, a string is a zero-terminated sequence of character values. Strings are stored in arrays of character type. A char * object is not a string. A char * may point to the first character of a string, or it may point to the first of a sequence of characters that is not a string (not zero-terminated, or zero is counted as an in-band value), or it may point to a single char object that is not part of a larger sequence.

The value of the expression &a is the address of the first byte of a and its type is int *. Pointer arithmetic is not done in terms of bytes, it’s done in terms of the size of the pointed-to type. Adding 1 to a pointer expression yields a pointer to the next object of the pointed-to type, not the next byte. Subtracting 4 from &a yields a pointer to the 4th int object prior to a (4 * sizeof (int) bytes).

You have not allocated an array. The bytes at that address don’t belong to you; they may belong to part of the stack frame and writing to them may clobber something important such as the previous frame pointer, or the address of the next instruction to execute after the function returns, which is how a lot of malware works.

Your code invokes undefined behavior in multiple ways; it is erroneous, and you should not expect any specific result.
