Why is it possible to store the information content of an int pointer in an int variable in C?

Let us consider the following piece of code:

#include <stdio.h>
int main()
{
    int v1, v2, *p;
    p = &v1;
    v2 = &v1;
    printf("%d\t%d\n",p,v2);
    printf("%d\t%d\n",sizeof(v2),sizeof(p));
    return 0;
}

We can see, as expected, that the v2 variable (int) occupies 4 bytes and that the p variable (int pointer) occupies 8 bytes.

So, if a pointer occupies more than 4 bytes of memory, why can we store its content in an int variable?

In the underlying implementation, do pointer variables store only the memory address of another variable, or do they store something else?

CodePudding user response：

There is always a warning, see below.

main.c: In function ‘main’:
main.c:6:8: warning: assignment to ‘int’ from ‘int *’ makes integer from pointer without a cast [-Wint-conversion]
     v2 = &v1;

main.c: In function ‘main’:
main.c:6:10: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
     v2 = (int) &v1;

In the first case, assigning a pointer value directly to an int is not appropriate, because the two types are not compatible.

In the second case, with the pointer cast to an int, the compiler recognizes the difference in size, which means v2 cannot hold the full value of &v1.

Conclusion: Both cases are "bad" in that they can produce undesired behaviour.

About your question "So, if a pointer occupies more than 4 bytes of memory, why can we store its content in an int variable?" - It can NOT be stored completely in an int variable.
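The loss of information can be made visible without relying on the cast-less assignment. This is a minimal sketch, assuming the implementation provides uintptr_t (it is optional in the standard); the helper name fits_in_int_width is made up for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if the integer form of p survives being
 * narrowed to the width of an int, and 0 if high-order bits were lost.
 * unsigned int is used for the narrowing because unsigned conversions are
 * fully defined (the value is reduced modulo UINT_MAX + 1).
 */
int fits_in_int_width(const void *p)
{
    uintptr_t full = (uintptr_t)p;             /* can represent any valid void * value */
    unsigned int narrow = (unsigned int)full;  /* keeps only the low-order bits */
    return (uintptr_t)narrow == full;
}
```

On an implementation with 4-byte int and 8-byte pointers, this returns 0 for any address that has bits set above the low 32, which is typically the case for stack variables on 64-bit systems.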

About your question "In the underlying implementation, do pointer variables store only the memory address of another variable, or do they store something else?" - A pointer just holds an address. (It could be the address of another variable or not. It does not matter. It just holds an address.)

CodePudding user response：

We can see, as expected, that the v2 variable (int) occupies 4 bytes and that the p variable (int pointer) occupies 8 bytes.

I'm not sure what exactly the source of your expectation is there. The C language does not specify the sizes of ints or pointers. Its requirements on the range of representable values of type int allow an int as small as two 8-bit bytes, and historically that was a relatively common size for int. Some implementations these days have larger ints (and maybe also a larger char, which is the unit of measure for sizeof!).
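The sizes on a particular implementation can be measured rather than assumed. The following is a minimal sketch, assuming a hosted C99 compiler; the function names are made up for illustration.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Storage width of an int in bits: sizeof counts chars, and a char has CHAR_BIT bits. */
size_t int_storage_bits(void)
{
    return sizeof(int) * CHAR_BIT;
}

/* Same measurement for a pointer to int. */
size_t int_pointer_storage_bits(void)
{
    return sizeof(int *) * CHAR_BIT;
}

/* Prints both sizes; %zu is the conversion specifier for size_t. */
void print_sizes(void)
{
    printf("int:   %zu bytes (%zu bits)\n", sizeof(int), int_storage_bits());
    printf("int *: %zu bytes (%zu bits)\n", sizeof(int *), int_pointer_storage_bits());
}
```

The only portable guarantee here is that int occupies at least 16 bits; the 4-byte and 8-byte figures from the question are properties of that particular implementation.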

I suppose that your point here is that in the implementation tested, the size of int is smaller than the size of int *. Fair enough.

So, if a pointer occupies more than 4 bytes of memory, why can we store its content in an int variable?

Who says the code stores the pointer's (entire) content in the int? It converts the pointer to an int,^* but that does not imply that the result contains enough information to recover the original pointer value.

Exactly the same applies to converting a double to an int or an int to an unsigned char (for example). Those assignments are allowed without explicit type conversion, but they are not necessarily value-preserving.

Perhaps your confusion is reflected in the word "content". Assignment does not store the representation of the right-hand side to the left-hand object. It converts the value, if necessary, to the target object's type, and stores the result.
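The two conversions mentioned above can be sketched as follows. The function names are hypothetical; the point is only that each assignment converts the value to the target type instead of copying the source representation.

```c
#include <limits.h>

/* The double is converted to int: the fractional part is discarded.
   (If the truncated value does not fit in an int, the behavior is undefined.) */
int store_double_in_int(double d)
{
    int i = d;
    return i;
}

/* The int is converted to unsigned char: the value is reduced modulo UCHAR_MAX + 1. */
unsigned char store_int_in_uchar(int n)
{
    unsigned char c = n;
    return c;
}
```

With 8-bit chars, store_int_in_uchar(300) yields 44 and store_double_in_int(3.99) yields 3. Neither result is enough to recover the original value, which is the same situation as converting a wide pointer to a narrower int.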

In the underlying implementation, do pointer variables store only the memory address of another variable, or do they store something else?

Implementations can and have varied, and so too the meaning of "address" for different machines. But most commonly these days, pointers are represented as binary numbers designating locations in a flat address space.

But that's not really relevant. C specifies that pointers can be converted to integers and vice versa. It also provides integer types intptr_t and uintptr_t (in stdint.h) that support full-fidelity round trip void * to integer to void * conversion. Pointer representation is irrelevant to all that. It is the implementation's responsibility to implement the types and conversions involved so that they behave as required, and there is more than one way to do that.
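A round trip through uintptr_t might look like the sketch below. It assumes the implementation provides uintptr_t (the type is optional in the standard), and the helper names are invented for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: pointer -> integer, using a type meant to hold any valid void * value. */
uintptr_t pointer_to_integer(void *p)
{
    return (uintptr_t)p;
}

/* Hypothetical helper: integer -> pointer. When n came from pointer_to_integer,
   the result compares equal to the original pointer. */
void *integer_to_pointer(uintptr_t n)
{
    return (void *)n;
}

/* Returns 1 if an int * survives the round trip through uintptr_t. */
int round_trip_preserves(int *p)
{
    int *q = integer_to_pointer(pointer_to_integer(p));
    return q == p;
}
```

Both conversions use explicit casts and pass through void *, which is the form the guarantee is stated for.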

^*C actually requires an explicit conversion -- that is, a cast -- between pointers and integers. The language specification does not define the meaning of the cast-less assignment in the example code, but some compilers do accept that and perform the needed conversion implicitly. My remarks assume such an implementation.

CodePudding user response：

The key to understanding what's going on here is that C is an abstraction layer on top of the underlying ISA. Most architectures have little more than registers and memory addresses¹ to work with, all of which are of a fixed size. When manipulating "variables", you're really just expressing your intent which the compiler translates into more concrete instructions.

On x86_64, a common architecture, an int is in actuality either a portion of a 64-bit register, or it's a 4-byte location in memory that's aligned on a 4-byte boundary. An int* is a 64-bit value, or 8-byte location in memory with corresponding alignment constraints.

Putting an int* value into a suitably sized variable, such as uint64_t, is allowed. Putting that value back into a pointer and exercising that pointer may not be permitted; it depends on your architecture.

From the programmer's perspective a pointer is just 64 bits of data. From the CPU's perspective it may contain more than that, with modern architectures having things like internal "Pointer Authentication Codes" (PACs) that ensure pointers cannot be injected from external sources. It gets quite a bit more complicated under the hood.

In general it's best to treat pointers as opaque, that is, their actual value is as good as random and irrelevant to the internal operation of your program. It's only when you're doing deeper analysis at the architectural level with sufficiently robust profiling tools that the actual internals of the pointer can be informative or relevant.

There are several well-defined operations you can do on pointers, like p[n] to access specific offsets within the bounds of a structure or allocation, but outside of that you're pretty limited in what you can do, or even infer. Remember that modern CPUs and operating systems use virtual memory, so pointer addresses are "fake" and don't represent where the data sits in physical memory. In fact, many operating systems deliberately randomize the address layout (ASLR) to make addresses harder to guess.
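As an illustration of the well-defined operations, here is a minimal sketch that stays within the bounds of an array and never inspects the numeric value of the pointer. The function names are hypothetical.

```c
#include <stddef.h>

/* Sums n elements starting at p using indexing; p[i] is defined as *(p + i).
   The caller must ensure that p points to at least n valid elements. */
int sum_elements(const int *p, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += p[i];
    }
    return total;
}

/* Distance, in elements, between two pointers into the same array
   (last may point one past the final element). */
ptrdiff_t element_distance(const int *first, const int *last)
{
    return last - first;
}
```

For an array int a[4] = {1, 2, 3, 4}, sum_elements(a, 4) gives 10 and element_distance(a, a + 4) gives 4, regardless of where the array lives in memory.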

¹ This disregards VLIW, SIMD, and other extensions which are not so simple.