Do memory addresses expressed in hexadecimal always need to start with '0x'?

Do memory addresses expressed in hexadecimal always need to start with '0x'? Or can it be any other? What are the conditions?

#include <stdio.h>

int main(void)
{
    int n = 50;
    int *p = &n;
    
    printf("%p\n", p);
}

Here the output I got is '000000000062FE14'. Shouldn't it start with 0x?

CodePudding user response：

The format of %p's output is implementation-defined. With gcc and clang (at least the versions used by tio.run), it appears to get a 0x prefix and use lowercase for the hex digits a-f; on your compiler it gets no prefix and uses A-F instead. Both behaviors are legal.

If you want your code to behave in a consistent way, you'll need to use %x or %X as the base format code, so you can precisely specify the inclusion of 0x exactly once. To preserve the width behavior you've already got (always a fixed number of zero-padded hex digits sufficient to represent any pointer value for that architecture), you'll need to explicitly specify the width as well. The final version (that ensures you get 0x000000000062FE14 on any 64 bit pointer architecture) is:

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    int n = 50;
    int *p = &n;
    
    printf("0x%0*"PRIXPTR"\n", 2*(int)sizeof(p), (uintptr_t)p);
}

Breaking that down:

#include <inttypes.h> provides (through stdint.h) the typedef for uintptr_t, and the macros for printing it portably
0x is prefixed on manually (because the # modifier won't add 0x for an input of zero, and we want it there even for NULL pointers)
0* says "pad with zeroes out to a width of the first argument"
PRIXPTR is a macro that produces the appropriate format code for uppercase hex relative to a uintptr_t (use PRIxPTR for lowercase hex)
2*(int)sizeof(p) is passed to match the use of * for the width, which lets us compute exactly as many digits as the architecture requires to print any pointer of that type in the same fixed width (two hex digits per byte). The cast to int is needed because * explicitly expects an int for that argument, and sizeof produces a size_t; I'm fairly sure I can rely on sizeof for a pointer returning a value that fits in int though, so the cast is safe. :-)
(uintptr_t)p casts to an integer type sufficient to hold any pointer to void (which means it can hold any pointer to object type, but outside of POSIX, there's no guarantee it can hold a function pointer); the x/X codes work with integers, not pointers, so it can't be passed as a pointer without violating the spec.

Technically, support for [u]intptr_t is optional (and requires at least C99/C++11, but hopefully that's not an issue). But I strongly suspect that the systems that don't provide [u]intptr_t have pointers larger than any provided integer type, so you'd have no way of handling them portably anyway (you'd be stuck with %p). They'd be weirdo systems where uintmax_t has fewer bits than a pointer, e.g. a system where programs are natively aware of memory across the cluster and can address it directly with 128-bit global pointers that can refer to non-local memory, but the processor is still 64-bit and the compiler doesn't bother to combine two 64-bit registers into a single 128-bit integer, so uintmax_t is too small to hold a pointer address.

CodePudding user response：

The %x format accepts the # flag (%#x), which prefixes "0x" to any nonzero output. The # flag isn't defined for %p, however, but you can safely convert the pointer to a sufficiently large integer type and print that:

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    int n = 50;
    int *p = &n;
    printf("%"PRIxPTR "\n", (uintptr_t)p);
    printf("%#"PRIxPTR "\n", (uintptr_t)p);
}

Outputs something along the lines of:

7ffce1c44c04
0x7ffce1c44c04

CodePudding user response：

Since the format output by %p is "implementation-defined" by the C standard (§7.21.6.1 The fprintf function) and POSIX (fprintf()), different implementations do it differently. Some include a 0x prefix; some don't (and some might use 0X, but I don't remember seeing that in use). Many use lower-case letters for the digits 10-15; it seems your implementation uses upper-case, which is unusual. Some implementations pad with leading zeros; many do not. On macOS, a null pointer prints as 0x0 while other pointers print values like 0x7ffeebcf53bc, so the width isn't necessarily fixed.

There is no requirement for uniformity across implementations. If you want uniformity, use the type uintptr_t and macros such as PRIXPTR (or PRIxPTR) from <inttypes.h>.

#include <assert.h>
#include <stdio.h>
#include <inttypes.h>
#include <stdlib.h>

static_assert(sizeof(void *) == sizeof(void (*)(void)),
              "Object pointers are not the same size as function pointers");

#ifndef PTR_WIDTH
#define PTR_WIDTH "12"
#endif

#define PTR_FORMAT "0x%." PTR_WIDTH PRIXPTR

int main(void)
{
    printf("Object pointers:\n");
    int i = 0;
    int *a = malloc(3 * sizeof(*a));
    int *p = (int *)4100;

    printf("%p\n", (void *)0);
    printf("%p\n", &i);
    printf("%p\n", a);
    printf("%p\n", p);

    printf(PTR_FORMAT "\n", (uintptr_t)0);
    printf(PTR_FORMAT "\n", (uintptr_t)&i);
    printf(PTR_FORMAT "\n", (uintptr_t)a);
    printf(PTR_FORMAT "\n", (uintptr_t)p);

    printf("Function pointers:\n");
    printf("%p\n", (void *)(uintptr_t)main);
    printf("%p\n", (void *)(uintptr_t)printf);
    printf(PTR_FORMAT "\n", (uintptr_t)main);
    printf(PTR_FORMAT "\n", (uintptr_t)printf);

    free(a);
    return 0;
}

On a Mac, this produces:

Object pointers:
0x0
0x7ffee094539c
0x7ffed9405a10
0x1004
0x000000000000
0x7FFEE094539C
0x7FFED9405A10
0x000000001004
Function pointers:
0x10f2bddd0
0x7fff205f30a8
0x00010F2BDDD0
0x7FFF205F30A8

You will probably see different values, but the output should be similar. On a Mac, I've never seen an address with more than 12 hexadecimal digits, so I set PTR_WIDTH to 12 (as a string). You can set it to 16 (e.g. gcc -o pp29 -DPTR_WIDTH='"16"' pp29.c) if you want the maximum width for a 64-bit system, or use 8 if you're on a 32-bit system.

Note that you cannot officially convert function pointers to object pointers or vice versa directly in C:

§6.2.5 Types ¶28:

A pointer to void shall have the same representation and alignment requirements as a pointer to a character type.⁴⁸⁾ Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements. All pointers to structure types shall have the same representation and alignment requirements as each other. All pointers to union types shall have the same representation and alignment requirements as each other. Pointers to other types need not have the same representation or alignment requirements.

⁴⁸⁾ The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.

§6.3.2.3 Pointers ¶6-8:

6 Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

7 A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned⁶⁸⁾ for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

8 A pointer to a function of one type may be converted to a pointer to a function of another type and back again; the result shall compare equal to the original pointer. If a converted pointer is used to call a function whose type is not compatible with the referenced type, the behavior is undefined.

⁶⁸⁾ In general, the concept ''correctly aligned'' is transitive: if a pointer to type A is correctly aligned for a pointer to type B, which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C.

However, §6.3.2.3 ¶6 provides an escape hatch: convert to an appropriate integer type. Be aware, though, that in theory (though rarely in practice) there could be platforms where no integer type can hold a function pointer. That is why there are two consecutive casts when printing the function addresses with %p.

§7.20.1.4 Integer types capable of holding object pointers ¶1:

1 The following type designates a signed integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:
    intptr_t
The following type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:
    uintptr_t
These types are optional.

It would be a very unusual machine where the uintptr_t or intptr_t types are not available.

CodePudding user response：

Do memory addresses expressed in hexadecimal always need to start with '0x'?

Answer: No, they don't.

From C11:

p

The argument shall be a pointer to void. The value of the pointer is converted to a sequence of printing characters, in an implementation-defined manner.

It's implementation-defined. And the pointer must be cast to void *, else your code invokes undefined behaviour.

printf ("%p\n", (void *) p);

CodePudding user response：

%p is the format specifier for pointers, and its output, unlike that of some other specifiers, is implementation-defined, so you can't really expect consistency. That's fair, given that different environments can have different addressing semantics.