How does the C strtol interpret hex strings?

Time:03-23

I expect strtol("ffffffffffffffff", NULL, 16) to return -1 and strtol("7fffffffffffffff", NULL, 16) to return LONG_MAX, since the first sentence of the Linux man page seems to imply that strtol returns a signed long. The second call does return the expected result. But the first call also returns LONG_MAX, even though the input hex strings are not the same. As if that's not confusing enough, strtol("8000000000000000", NULL, 16), which I expect to return LONG_MIN, returns the same value as the other two calls, LONG_MAX. After the first two calls, I thought strtol was ignoring the most significant bit of the input string, but the third call refutes this hypothesis.

Is this some weird case of casting going on, or am I mixing mathematical reality with C reality?

Here is my source code:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char const *argv[]) {
    long x = strtol(argv[1], NULL, 16);
    printf("%ld\n", x);
    return 0;
}

I am compiling with gcc scratch.c on Ubuntu, and I am getting my results by passing a hex string as a command line argument. For example:

./a.out ffffffffffffffff // gives LONG_MAX
./a.out 8000000000000000 // gives LONG_MAX
./a.out 7fffffffffffffff // gives LONG_MAX

CodePudding user response:

strtol("ffffffffffffffff", NULL, 16) requests strtol to interpret “ffffffffffffffff” as a hexadecimal integer. It does not request strtol to interpret it as a specification of hexadecimal values for individual bytes and then to reinterpret those bytes as an encoding for a long.

The hexadecimal integer ffffffffffffffff (base 16) is 18,446,744,073,709,551,615 = 2^64−1. In your C implementation, the largest value representable in a long is 9,223,372,036,854,775,807 = 2^63−1.

The documentation for strtol, in C 2018 7.22.1.4 8, says that, if the value is above or below the range of representable values, LONG_MAX or LONG_MIN, respectively, is returned:

The strtol, strtoll, strtoul, and strtoull functions return the converted value, if any. If no conversion could be performed, zero is returned. If the correct value is outside the range of representable values, LONG_MIN, LONG_MAX, LLONG_MIN, LLONG_MAX, ULONG_MAX, or ULLONG_MAX is returned (according to the return type and sign of the value, if any), and the value of the macro ERANGE is stored in errno.

It is best not to call, and especially not to rely upon, any routine until you have read and understood its documentation. Especially when a routine is not behaving as you expect, read its documentation.

CodePudding user response:

According to the C Standard (7.22.1.4 The strtol, strtoll, strtoul, and strtoull functions)

8 The strtol, strtoll, strtoul, and strtoull functions return the converted value, if any. If no conversion could be performed, zero is returned. If the correct value is outside the range of representable values, LONG_MIN, LONG_MAX, LLONG_MIN, LLONG_MAX, ULONG_MAX, or ULLONG_MAX is returned (according to the return type and sign of the value, if any), and the value of the macro ERANGE is stored in errno.

The string literals "ffffffffffffffff" and "8000000000000000", interpreted as positive numbers, cannot be represented in an object of the signed type long, so LONG_MAX is returned for both.

CodePudding user response:

The long type on your system seems to have 64 bits, with a maximum positive value of 2^63-1 (LONG_MAX on your platform), represented in hex as 7fffffffffffffff. Any value larger than this will cause strtol to return LONG_MAX and set errno to ERANGE.

This is what you observe for ffffffffffffffff and 8000000000000000.

Note that the hex string can have an optional prefix of 0x or 0X.

You can parse unsigned long values with strtoul, with a maximum value of 2^64-1 (ULONG_MAX on your platform), represented in hex as ffffffffffffffff.
