How const char* strings are compared?

Time:11-21

Firstly, consider this example:

#include <iostream>
using namespace std;

int main()
{
    cout << ("123" == "123");
}

What I expect: since "123" is a const char*, I expect the ADDRESSES of these strings to be compared (as one of these answers said):

... because != and == will only compare the base addresses of those strings. Not the contents of the strings themselves.

But still the output is 1. Okay, we actually don't know how to compare addresses of two prvalue objects (or at least I don't understand how it would be done). So let's declare these strings as variables and see what will happen:

#include <iostream>
using namespace std;

int main()
{
    const char* a = "1230";
    const char* b = "1230";
    cout << (a == b);
}

The output is still 1. So const char* strings do not decay? Or did the compiler manage to do some optimization and allocate memory for only one string? OK, let's try to avoid that:

#include <iostream>
using namespace std;

int main()
{
    const char* a = "1230";
    const char* b = "1231";
    b = "1230";
    cout << (a == b);
}

The result is still the same, which made me think that const char* really does not decay. But that didn't make my life simpler. How, then, are const char*s compared?

Why is the output 1 here:

#include <iostream>
using namespace std;

int main()
{
    const char* a = "1230";
    const char* b = "1231";
    cout << (a > b);
}

a is less than b in terms of lexicographical comparison, but here a compares greater. How, then, is comparison of const char*s implemented?

CodePudding user response:

Yes, the linked answer is correct. operator== for pointers just compares the addresses, never their content.

Furthermore, the compiler is free, but not required, to de-duplicate string literals, so that all occurrences of a string literal are the same object with the same address. That is what you observe, and the re-assignment b = "1230"; won't stop that.

[lex.string.14] Evaluating a string-literal results in a string literal object with static storage duration, initialized from the given characters as specified above. Whether all string-literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.

What should const char* decay to? Arrays decay, pointers don't.

#include <iostream>
using namespace std;

int main()
{
    const char* a = "1230";
    const char* b = "1231";
    cout << (a > b);
}

prints 1 just because a happens to point to a higher address than b; no lexicographical comparison is done. Just use std::string or std::string_view if you require that.
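As a minimal sketch of that suggestion (the helper names same_text and text_less are made up for illustration), the characters can be compared instead of the addresses with std::strcmp or std::string_view:

```cpp
#include <cstring>
#include <string_view>

// Hypothetical helper: true when both C strings hold the same characters,
// regardless of where they are stored.
bool same_text(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}

// Hypothetical helper: lexicographic "less than" on the characters.
// std::string_view wraps the pointers without copying the text.
bool text_less(const char* a, const char* b) {
    return std::string_view(a) < std::string_view(b);
}
```

With these, same_text("1230", "1230") is true whether or not the compiler pools the literals, and text_less("1230", "1231") is true regardless of which literal sits at the higher address.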

CodePudding user response:

The storage details of string literals are left unspecified by the C++ standard (except for their lifetime) and are entirely up to the compiler's discretion. For example:

const char *a="ABCDEFG";
const char *b="DEFG";

It is entirely possible for a smart compiler to produce only one string out of this, and set the 2nd pointer to point to the middle of the string.

It is also possible for identical string literals that come from different .cpp files to end up as a single string in the final, linked executable, so that two literals compiled in entirely separate .cpp files have the same pointer value.

Similarly, the result of a relational comparison between pointers is unspecified in every case the C++ standard does not explicitly define. The comparison is well defined mostly for pointers to elements of the same array (or members of the same object) and is unspecified otherwise. The standard library does provide a way to get a total order over pointers (std::less), but that's not relevant here.

To summarize: outside those cases, you cannot expect any specific behavior from pointer values or attach any particular meaning to them.
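For completeness, here is a small sketch of the total order mentioned above, assuming std::less is the mechanism meant (the function name address_before is hypothetical). The order is consistent, but it says nothing about the characters the pointers refer to:

```cpp
#include <functional>

// Hypothetical helper: orders two pointers by address using std::less,
// which yields a strict total order even for pointers into unrelated objects.
// Which of two unrelated objects comes "first" is up to the implementation.
bool address_before(const char* a, const char* b) {
    return std::less<const char*>{}(a, b);
}
```

For two different addresses exactly one of address_before(a, b) and address_before(b, a) is true, whereas the result of a < b on unrelated pointers carries no such guarantee.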

CodePudding user response:

In this comparison

"123" == "123"

the string literals having the type const char[4] are implicitly converted to pointers to their first elements and these pointers are compared.

The result depends on compiler options that specify whether identical string literals are stored as one string literal or as separate string literals.
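The conversion described above can be checked at compile time. The following is only a sketch (the function name distinct_arrays_share_address is made up for illustration); it also contrasts literals with named arrays, which are separate objects and therefore never share an address:

```cpp
#include <type_traits>

// A string literal is an lvalue of array type: three characters plus '\0'.
static_assert(std::is_same<decltype("123"), const char (&)[4]>::value,
              "\"123\" has type const char[4]");

// Unary + forces the array-to-pointer conversion that == also applies.
static_assert(std::is_same<decltype(+"123"), const char*>::value,
              "the array converts to const char*");

// Hypothetical helper: two named arrays are distinct objects, so the
// pointers to their first elements differ even though the contents match.
bool distinct_arrays_share_address() {
    const char a[] = "1230";
    const char b[] = "1230";
    return &a[0] == &b[0];
}
```

Unlike "123" == "123", whose result depends on literal pooling, distinct_arrays_share_address() returns false on any conforming compiler.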

As for this program

#include <iostream>
using namespace std;

int main()
{
    const char* a = "1230";
    const char* b = "1231";
    cout << (a > b);
}

then the operator > should not be used with pointers that do not point to elements of the same array. In C++ the result of such a comparison is unspecified (in C it is undefined behavior).

The result of the comparison depends on the order in which the compiler places the string literals in the string literal pool.

CodePudding user response:

I expect ADDRESSES (like one of these answers said) of these strings to be compared.

Correct, that is what happens in both C and C++. When C-strings (char arrays) or string literals are compared with == in C and C++, only their addresses are compared.

Or compiler managed to do some optimizations and allocate memory only for one string?

Yes! Precisely. The compiler sees "1230" twice and may (in your/our case it does, which is why we see this behavior) use the very same string at the same memory location for both of them in the code below. Therefore, they have the same address. This is a nice optimization that C and C++ compilers may make for you.

#include <iostream>
using namespace std;

int main()
{
    const char* a = "1230";
    const char* b = "1230";
    cout << (a == b);
}

Going further:

Because that optimization is commonly made for you, you can usually write things like the following, even on memory-constrained embedded systems, without the program space growing by the size of the string literal each time it is used (the standard allows this but does not require it):

printf("some very long string\n");
printf("some very long string\n");
printf("some very long string\n");
printf("some very long string\n");

With that optimization, "some very long string\n" is stored in memory only once.

That being said, if you change even a single character in one of those strings, it becomes a different literal and the compiler will typically store it as a new string in memory, so in the case above you're better off doing this anyway:

constexpr char MY_MESSAGE[] = "some very long string\n";
// OR:
// #define MY_MESSAGE "some very long string\n"

printf(MY_MESSAGE);
printf(MY_MESSAGE);
printf(MY_MESSAGE);
printf(MY_MESSAGE);
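A named array removes the dependence on literal pooling, because every use refers to the same object. The sketch below illustrates this; first_use, second_use and print_message are hypothetical names, and passing the text through "%s" is a defensive choice made here, not something the answer requires:

```cpp
#include <cstdio>

// One named array: every use below refers to this single object.
constexpr char MY_MESSAGE[] = "some very long string\n";

// Hypothetical helpers that each "use" the message and report its address.
const char* first_use()  { return MY_MESSAGE; }
const char* second_use() { return MY_MESSAGE; }

// Passing the text as an argument to "%s" keeps the format string a literal,
// so a stray '%' in the message could not be misread as a conversion.
void print_message() {
    std::printf("%s", MY_MESSAGE);
}
```

Here first_use() == second_use() always holds, since both return the address of the one MY_MESSAGE array.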

See also:

  1. Why do (only) some compilers use the same address for identical string literals?