Is the comparison of strings or string views terminated at a null-character?

Time:05-25

May a string or string_view include '\0' characters so that the following code prints 1 twice?

Or is this just implementation-defined?

#include <iostream>
#include <string_view>
#include <string>

using namespace std;

int main()
{
    string_view sv( "\0hello world", 12 );
    cout << (sv == sv) << endl;
    string str( sv );
    cout << (str == sv) << endl;
}

This isn't a duplicate of the question about whether strings can have embedded nulls, since they obviously can. What I want to ask is whether the comparison of strings or string views stops at a '\0' character.

CodePudding user response:

The null character takes part in the comparison like any other character; see https://en.cppreference.com/w/cpp/string/basic_string/operator_cmp

Two strings are equal if both the size of lhs and rhs are equal and each character in lhs has equivalent character in rhs at the same position.
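As a small sketch of that rule (the function name is made up for illustration, and C++17 is assumed for string_view), a trailing '\0' changes the size and therefore the result of the comparison:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper; the asserts restate the rule quoted above.
inline void check_size_is_part_of_equality()
{
    std::string plain("abc");          // size 3
    std::string padded("abc\0", 4);    // size 4, last character is '\0'

    assert(plain.size() == 3 && padded.size() == 4);
    assert(plain != padded);           // sizes differ, so not equal

    // A view built from a bare const char* measures its length up to the
    // first '\0', so the embedded null is lost at construction time,
    // not during the comparison.
    std::string_view truncated(padded.c_str());
    assert(truncated == plain);
}
```

Calling check_size_is_part_of_equality() from main should pass all three asserts. The only place a '\0' cuts anything short here is the constructor that takes a bare pointer.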

CodePudding user response:

Language lawyer answer since the standards documents are, by definition, the one true source of truth :-)

The standard is clear on this. In C++17 (since that's the tag you provided, but later iterations are similar), [string.operator==] states that, for comparisons involving strings and/or string views, it:

Returns: lhs.compare(rhs) == 0.

The [string.compare] section further states that these all boil down to a comparison with a string view and explains that it:

Determines the effective length rlen of the strings to compare as the smaller of size() and sv.size(). The function then compares the two strings by calling traits::compare(data(), sv.data(), rlen).

These sizes are not restricted in any way by embedded nulls.
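To make that concrete, here is a minimal sketch (hypothetical function name, C++17 assumed) that contrasts the size-based comparison with strcmp, which does stop at the first '\0'. The second half repeats the checks from the question:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical helper; each assert documents one expected outcome.
inline void check_comparison_ignores_null_terminator()
{
    // Both views hold 5 characters and differ only after the embedded '\0'.
    std::string_view a("ab\0cd", 5);
    std::string_view b("ab\0ce", 5);

    assert(a != b);              // the difference at index 4 is seen
    assert(a.compare(b) < 0);    // 'd' < 'e' decides the sign

    // strcmp works on null-terminated strings, so it only looks at "ab".
    assert(std::strcmp(a.data(), b.data()) == 0);

    // The case from the question: both comparisons are true.
    std::string_view sv("\0hello world", 12);
    std::string str(sv);
    assert(str.size() == 12);
    assert(sv == sv);
    assert(str == sv);
}
```

So the program in the question would be expected to print 1 twice, and this follows from the wording quoted above rather than from implementation-defined behaviour.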

And, if you look at the traits information in table 54 of [char.traits.require], you'll see it's as clear as mud until you separate it out into sections:

X::compare(p,q,n) Returns int:

  • 0 if for each i in [0,n), X::eq(p[i],q[i]) is true; else
  • a negative value if, for some j in [0,n), X::lt(p[j],q[j]) is true and for each i in [0,j) X::eq(p[i],q[i]) is true; else
  • a positive value.

The first bullet point is easy: it gives zero if every one of the n characters is equal.

The second is a little harder but it basically gives a negative value where the first difference between characters has the first string on the lower side (all previous characters are equal and the offending character is lower in the first string).

The third is just the default "if it's neither equal nor lesser, it must be greater".
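The three bullets can be sketched as a plain loop. This is only an illustration of the wording in the table, not how any particular standard library implements it (for char, implementations commonly delegate to memcmp), and compare_sketch is a made-up name:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the three bullets for X = std::char_traits<char>.
inline int compare_sketch(const char* p, const char* q, std::size_t n)
{
    using traits = std::char_traits<char>;
    for (std::size_t i = 0; i < n; ++i) {
        if (!traits::eq(p[i], q[i])) {
            // First mismatch decides the sign: negative if p is lower there
            // (second bullet), positive otherwise (third bullet).
            return traits::lt(p[i], q[i]) ? -1 : 1;
        }
    }
    return 0; // first bullet: all n characters are equal
}
```

Nothing in the loop treats '\0' specially; it is compared with eq and lt like any other character, and the loop always runs up to n unless it finds a mismatch first.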
