Do strcmp and strstr test binary equivalence?

Time:06-01

https://docs.microsoft.com/en-us/windows/win32/intl/security-considerations--international-features

This page makes me wonder: apparently some Windows APIs may consider two strings equal even when they are different byte sequences. I want to know how the C standard library behaves in this respect.

In other words, does strcmp(a,b)==0 imply strlen(a)==strlen(b)&&memcmp(a,b,strlen(a))==0? And what about the other string functions, including the wide-character versions?

Edit:

For example, CompareStringW treats L"\x00C5" and L"\x212B" as equal:

printf("%d\n",CompareStringW(LOCALE_INVARIANT,0,L"\x00C5",-1,L"\x212B",-1)==CSTR_EQUAL);

outputs 1.

What I'm asking is whether the C library functions are guaranteed never to behave like this.

CodePudding user response:

  1. Two strings in different encodings can represent the same text even though their byte representations differ.
  2. The standard library's strcmp compares plain char strings, and in that case strcmp(a,b)==0 implies strlen(a)==strlen(b)&&memcmp(a,b,strlen(a))==0.
  3. Functions like wcscmp require both strings to be encoded the same way, so strings that compare equal have the same byte representation.
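The implication in point 2 can be spot-checked with a small helper. This is only a sketch: the name implication_holds is made up for illustration, and the two UTF-8 byte sequences in the usage note are an assumption about how U+00C5 and U+212B would be encoded in a char string.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true when, for this pair of strings,
   strcmp(a, b) == 0 implies equal length and identical bytes. */
bool implication_holds(const char *a, const char *b)
{
    if (strcmp(a, b) != 0)
        return true;                 /* premise is false, nothing to check */

    size_t n = strlen(a);
    return n == strlen(b) && memcmp(a, b, n) == 0;
}
```

Calling implication_holds("\xC3\x85", "\xE2\x84\xAB") takes the early return, because strcmp sees different bytes and reports the strings as unequal.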

CodePudding user response:

The regular string functions operate byte-by-byte. The specification says:

The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp is determined by the sign of the difference between the values of the first pair of characters (both interpreted as unsigned char) that differ in the objects being compared.

strcmp() and memcmp() do the same comparisons. The only difference is that strcmp() uses the null terminators in the strings as the limit, memcmp() uses a parameter for this, and strncmp() takes a limit parameter and uses whichever comes first.
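A rough sketch of how the three limits differ, using two buffers that agree up to the first null byte and differ after it. The buffer and function names are invented for this example.

```c
#include <string.h>

/* Two 4-byte buffers that match through the first null byte
   and differ only in the byte after it. */
static const char buf_a[4] = {'a', 'b', '\0', 'x'};
static const char buf_b[4] = {'a', 'b', '\0', 'y'};

/* Each helper returns 1 when its function reports the buffers as equal. */

int equal_by_strcmp(void)
{
    /* Stops at the null terminator, so the trailing byte is never read. */
    return strcmp(buf_a, buf_b) == 0;
}

int equal_by_strncmp(void)
{
    /* Stops at the null terminator or after n bytes, whichever comes first. */
    return strncmp(buf_a, buf_b, sizeof buf_a) == 0;
}

int equal_by_memcmp(void)
{
    /* Always compares exactly n bytes, including those after a null byte. */
    return memcmp(buf_a, buf_b, sizeof buf_a) == 0;
}
```

With these buffers, strcmp and strncmp report equality while memcmp does not, which is why the implication in the question uses strlen(a) as the length passed to memcmp.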

The wide string function specification says:

Unless explicitly stated otherwise, the functions described in this subclause order two wide characters the same way as two integers of the underlying integer type designated by wchar_t.

wcscmp() doesn't say otherwise, so it's also comparing the wide characters numerically, not by converting their encodings to some common character representations. wcscmp() is to wmemcmp() as strcmp() is to memcmp().
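Applied to the pair from the question, a minimal sketch looks like this. The function names are hypothetical, and the code assumes wchar_t is wide enough to hold 0x212B, which is the case on mainstream platforms.

```c
#include <wchar.h>

/* Returns 1 if wcscmp() considers U+00C5 and U+212B equal, 0 otherwise. */
int angstrom_pair_equal_by_wcscmp(void)
{
    const wchar_t *a = L"\x00C5";   /* LATIN CAPITAL LETTER A WITH RING ABOVE */
    const wchar_t *b = L"\x212B";   /* ANGSTROM SIGN */
    return wcscmp(a, b) == 0;
}

/* Same check with wmemcmp() over one character plus the terminator. */
int angstrom_pair_equal_by_wmemcmp(void)
{
    const wchar_t *a = L"\x00C5";
    const wchar_t *b = L"\x212B";
    return wmemcmp(a, b, 2) == 0;
}
```

Both helpers return 0: the two code points have different numeric values, so the numeric comparison treats them as different, unlike the CompareStringW call in the question.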

On the other hand, wcscoll() compares the strings as interpreted according to the LC_COLLATE category of the current locale, so its result may not match wcscmp() or wmemcmp().
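To see the locale dependence for yourself, something like the following could be used. It is a sketch under assumptions: collate_sign_in is an invented helper, the locale names you pass must be installed on your system, and what a given locale returns for a particular pair is up to the implementation.

```c
#include <locale.h>
#include <wchar.h>

/* Hypothetical helper: compares a and b with wcscoll() under the named
   locale. Returns -1, 0, or 1 for the sign of the result, or 2 if the
   locale could not be selected. Note that this changes the process-wide
   LC_COLLATE setting. */
int collate_sign_in(const char *locale_name, const wchar_t *a, const wchar_t *b)
{
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return 2;

    int r = wcscoll(a, b);
    return (r > 0) - (r < 0);
}
```

Comparing collate_sign_in("C", a, b) with the result under a locale such as "en_US.UTF-8" (if available) and with the sign of wcscmp(a, b) shows whether the collation order on your system differs from the numeric order.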

For other functions you should check the documentation to see whether they reference the locale.
