Given, for example, a char *p that points to the first character of "there is so \0ma\0ny \0 \\0 in t\0his stri\0ng !\0\0\0\0", how would strrchr() find the last occurrence of the null character?
The following questions arise:
=> What condition would it use to stop its loop?
=> I think that in every case it will have to read the next memory area to check that condition, at some point going past the string boundaries, which is UB. So is it safe?
Please correct me if I am wrong!
CodePudding user response:
It's very simple, as explained in the comments.
The first \0
is the last and the only one in a C string.
So if you write
char *str = "there is so \0ma\0ny \0 \\0 in t\0his stri\0ng !\0\0\0\0";
char *p = strrchr(str, 's');
printf("%s\n", p);
it will print
so
because strrchr
will find the 's' in "so", which is the last 's' in the string you gave it. And (to answer your specific question) if you write
p = strrchr(str, '\0');
printf("%d %s\n", (int)(p - str), p + 1);
it will print
12 ma
proving that strrchr
found the first \0
.
It's obvious to you that str
is a long string with some embedded \0
's in it. But, in C, there is no such thing as a "string with embedded \0
's in it". It is impossible, by definition, for a C string to contain an embedded \0
. The first \0
, by definition, ends the string.
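To make that concrete, here is a minimal sketch (the helper names are made up for illustration, they are not library functions). For strchr and strrchr the terminating null counts as part of the string, so searching for '\0' with either one should land on the same byte that strlen stops at:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: offset of the byte strrchr reports when asked for '\0'. */
static size_t nul_offset_via_strrchr(const char *s)
{
    /* For a valid C string this never returns NULL: the terminator always matches. */
    const char *p = strrchr(s, '\0');
    return (size_t)(p - s);
}

/* Hypothetical helper: 1 when strchr, strrchr and strlen agree on the terminator. */
static int searches_agree(const char *s)
{
    return strchr(s, '\0') == strrchr(s, '\0')
        && nul_offset_via_strrchr(s) == strlen(s);
}
```

With the string from the question, nul_offset_via_strrchr gives 12, the same value the printf above reports, and searches_agree holds for any valid C string.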
One more point. You mentioned that if you were to "access the next memory area", you would be "at some point bypassing the string boundaries, UB!" And you're right. In my answer, I flirted with danger when I wrote
p = strrchr(str, '\0');
printf("%d %s\n", (int)(p - str), p + 1);
Here, p
points to what strrchr
thinks is the end of the string, so when I compute p + 1
and try to print it using %s
, if we don't know better it looks like I've indeed strayed into Undefined Behavior. In this case it's safe, of course, because the whole literal is a single array and we know exactly what's beyond the first \0
. But if I were to write
char *str2 = "hello";
p = strrchr(str2, '\0');
printf("%s\n", p + 1); /* WRONG */
then I'd definitely be well over the edge.
CodePudding user response:
There is a difference between "a string", "an array of characters" and "a char* pointer".
- A C String is a number of characters terminated by a null character.
- An array of characters is a defined number of characters.
- A char* pointer is technically a pointer to a single character, but often used to mark a point in a C style string.
You say you have a pointer to a character (char*p
) and the value of *p
is 't'
, but you believe that *p is the first character of a C style string
"there is so \0ma\0ny \0 \\0 in t\0his stri\0ng !\0\0\0\0"
.
As others have said, because this is a C-style string and you don't know its length, the first null at or after p
will mark the end of the string.
If this was a character array char str[40]
then you could find the last null by looping from the end of the array towards the start: for (i=39; i>=0; i--)
BUT you don't know the length, so that won't work.
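For the case where the size is known, the backward loop could look roughly like this. It is only a sketch: last_nul_index is a made-up name, and the caller has to supply the real size of the buffer, which is exactly the information a bare char * does not carry.

```c
#include <stddef.h>

/* Hypothetical helper: index of the last '\0' in buf[0..n-1], or -1 if there is none.
 * n must be the true size of the buffer; the function never reads outside it. */
static ptrdiff_t last_nul_index(const char *buf, size_t n)
{
    /* Walk from the last byte towards the first without letting i go below zero. */
    for (size_t i = n; i-- > 0; ) {
        if (buf[i] == '\0')
            return (ptrdiff_t)i;
    }
    return -1;
}
```

For an array such as char a[] = "ab\0cd\0ef", calling last_nul_index(a, sizeof a) gives 8, the terminator the compiler appends to the literal. Note that sizeof only yields the array size while a is still an array; once it has decayed to a char *, the size has to be passed along separately.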
Hope that helps, and please excuse me if I have strayed into C++, it's 25 years since I did C :)
CodePudding user response:
In the case you present, you can never know whether the null character you've found is the last one, since nothing tells you where the data really ends. As it is a C string, it is guaranteed to end with a '\0', but if you decide to go beyond that, you can't know whether the memory you're accessing is yours. Accessing memory outside an array is undefined behaviour: you could be reading the next object in memory that happens to be yours, touching memory that is unallocated but whose block still belongs to your process, or touching a segment that is not yours at all. Typically only the third will cause a SIGSEGV. You can see this question to check for a segmentation fault without crashing your program, but your string could have ended long before you catch it that way.
There is a reason strings have an ending character. If you insist on having \0 in multiple places in your string, you can terminate it with another character instead, but note that all library functions will still consider the first \0 to be the end of the string.
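As a rough sketch of that idea, assume the data is ended by some other byte, for instance '\x04'. Both the sentinel value and the helper name are assumptions made up for this example, not a convention the C library knows about. Once such a sentinel exists, the last \0 before it can be found with a plain forward scan:

```c
#include <stddef.h>

/* Hypothetical helper: index of the last '\0' that occurs before the sentinel byte,
 * or -1 if there is none. The caller must guarantee that the sentinel really is
 * present in the buffer; otherwise the scan runs past the end, which is UB. */
static ptrdiff_t last_nul_before_sentinel(const char *s, char sentinel)
{
    ptrdiff_t last = -1;
    for (size_t i = 0; s[i] != sentinel; i++) {
        if (s[i] == '\0')
            last = (ptrdiff_t)i;  /* remember the most recent null seen so far */
    }
    return last;
}
```

For the bytes "ab\0cd\0ef\x04" this reports index 5, while strlen on the same buffer still reports 2. The sentinel must also never appear inside the data itself, which is the same restriction \0 imposes on ordinary C strings.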
It is considered bad practice to have multiple \0 in your strings, so avoid it if you can.