I have been studying C lately and I came across a source code for strlen that really confused me. I looked it up in other places but still couldn't understand it.
strlen:
#include <stdio.h>
int strlen(const char *str)
{
const char* eos = str; // 1
while (*eos++); // 2
return (eos - str - 1); // 3
}
int main()
{
int len = strlen("Hello");
printf("Len: %d" , len);
return 0;
}
I couldn't understand why we are using the local variable eos, why we are using it in an empty while loop, and why the strlen function then returns that last line.
CodePudding user response:
Why is there an empty while loop? The loop increments eos until it points to the location directly after the null terminator. The expression *eos++ is a very compact way of telling your computer to get the char value that eos is currently pointing to, and then increment eos so it points at the next char. There is no need for a body in this while loop.
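To make the compact condition easier to follow, here is a minimal sketch that spells out what `while (*eos++);` does, one step per statement. The function name `walk_past_terminator` is hypothetical and chosen only for illustration; it returns how far eos travels, which is one more than the string length.

```c
#include <stddef.h>

/* Hypothetical helper: the same walk as `while (*eos++);`, written out. */
size_t walk_past_terminator(const char *str)
{
    const char *eos = str;
    for (;;) {
        char c = *eos;   /* read the char eos currently points to */
        eos++;           /* then advance, even when c is '\0' */
        if (c == '\0')   /* the condition tests the value read before the increment */
            break;
    }
    return (size_t)(eos - str);   /* one more than the string length */
}
```

For "Hello" the pointer ends up 6 positions from the start, which is why the original function subtracts 1.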
Why does it use a local variable? Since we are incrementing a pointer (i.e. eos), and we also need a pointer to the beginning of the string in the final computation (i.e. str), we can't simply use str for everything. We need at least one other variable to get the job done.
How does the last line work? The expression eos - str does pointer subtraction: it calculates the distance between those two pointers, measured in elements (here, chars), and yields that as an integer. Then we subtract 1 because the loop leaves eos one position past the null terminator.
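As a small illustration of what "distance" means here, the sketch below contrasts element distance with byte distance. The names `sample`, `elements_between`, and `bytes_between` are hypothetical and exist only for this example. Pointer subtraction counts elements, not bytes; for char the two coincide because sizeof(char) is 1.

```c
#include <stddef.h>

/* Hypothetical sample data for the illustration. */
static const int sample[4] = {10, 20, 30, 40};

/* Distance in elements; both pointers must point into the same array. */
ptrdiff_t elements_between(const int *lo, const int *hi)
{
    return hi - lo;
}

/* Distance in bytes, obtained by viewing the same addresses as char pointers. */
ptrdiff_t bytes_between(const int *lo, const int *hi)
{
    return (const char *)hi - (const char *)lo;
}
```

Between `&sample[1]` and `&sample[3]` the element distance is 2, while the byte distance is 2 times sizeof(int).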
CodePudding user response:
The pointer eos is used to advance along the string. EOS is an abbreviation for end-of-string. The while loop is empty because it doesn't need to do anything; advancing the pointer is all that is needed. Once the character read through eos is the null terminator, the loop exits. The last line of the function then subtracts the start pointer from the end pointer, to give the number of characters that eos moved past. The final -1 is there to correct for the fact that, thanks to the post-increment operator, eos is always advanced one character more than it should be.
A less confusing implementation would have been:
int strlen(const char *str)
{
const char* eos = str; // 1
while (*eos) eos++; // 2
return (eos - str); // 3
}
CodePudding user response:
For starters, the function return type should be size_t.
size_t strlen(const char *str);
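A minimal sketch of a version with that return type might look like the following. The name `my_strlen` is an assumption made here so the function does not clash with the standard library's strlen; the body follows the same pointer-walking idea as the code in the question.

```c
#include <stddef.h>

/* Hypothetical name, chosen to avoid redefining the standard strlen. */
size_t my_strlen(const char *str)
{
    const char *eos = str;       /* starts at the first character */
    while (*eos != '\0')         /* stop on the terminator without stepping past it */
        eos++;
    return (size_t)(eos - str);  /* number of chars before the terminator */
}
```

When printing the result, the matching printf conversion for size_t is `%zu`, for example `printf("Len: %zu\n", my_strlen("Hello"));`.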
If two pointers point to elements of the same array, then the difference between the pointer that points to the element with the higher index and the pointer that points to the element with the lower index yields the difference between the two indices.
For example if you have an array like
const char s[] = "12";
and two pointers
const char *p1 = &s[0];
const char *p2 = &s[1];
then the difference
p2 - p1
yields the value 1.
This pointer within the original function
const char* eos = str;
is moved along the passed string until the terminating zero character is found.
while (*eos++);
This loop can be rewritten like
while ( *eos++ != '\0' );
The value of the postfix expression *eos++ is the value of the pointed-to character before the pointer is incremented.
So when the terminating zero is found, the loop stops iterating and the pointer eos points to the memory just after the terminating zero character.
Thus the pointer str now points to the beginning of the passed string, and the pointer eos points to the memory just after the terminating zero character '\0'.
So the difference
(eos - str - 1)
gives the number of characters in the passed string that precede the terminating zero character '\0'.
CodePudding user response:
Strings in C are character arrays terminated by what is called the null character. It's also useful to bear in mind that, because there is no explicit string type in C, you need a pointer to the first character if you want to operate on the array of chars that you are treating as a string.
With that said, the eos variable is initialized to point to the beginning of the string (by assigning str to it; remember, they are both just pointers to a char, so after the assignment they point to the same thing).
Now, the 'empty' while loop has a side effect. Because the increment (++) operator is used in the loop condition, eos gets incremented, and because it's a pointer, this means that with each iteration of the loop, eos points to successive characters. It's basically 'walking' along the string.
This continues until the loop condition evaluates to false. And in C, only the null character evaluates to false (for all possible character values; more generally, zero is false and non-zero is true). So basically, eos will stop incrementing at the end of the string.
Now to the return statement. At this point we have two variables: one called str that still points to the start of the string, and one called eos that points to the end of the string. Well, actually eos has spilled over because of the incrementing that was done in the loop condition. Just to be really precise here, it points to the memory address after the null character at the end of the string.
So, with a little more pointer arithmetic, if we subtract str from eos and then subtract 1 for the spilling over... well, we get the difference between the address of the null character and the address of the first character, which is the length of the string.
It's curious that the post-increment operator was used in the loop. If it had been coded as while (*++eos);, that is, using the pre-increment operator, then the incrementing would happen before the evaluation of *eos, which would mean it wouldn't spill beyond the end of the string and there would be no need to subtract the extra 1 in the return statement. (That variant never tests the first character, though, so it would read past the terminator of an empty string.) Oh well.
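For illustration, here is a minimal sketch of the pre-increment variant. The function name `len_preinc` is hypothetical, and the up-front check for an empty string is an addition assumed here, because a pre-increment loop advances before it tests anything and so never examines the first character.

```c
#include <stddef.h>

/* Hypothetical pre-increment variant with a guard for the empty string. */
size_t len_preinc(const char *str)
{
    const char *eos = str;
    if (*eos == '\0')        /* the loop below never tests the first char */
        return 0;
    while (*++eos != '\0')   /* advance first, then test the new position */
        ;
    return (size_t)(eos - str);   /* eos stops on the terminator, so no -1 */
}
```

With the guard in place, eos stops exactly on the null character, so the subtraction gives the length directly.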