I have inherited a large code base, and it contains a utility function that splits strings on the ':' character. I understand about 80% of how it works, but I do not understand the *token = '\0'; line.
Any pointers are highly appreciated.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TOKEN_SIZE 200

const char *splitter(const char *str, char delimiter, char *token) {
    while (*str && (delimiter != *str)) {
        *token++ = *str;
        str++;
    }
    if (delimiter == *str)
        str++;
    *token = '\0'; // what is this line doing?
    // how could the token be correct in main() after setting it to the null terminator
    // here?
    return str;
}

int main() {
    char token[MAX_TOKEN_SIZE + 1];
    const char *env = "/bin:/sbin:::/usr/bin";
    while (*env) {
        env = splitter(env, ':', token);
        // if token is empty, set it to "./"
        if ((token != NULL) && (token[0] == '\0')) {
            strcpy(token, "./\0");
        }
        printf("%s\n", token);
    }
    return 0;
}
The output is correct:
/bin
/sbin
./
./
/usr/bin
Answer:
As you have stated, the line in question sets the char pointed to by token to the nul terminator (as is required by virtually all string-handling functions in C).
But, assuming there are other characters in the extracted token, those will already have been added, sequentially, to the target array by the earlier *token++ = *str; line. Note that this copies a character to the pointed-to element of the array and then increments the pointer (so that it then points to the next char in the string/array).
So, when the while loop has finished, token will be pointing to the element of the array that immediately follows the last character copied in that loop, which is exactly where there needs to be a nul terminator. Only the function's local copy of the pointer has moved; the token array in main() still starts at the first copied character, which is why the full string is printed there.
Answer:
There are subtle problems in the posted code:
- the test if ((token != NULL) && (token[0] == '\0')) is redundant: token is an array, hence token != NULL is always true.
- splitter does not receive the length of the destination array: if the str argument contains a token longer than MAX_TOKEN_SIZE bytes, it will cause undefined behavior because of a buffer overflow.
- if the delimiter passed to splitter is the null byte, the return value will point beyond the end of the string, potentially causing undefined behavior.
- the line *token = '\0'; just sets the null terminator at the end of the token copied from str, if any.
Here is a modified version:
#include <stdio.h>
#include <string.h>

#define MAX_TOKEN_SIZE 200

const char *splitter(const char *str, char delimiter, char *token, size_t size) {
    size_t i = 0;
    while (*str) {
        char c = *str++;
        if (c == delimiter)
            break;
        if (i + 1 < size)
            token[i++] = c;
    }
    if (i < size) {
        token[i] = '\0';  /* set the null terminator */
    }
    return str;
}

int main() {
    char token[MAX_TOKEN_SIZE + 1];
    const char *env = "/bin:/sbin:::/usr/bin";
    while (*env) {
        env = splitter(env, ':', token, MAX_TOKEN_SIZE + 1);
        // if token is empty, set it to "./"
        if (*token == '\0') {
            strcpy(token, "./");
        }
        printf("%s\n", token);
    }
    return 0;
}
Answer:
For starters, I will point out some redundant code.
This if statement
if ((token != NULL) && (token[0] == '\0')) {
has a senseless expression, because token can never be equal to NULL: token in main is declared as a character array. So you could write
if ( token[0] == '\0') {
Also, in the string literal in this statement
strcpy(token, "./\0");
the explicit terminating zero character '\0' is redundant, because every string literal already ends with one. You can just write
strcpy(token, "./");
As for your question.
The function splitter extracts a sequence of characters until the character delimiter is encountered and stores it in the array token:
while (*str && (delimiter != *str)){
    *token++ = *str;
    str++;
}
But the resulting sequence is not yet a string. It must be ended with the terminating zero character '\0', and this statement
*token = '\0';
appends the terminating zero character to the end of the extracted sequence stored in the array token.
As for this statement
if (delimiter == *str)
    str++;
if the end of the string str has not been reached (that is, if the current character *str is not the terminating zero character '\0'; if it is equal to delimiter, then it is not the terminating zero character, unless delimiter itself is '\0'), the pointer str is incremented and returned from the function, so that on the next call the caller continues processing the string from the next position.
So initially you have
const char *env = "/bin:/sbin:::/usr/bin";
The function copies the characters /bin followed by the zero character '\0', that is, the string "/bin", to the array token. After this call, the pointer returned from the function points to the substring
"/sbin:::/usr/bin"
because the preceding character ':' was skipped by this statement
if (delimiter == *str)
    str++;
within the function.