string splitter in C - how is it working?

Time:09-10

I have inherited a large code base that contains a utility function for splitting strings on the ':' character. I understand about 80% of how it works, but I do not understand the *token = '\0'; line.

Any pointers are highly appreciated.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TOKEN_SIZE 200

const char *splitter(const char *str, char delimiter, char *token) {

    while (*str && (delimiter != *str)) {
        *token++ = *str;
        str++;
    }
    if (delimiter == *str)
        str++;

    *token = '\0';    // what is this line doing?

    //how could the token be correct in the main() after setting it to null terminator 
    //here?

    return str;
} 

int main() {
    char token[MAX_TOKEN_SIZE + 1];
    const char *env = "/bin:/sbin:::/usr/bin";
    while (*env) {
        env = splitter(env, ':', token);  

        //if token is empty, set it to "./"
        if ((token != NULL) && (token[0] == '\0')) {
            strcpy(token, "./\0");            
        }

        printf("%s\n", token);
    }
    return 0;
}

The output is correct:

/bin
/sbin
./
./
/usr/bin

CodePudding user response:

As you have stated, the line in question is setting the char pointed to by token to the nul terminator (as is required by virtually all string-handling functions in C).

But, assuming there are other characters in the extracted token, those will already have been added, sequentially, to the target array by the earlier *token++ = *str; line. Note that this copies a character to the pointed-to element of the array and then increments the pointer (so that it then points to the next char in the array).

So, when the while loop has finished, token will be pointing to the element of the array that immediately follows the last character copied in that loop – which is exactly where there needs to be a nul terminator.

CodePudding user response:

There are subtle problems in the posted code:

  • the test if ((token != NULL) && (token[0] == '\0')) is redundant: token is an array, hence token != NULL is always true.

  • splitter does not receive the length of the destination array: if the str argument contains a token longer than MAX_TOKEN_SIZE bytes, it will cause undefined behavior because of a buffer overflow.

  • if the delimiter passed to splitter is the null byte, the return value will point beyond the end of the string, potentially causing undefined behavior.

  • the line *token = '\0'; just sets the null terminator at the end of the token copied from str, if any.

Here is a modified version:

#include <stdio.h>
#include <string.h>

#define MAX_TOKEN_SIZE 200

const char *splitter(const char *str, char delimiter, char *token, size_t size) {
    size_t i = 0;
    while (*str) {
        char c = *str++;
        if (c == delimiter)
            break;
        if (i + 1 < size)
            token[i++] = c;
    }
    if (i < size) {
        token[i] = '\0';  /* set the null terminator */
    }
    return str;
} 

int main() {
    char token[MAX_TOKEN_SIZE + 1];
    const char *env = "/bin:/sbin:::/usr/bin";
    while (*env) {
        env = splitter(env, ':', token, MAX_TOKEN_SIZE + 1);

        // if token is empty, set it to "./"
        if (*token == '\0') {
            strcpy(token, "./");            
        }
        printf("%s\n", token);
    }
    return 0;
}

CodePudding user response:

For starters, I will point out some redundant code.

This if statement

if ((token != NULL) && (token[0] == '\0')) {

contains a pointless comparison, because token can never be equal to NULL: token in main is declared as a character array. So you could write

if ( token[0] == '\0') {

Also in the string literal in this statement

strcpy(token, "./\0");

the explicit terminating zero character '\0' is redundant. You can just write

strcpy(token, "./");

As for your question.

The function splitter copies characters into the array token until it encounters the delimiter character (or the end of the string):

while (*str && (delimiter != *str)){
     *token++ = *str;
     str++;
}

But the resulting sequence is not yet a string: a string must end with the terminating zero character '\0'. This statement

*token = '\0'; 

appends the terminating zero character to the end of the extracted sequence stored in the array token.

As for this statement

if (delimiter == *str)
    str++;

it applies when the loop stopped at a delimiter rather than at the end of the string (for a non-zero delimiter, a character equal to delimiter cannot be the terminating zero character '\0'). In that case the pointer str is incremented past the delimiter and returned from the function, so that the caller's next call continues processing the string from the following position.

So initially you have

 const char *env = "/bin:/sbin:::/usr/bin";

The first call copies the characters /bin, followed by the zero character '\0', into the array token, so token holds the string "/bin". After this call the pointer returned from the function points to the substring

"/sbin:::/usr/bin"

because the preceding character ':' was skipped by this statement

if (delimiter == *str)
    str++;

within the function.
