Home > Software engineering >  what am I doing wrong that strtok does right in splitting a string. Also separating the strtok to a
what am I doing wrong that strtok does right in splitting a string. Also separating the strtok to a

Time:10-03

This is the first time that I ask a question in stackoverflow so forgive me if this is wordy and incoherent. The last part of the question is elaborated at the bottom part of this question body.

So, I was doing a course assessment assigned by my college, in that, one question is :

Remove duplicate words and print only unique words

Input : A single sentence in which each word separated by a space

Output : Unique words separated by a space [Order of words should be same as in input]

Example:

Input : adc aftrr adc art

Output : adc aftrr art

Now, I have the solution which is to split the string on whitespaces and adding the word to a array(set) if it is not already exists, but it is the implementation part that makes me to plug my hair out

#include <stdio.h>
#include <string.h>

#define MAX 20

int exists(char words[][MAX], int n, char *word){ // The existence check function
    for(int i=0;i < n;i  )
        if(strcmp(words[i],word) == 0) 
            return 1;
    return 0;
}

void removeDuplicateOld(char*);
void removeDuplicateNew(char*);

int main(){
    char sentence[MAX*50] = {0}; //arbitary length
    fgets(sentence,MAX*50,stdin);
    sentence[strcspn(sentence,"\n")]=0;
    
    printf("The Old One : \n");
    removeDuplicateOld(sentence);
    printf("\nThe New One : \n");
    removeDuplicateNew(sentence);
}

The fucntion that uses strtok to split string :


void removeDuplicateNew(char *sentence){
    char words[10][MAX] = {0};
    int wi=0;
    char *token = strtok(sentence," ");
    
    while(token != NULL){
        if(exists(words,wi,token)==0) {
            strcpy(words[wi  ],token);
        }
        token = strtok(NULL," ");
    }
    for(int i=0;i<wi;i  ) printf("%s ",words[i]);
}

The old function that uses my old method (which is constructing a word until I hit whitespace) :


void removeDuplicateOld(char *sentence){
    char objects[10][MAX] = {0}; //10 words with MAX letters
    char tword[MAX];
    int oi=0, si=0, ti=0;
    
    while(sentence[si]!='\0'){
        if(sentence[si] != ' ' && sentence[si 1] != '\0')
            tword[ti  ] = sentence[si];
        else{
            if(sentence[si 1] == '\0')
                tword[ti  ]=sentence[si];
                
            tword[ti]='\0';
            
            if(exists(objects,oi,tword) == 0){
                strcpy(objects[oi  ],tword);
            }
            ti=0; // The buffer[tword] is made to be overwritten

        }
        si  ;
    }
    for(int i=0;i<oi;i  )
        printf("%s ",objects[i]);
}

Here is the output :

input : abc def ghi abc jkl ghi

The Old One :

abc def ghi jkl

The New One :

abc def ghi jkl

Note trailing whitespaces in input and output is not checked as their own driver code doesn't properly handle them while strtok method does and it passes all tests.


Now both methods seems to be producing same results but they are indeed producing different outputs according to test cases and in top of that separating strtok method as a separate function[removeDuplicateNew] fails one test case while writing it in main method itself passes all test, see these results :

Old Method Test Case results

Strtok Method as Separate Function Test Case Results

When Coded in main method itself :

int main(){
    char sentence[MAX*50] = {0}; //arbitary length
    fgets(sentence,MAX*50,stdin);
    sentence[strcspn(sentence,"\n")] = 0;
    
    char words[10][MAX] = {0};
    int wi=0;
    char *token = strtok(sentence," ");
    
    while(token != NULL){
        if(exists(words,wi,token)==0) {
            strcpy(words[wi  ],token);
        }
        token = strtok(NULL," ");
    }
    for(int i=0;i<wi;i  ) printf("%s ",words[i]);
}

Strtok Method as inline code Test Case Results

For the record, it is the same code just placed in main method, so what the heck happens here that when I separate it as a function and pass the string as argument it suddenly isn't working properly.

Also any advice on my question building, wording is appreciated.

CodePudding user response:

Your code...

void removeDuplicateOld(char *sentence){
    char objects[10][MAX] = {0}; //10 words with MAX letters
    char tword[MAX];
    int oi=0, si=0, ti=0;
    
    while(sentence[si]!='\0'){
        if(sentence[si] != ' ' && sentence[si 1] != '\0')
            tword[ti  ] = sentence[si];
        else{
            // right here have hit SP.
            // if SP followed by '\0'
            // then append SP to my word... wrong! <=====
            if(sentence[si 1] == '\0')
                tword[ti  ]=sentence[si];
                
            tword[ti]='\0';

This is why the library function strtok() works better than hand rolled code.
It has been tested and proven to work as it says it does.


There's a better way to use strtok()

for( char *p = sentence; (p = strtok( p, " \n") ) != NULL; p = NULL )
    if( exists( words, wi, p ) == 0 )
        strcpy( words[wi  ], p );

That's all you need. strtok() will even trim the LF off the buffer for you, no extra charge.


Final suggestion: Instead of a fixed-sized array of pointers to words, you might consider a linked-list (LL) that can easily grow. The function that would append a new word to the end of the list can quietly eat the word if it turns out to be a duplicate found while traversing to append to the end of the LL.

  • Related