Previous question was : what am I doing wrong that strtok does right in splitting a string. Also separating the strtok to a function suddenly doesn't produce correct result?
This is the first time that I ask a question in stackoverflow so forgive me if this is wordy and incoherent. The last part of the question is elaborated at the bottom part of this question body.
So, I was doing a course assessment assigned by my college, in that, one question is :
Remove duplicate words and print only unique words
Input : A single sentence in which each word separated by a space
Output : Unique words separated by a space [Order of words should be same as in input]
Example:
Input : adc aftrr adc art
Output : adc aftrr art
Now, I have the solution which is to split the string on whitespaces and adding the word to a array(set) if it is not already exists, but it is the implementation part that makes me to plug my hair out
#include <stdio.h>
#include <string.h>
#define MAX 20
int exists(char words[][MAX], int n, char *word){ // The existence check function
for(int i=0;i < n;i )
if(strcmp(words[i],word) == 0)
return 1;
return 0;
}
void removeDuplicateOld(char*);
void removeDuplicateNew(char*);
int main(){
char sentence[MAX*50] = {0}; //arbitary length
fgets(sentence,MAX*50,stdin);
sentence[strcspn(sentence,"\n")]=0;
printf("The Old One : \n");
removeDuplicateOld(sentence);
printf("\nThe New One : \n");
removeDuplicateNew(sentence);
}
The fucntion that uses strtok to split string :
void removeDuplicateNew(char *sentence){
char words[10][MAX] = {0};
int wi=0;
char *token = strtok(sentence," ");
while(token != NULL){
if(exists(words,wi,token)==0) {
strcpy(words[wi ],token);
}
token = strtok(NULL," ");
}
for(int i=0;i<wi;i ) printf("%s ",words[i]);
}
The old function that uses my old method (which is constructing a word until I hit whitespace) :
void removeDuplicateOld(char *sentence){
char objects[10][MAX] = {0}; //10 words with MAX letters
char tword[MAX];
int oi=0, si=0, ti=0;
while(sentence[si]!='\0'){
if(sentence[si] != ' ' && sentence[si 1] != '\0')
tword[ti ] = sentence[si];
else{
if(sentence[si 1] == '\0')
tword[ti ]=sentence[si];
tword[ti]='\0';
if(exists(objects,oi,tword) == 0){
strcpy(objects[oi ],tword);
}
ti=0; // The buffer[tword] is made to be overwritten
}
si ;
}
for(int i=0;i<oi;i )
printf("%s ",objects[i]);
}
Solved : changed if(sentence[si 1] == '\0') to if(sentence[si 1] == '\0' && sentence[si]!=' ')
Here is the output :
input : abc def ghi abc jkl ghi
The Old One :
abc def ghi jkl
The New One :
abc def ghi jkl
Note trailing whitespaces in input and output is not checked as their own driver code doesn't properly handle them while strtok method does and it passes all tests.
Now both methods seems to be producing same results but they are indeed producing different outputs according to test cases and in top of that separating strtok method as a separate function[removeDuplicateNew] fails one test case while writing it in main method itself passes all test, see these results :
Strtok Method as Separate Function Test Case Results
Following Was Moved To A separate Question Thread
When Coded in main method itself :
int main(){
char sentence[MAX*50] = {0}; //arbitary length
fgets(sentence,MAX*50,stdin);
sentence[strcspn(sentence,"\n")] = 0;
char words[10][MAX] = {0};
int wi=0;
char *token = strtok(sentence," ");
while(token != NULL){
if(exists(words,wi,token)==0) {
strcpy(words[wi ],token);
}
token = strtok(NULL," ");
}
for(int i=0;i<wi;i ) printf("%s ",words[i]);
}
Strtok Method as inline code Test Case Results
For the record, it is the same code just placed in main method, so what the heck happens here that when I separate it as a function and pass the string as argument it suddenly isn't working properly.
Also any advice on my question building, wording is appreciated.
CodePudding user response:
Your code...
void removeDuplicateOld(char *sentence){
char objects[10][MAX] = {0}; //10 words with MAX letters
char tword[MAX];
int oi=0, si=0, ti=0;
while(sentence[si]!='\0'){
if(sentence[si] != ' ' && sentence[si 1] != '\0')
tword[ti ] = sentence[si];
else{
// right here have hit SP.
// if SP followed by '\0'
// then append SP to my word... wrong! <=====
if(sentence[si 1] == '\0')
tword[ti ]=sentence[si];
tword[ti]='\0';
This is why the library function strtok()
works better than hand rolled
code.
It has been tested and proven to work as it says it does.
There's a better way to use strtok()
for( char *p = sentence; (p = strtok( p, " \n") ) != NULL; p = NULL )
if( exists( words, wi, p ) == 0 )
strcpy( words[wi ], p );
That's all you need. strtok() will even trim the LF off the buffer for you, no extra charge.
Final suggestion: Instead of a fixed-sized array of pointers to words, you might consider a linked-list (LL) that can easily grow. The function that would append a new word to the end of the list can quietly eat the word if it turns out to be a duplicate found while traversing to append to the end of the LL.