strtok() only printing first word rest are (null)

Time:03-03

I am trying to parse a large text file and split it into single words using strtok. The delimiter set covers all special characters, whitespace, and newlines. For some reason, when I printf() each word, only the first word is printed, followed by a run of (null) for the rest.

    ifstream textstream(textFile);
    string textLine;
    while (getline(textstream, textLine))
    {
        struct_ptr->numOfCharsProcessedFromFile[TESTFILEINDEX] += textLine.length() + 1;
        char *line_c = new char[textLine.length() + 1]; // creates a character array the length of the line, plus the terminating '\0'
        strcpy(line_c, textLine.c_str());               // copies the line string into the character array
        char *word = strtok(line_c, delimiters);        // removes all unwanted characters
        while (word != nullptr && wordCount(struct_ptr->dictRootNode, word) > struct_ptr->minNumOfWordsWithAPrefixForPrinting)
        {
            MyFile << word << ' ' << wordCount(struct_ptr->dictRootNode, word) << '\n'; // writes each word and number of times it appears as a prefix in the tree
            word = strtok(NULL, delimiters);                                            // move to next word
            printf("%s", word);
        }
    }

CodePudding user response:

Rather than jumping through the hoops necessary to use strtok, I'd write a little replacement that works directly with strings, without modifying its input, something on this general order:

#include <string>
#include <vector>

std::vector<std::string> tokenize(std::string const &input, std::string const &delims = " ") {
    std::vector<std::string> ret;
    std::string::size_type start = 0; // unsigned, so the comparison with npos is exact

    while ((start = input.find_first_not_of(delims, start)) != std::string::npos) {
        auto stop = input.find_first_of(delims, start + 1); // end of the current word, or npos
        ret.push_back(input.substr(start, stop - start));
        start = stop;
    }
    return ret;
}
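
To make the edge cases concrete, here is a self-contained sketch of the same tokenizer (repeated so the snippet compiles on its own) together with a small wrapper. The wrapper name splitLine and its delimiter set are assumptions for illustration only; they do not come from the question.

```cpp
#include <string>
#include <vector>

// Same approach as the tokenizer in this answer: skip delimiters, then take
// everything up to the next delimiter. The index is a size_type so that the
// comparison against std::string::npos is exact.
std::vector<std::string> tokenize(std::string const &input,
                                  std::string const &delims = " ") {
    std::vector<std::string> ret;
    std::string::size_type start = 0;

    while ((start = input.find_first_not_of(delims, start)) != std::string::npos) {
        auto stop = input.find_first_of(delims, start + 1);
        ret.push_back(input.substr(start, stop - start));
        start = stop; // npos here ends the loop on the next find_first_not_of
    }
    return ret;
}

// Hypothetical wrapper: splits on whitespace and some common punctuation.
std::vector<std::string> splitLine(std::string const &line) {
    return tokenize(line, " \t\r\n.,;:!?\"()");
}
```

With this sketch, leading, trailing, and repeated delimiters produce no empty tokens, and an empty line or a line made only of delimiters yields an empty vector, so the caller never has to test for a null token.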

At least to me, this seems to simplify the rest of the code quite a bit:

std::string textLine;
while (std::getline(textstream, textLine)) {
    struct_ptr->numOfCharsProcessedFromFile[TESTFILEINDEX] += textLine.length() + 1;
    auto words = tokenize(textLine, delimiters);
    for (auto const &word : words) {
        // pass word.c_str() instead if wordCount expects a C string
        MyFile << word << ' ' << wordCount(struct_ptr->dictRootNode, word) << '\n';
        std::cout << word << '\n';
    }
}

This also avoids (among other things) the memory leak in your code, which allocates a buffer with new[] on every iteration of the loop but never calls delete[] on it.
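
If you would rather keep strtok, the leak can also be avoided by letting a std::string own the scratch copy. The following is only a minimal sketch; the function name splitWithStrtok is hypothetical and not part of the question's code.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes a copy of `line` with strtok and returns the
// tokens as strings. The copy lives in a std::string, so it is released
// automatically when the function returns; no new[]/delete[] is needed.
std::vector<std::string> splitWithStrtok(const std::string &line, const char *delims) {
    std::string buffer = line; // strtok writes '\0' into its input, so work on a copy
    std::vector<std::string> words;

    char *word = std::strtok(&buffer[0], delims);
    while (word != nullptr) {
        words.emplace_back(word);            // copy the token out of the buffer
        word = std::strtok(nullptr, delims); // advance only after the token was used
    }
    return words;
}
```

Because each token is copied into the result before the buffer goes out of scope, the returned strings stay valid after the function returns.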

CodePudding user response:

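Move the printf two lines up, so that it prints the current token before strtok advances to the next one. As written, it prints the result of the next strtok call, which is a null pointer once the line has no more tokens. Passing a null pointer to %s is undefined behavior; some C libraries (glibc, for example) print (null) in that case.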

while (word != nullptr && wordCount(struct_ptr->dictRootNode, word) > struct_ptr->minNumOfWordsWithAPrefixForPrinting)
{
    printf("%s", word);
    MyFile << word << ' ' << wordCount(struct_ptr->dictRootNode, word) << '\n'; // writes each word and number of times it appears as a prefix in the tree
    word = strtok(NULL, delimiters);                                            // move to next word
}