Word counter returning incorrect number of words

I've been trying to create a program that reads text from a file and stores it in a string. I feed the string to a function that counts every word in the string.

However, it's only accurate if the user leaves some whitespace at the end of each line and doesn't create blank lines... not a very good word counter.

Creating a blank line results in a false increment to the word count.

I'm not sure if my main problem is using a boolean to do this or checking for whitespace and '\n' characters.

bool countingLetters = false;
int wordCount = 0;
for (int i = 0; i < text.length(); i++)
{
    if (text[i] == ' ' && countingLetters == true)
    {
        countingLetters = false;
        wordCount++;
    }
    if (text[i] != ' ' && countingLetters == false)
    {
        countingLetters = true;
    }
    if (text[i] == '\n' && countingLetters == true)
    {
        countingLetters = false;
        wordCount++;
    }
}

CodePudding user response：

Your code is basically a state machine. To complete your solution, you just need to account for the end of the string.

Add this to the end of your code:

if (countingLetters) { // word at the end of the string, with no trailing space character
    wordCount++;
}

Or, if you can be sure the string is null-terminated (a C-style string, or a std::string, where since C++11 text[text.size()] is guaranteed to yield '\0'), you can loop one index past the last character and handle '\0' the same way as a space or '\n'.
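A minimal sketch of the one-past-the-end idea, assuming text is a std::string (since C++11, reading text[text.size()] is valid and yields '\0'). The function name count_words_nul is made up for illustration, and the else if structure is a choice of this sketch rather than part of the original code.

```cpp
#include <string>

// Hypothetical helper: treats ' ', '\n' and the terminating '\0' as separators.
int count_words_nul(const std::string& text)
{
    bool countingLetters = false;
    int wordCount = 0;
    // i <= text.size() is deliberate: the last iteration reads the '\0' terminator
    for (std::string::size_type i = 0; i <= text.size(); ++i) {
        char c = text[i];
        bool separator = (c == ' ' || c == '\n' || c == '\0');
        if (separator && countingLetters) {          // a word just ended
            countingLetters = false;
            ++wordCount;
        } else if (!separator && !countingLetters) { // a word just started
            countingLetters = true;
        }
    }
    return wordCount;
}
```

Because the terminator closes the last word, no extra check is needed after the loop.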

To improve your code, use isspace, which also covers other whitespace characters such as '\t'. It is also better to use an else if chain. Finally, comparing with == true is not good practice; just use the boolean itself as the condition.
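As a rough sketch of those suggestions combined (isspace, an else if chain, and the boolean used directly as the condition), the loop could look like this. The function name count_words_ws is hypothetical, and the cast to unsigned char is an addition of this sketch: std::isspace has undefined behavior when passed a negative char value.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: counts whitespace-separated words in text.
int count_words_ws(const std::string& text)
{
    bool countingLetters = false;
    int wordCount = 0;
    for (char c : text) {
        // std::isspace covers ' ', '\t', '\n', '\v', '\f' and '\r' in the "C" locale
        bool space = std::isspace(static_cast<unsigned char>(c)) != 0;
        if (space && countingLetters) {          // a word just ended
            countingLetters = false;
            ++wordCount;
        } else if (!space && !countingLetters) { // a word just started
            countingLetters = true;
        }
    }
    if (countingLetters) { // last word runs to the end of the string
        ++wordCount;
    }
    return wordCount;
}
```

With this version a blank line contributes nothing, because a '\n' seen while not inside a word changes no state.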

Or maybe isalpha(c) fits your needs better.

bool countingLetters = false;
int wordCount = 0;
for (char c : text) {
    // cast to unsigned char: isalpha is undefined for negative char values
    bool letter = isalpha(static_cast<unsigned char>(c)) != 0;
    if (!letter && countingLetters) { // this also works for newline
        countingLetters = false;
        ++wordCount;
    } else if (letter && !countingLetters) {
        countingLetters = true;
    } // otherwise just skip
}
if (countingLetters) { // word at the end of the string, with no trailing space character
    ++wordCount;
}

Also, inserting an extra character into the text just for such a simple task is not acceptable. For example, text may be const.

CodePudding user response：

An alternative is to count the beginning of a "word".

Let us say the beginning of a word is a letter after a non-letter. We can adjust this if desired.

int wordCount = 0;
int prior = '\n';  // some non-letter
for (int i = 0; i < text.length(); i++) {
  if (isalpha(text[i]) && !isalpha(prior)) {
    wordCount++;
  }
  prior = text[i];
}
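Wrapped in a function, the same idea might look like the sketch below. The name count_word_starts and the bool that tracks the previous character are choices made here for illustration; the cast to unsigned char is added because std::isalpha is undefined for negative char values.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: counts positions where a letter follows a non-letter.
int count_word_starts(const std::string& text)
{
    int wordCount = 0;
    bool priorIsLetter = false; // the start of the text counts as a non-letter
    for (char c : text) {
        bool isLetter = std::isalpha(static_cast<unsigned char>(c)) != 0;
        if (isLetter && !priorIsLetter) { // a new word begins here
            ++wordCount;
        }
        priorIsLetter = isLetter;
    }
    return wordCount;
}
```

Under this definition digits and punctuation split words, so "don't" counts as two words and "abc123def" as two. Adjusting the isLetter test (for example, also accepting an apostrophe) changes what counts as one word.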

CodePudding user response：

C++ also provides some very high-level ways to do this.

One is by using a loop over a stringstream, which splits text on whitespace:

#include <sstream>
#include <string>

std::size_t count_words( const std::string& s )
{
  std::size_t count = 0;
  std::istringstream ss( s );
  std::string t;
  while (ss >> t) count += 1;
  return count;
}

Another is using a stream iterator algorithm:

#include <iterator>
#include <sstream>
#include <string>

std::size_t count_words( const std::string& s )
{
  std::istringstream ss( s );
  return std::distance( 
    std::istream_iterator <std::string> ( ss ), 
    std::istream_iterator <std::string> ()
  );
}

Yet another is using a regular expression:

#include <iterator>
#include <regex>
#include <string>

std::size_t count_words( const std::string& s )
{
  std::regex re( "\\w+" );
  return std::distance(
    std::sregex_iterator( s.begin(), s.end(), re ),
    std::sregex_iterator()
  );
}

I’m sure there are many more, but those three are the ones that come off the top of my head.
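Note that splitting on whitespace and matching \w+ do not always give the same count. The sketch below puts both side by side; the names count_by_stream and count_by_regex are made up here so the two can coexist in one file.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical name: counts whitespace-separated tokens.
std::size_t count_by_stream(const std::string& s)
{
    std::istringstream ss(s);
    std::string t;
    std::size_t count = 0;
    while (ss >> t) ++count;
    return count;
}

// Hypothetical name: counts runs of word characters (letters, digits, '_').
std::size_t count_by_regex(const std::string& s)
{
    static const std::regex re("\\w+");
    return static_cast<std::size_t>(std::distance(
        std::sregex_iterator(s.begin(), s.end(), re),
        std::sregex_iterator()));
}
```

For "don't stop" the stream version counts 2 tokens while the regex version counts 3 (don, t, stop). For "a -- b" it is the other way around: 3 tokens versus 2 matches. Which one is right depends on what you want to call a word.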