I've been trying to create a program that reads text from a file and stores it in a string. I feed the string to a function that counts every word in the string.
However its only accurate assuming the user leaves some whitespace at the end of a line and doesn't creates blank lines.... not a very good word counter.
- Creating a blank line results in a false increment to the word count.
I'm not sure if my main problem is using a boolean to do this or checking for whitespace and '\n' characters.
bool countingLetters = false;
int wordCount = 0;
for (int i = 0; i < text.length(); i )
{
if (text[i] == ' ' && countingLetters == true)
{
countingLetters = false;
wordCount ;
}
if (text[i] != ' ' && countingLetters == false)
{
countingLetters = true;
}
if (text[i] == '\n' && countingLetters == true)
{
countingLetters = false;
wordCount ;
}
}
CodePudding user response:
Your code is basically a state machine. To complete your solution, just count in the string ending.
Add this to the end of your code:
if(countingLetters) { // word at the end of string, without any space charactor
wordCount ;
}
Or if you can be sure it's C-style string, like std::string, you can just index 1 pass the last charactor, and handle '\0'
in same way of space and '\n'
.
To improve your code, use isspace (and this covers more space charactor, including '\t'
, etc.). And better to use else if
pattern. Also, it's not good pratice to ==true
. Just use boolean as condition.
Or maybe, isalpha(c)
fits more to your need.
bool countingLetters = false;
int wordCount = 0;
for (char c:text) {
if (!isalpha(c) && countingLetters) { // this also works for newline
countingLetters = false;
wordCount;
} else if (isalpha(c) && !countingLetters) {
countingLetters = true;
} // otherwise just skip
}
if(countingLetters) { // word at the end of string, without any space charactor
wordCount;
}
And it's not acceptable to insert extra charactor just for such a simple task. For example, text may be const.
CodePudding user response:
An alternative is to count the beginning of a "word".
Let us say the beginning of a word is a letter after a non-letter. We can adjust this if desired.
int wordCount = 0;
int prior = '\n'; // some non-letter
for (int i = 0; i < text.length(); i ) {
if (isalpha(text[i]) && !isalpha(prior)) {
wordCount ;
}
prior = text[i];
}
CodePudding user response:
C also provides some very high-level ways to do this.
One is by using a loop over a stringstream, which splits text on whitespace:
#include <sstream>
#include <string>
std::size_t count_words( const std::string& s )
{
std::size_t count = 0;
std::istringstream ss( s );
std::string t;
while (ss >> t) count = 1;
return count;
}
Another is using a stream iterator algorithm:
#include <iterator>
#include <sstream>
#include <string>
std::size_t count_words( const std::string& s )
{
std::istringstream ss( s );
return std::distance(
std::istream_iterator <std::string> ( ss ),
std::istream_iterator <std::string> ()
);
}
Yet another is using a regular expression:
#include <iterator>
#include <regex>
#include <string>
std::size_t count_words( const std::string& s )
{
std::regex re( "\\w " );
return std::distance(
std::sregex_iterator( s.begin(), s.end(), re ),
std::sregex_iterator()
);
}
I’m sure there are many more, but those three are the ones that come off the top of my head.