I'm trying to build a function that counts how many sentences there are in a text, using ?, ! and . to decide where a sentence ends.
For some reason, no matter how many of them there are, the function never counts more than one, and only if it is at the end of the whole text.
This is the function:
int count_sentences(string text) {
    int count = 0;
    for (int i = 0; i < strlen(text); i++) {
        if (strcmp(&text[i], "?") == 0 || strcmp(&text[i], "!") == 0
            || strcmp(&text[i], ".") == 0) {
            count += 1;
        }
    }
    return count;
}
CodePudding user response:
strcmp is for comparing null-terminated strings. You just want to compare characters.
Let's assume text contains ".ABC".
During the first iteration (when i is 0), &text[i] points to the string ".ABC", so strcmp(&text[i], ".") actually compares the string ".ABC" to the string ".", and they are, of course, not equal.
Your if statement should look like this:
if (text[i] == '?' || text[i] == '!' || text[i] == '.')
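To check the ".ABC" example concretely, here is a minimal sketch that puts the two tests side by side. The helper names matches_with_strcmp and matches_with_char are made up for illustration and are not part of the question's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the question's test, applied at offset i.
   strcmp compares the whole tail starting at text[i] with ".". */
bool matches_with_strcmp(const char *text, size_t i) {
    return strcmp(&text[i], ".") == 0;
}

/* Hypothetical helper: compares only the single character at offset i. */
bool matches_with_char(const char *text, size_t i) {
    return text[i] == '.';
}
```

With text set to ".ABC", matches_with_strcmp(text, 0) is false because the tail ".ABC" is not equal to ".", while matches_with_char(text, 0) is true.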
CodePudding user response:
Using the function strcmp in
if (strcmp(&text[i], "?") == 0 || strcmp(&text[i], "!") == 0 || strcmp(&text[i], ".") == 0)
does not make sense. The expression in the if statement evaluates to true only for the last character of the string text, provided that this last character is one of '?', '!' and '.'.
Instead you should use the standard C functions strspn and strcspn. For example:
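size_t count_sentences( string text )
{
    const char *delim = "?!.";
    size_t count = 0;
    /* Skip to the next run of delimiters; stop when none is left. */
    for ( text += strcspn( text, delim ); *text != '\0'; text += strcspn( text, delim ) )
    {
        ++count;
        /* Skip the whole run so that "..." is counted once. */
        text += strspn( text, delim );
    }
    return count;
}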
The function returns 1, for example, for the string "Executing...". Indeed, there is only one sentence, though there are three '.' characters.
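For readers unfamiliar with the two functions: strcspn returns the length of the initial segment that contains none of the given characters, and strspn returns the length of the initial segment that consists only of them. The sketch below applies both to locate the first run of terminators; the helper names are hypothetical and chosen only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: offset of the first terminator in text,
   or strlen(text) if the text contains none. */
size_t first_terminator_offset(const char *text) {
    return strcspn(text, "?!.");
}

/* Hypothetical helper: length of the run of terminators that
   starts at that offset (0 if the text has no terminator). */
size_t first_terminator_run(const char *text) {
    const char *p = text + strcspn(text, "?!.");
    return strspn(p, "?!.");
}
```

For "Executing..." the offset is 9 and the run length is 3, which is why the three dots count as a single sentence end.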
Here is a demonstration program.
#include <stdio.h>
#include <string.h>

typedef char *string;

size_t count_sentences( string text )
{
    const char *delim = "?!.";
    size_t count = 0;
    for ( text += strcspn( text, delim ); *text != '\0'; text += strcspn( text, delim ) )
    {
        ++count;
        text += strspn( text, delim );
    }
    return count;
}

int main(void)
{
    string text = "Strange... Why are you using strcmp?! Use strspn and strcspn!!!";
    printf( "The text\n\"%s\"\ncontains %zu sentences.\n",
            text, count_sentences( text ) );
    return 0;
}
The program output is
The text
"Strange... Why are you using strcmp?! Use strspn and strcspn!!!"
contains 3 sentences.
CodePudding user response:
Your code does not work because &text[i] is not a one-character string, but a pointer to the part of the string in text starting at offset i. Only the last character is tested correctly, as you observed.
Instead of strings, you should compare individual characters, this way:
int count_sentences(const char *text) {
    int count = 0;
    for (int i = 0; text[i] != '\0'; i++) {
        if (text[i] == '?' || text[i] == '!' || text[i] == '.') {
            count += 1;
        }
    }
    return count;
}
Note, however, that this code will not work as expected for strings that do not end with a sentence separator, nor when multiple separators occur together: "hello", "Hello!!!", "Sorry...".
The trick to counting such sentences is to count transitions from separators to non-separators. This method also works for counting words, lines, etc.
Here is a modified version:
int isterminator(int c) {
    return (c == '?' || c == '!' || c == '.');
}
int count_sentences(const char *text) {
    char last = '.';  /* pretend a terminator precedes the text */
    int count = 0;
    for (size_t i = 0; text[i] != '\0'; i++) {
        /* a sentence starts at each terminator -> non-terminator transition */
        if (isterminator(last) && !isterminator(text[i])) {
            count += 1;
        }
        last = text[i];
    }
    return count;
}
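As an illustration of the claim that the same transition trick counts words, here is a minimal sketch. It assumes that a word is any maximal run of non-whitespace characters, as classified by isspace; the name count_words is made up for this example.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts words by counting transitions from whitespace to
   non-whitespace, in the same way the sentence counter counts
   transitions from terminators to non-terminators. */
int count_words(const char *text) {
    int last_was_space = 1;  /* treat the start of the text as a separator */
    int count = 0;
    for (size_t i = 0; text[i] != '\0'; i++) {
        int is_space = isspace((unsigned char)text[i]) != 0;
        if (last_was_space && !is_space) {
            count += 1;
        }
        last_was_space = is_space;
    }
    return count;
}
```

Leading, trailing and repeated whitespace do not change the result, because only the start of each run of non-whitespace characters is counted.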
CodePudding user response:
#include <iostream>
#include <string>
using namespace std;

int count_sentences(string text)
{
    int count = 0;
    for (size_t i = 0; i < text.size(); ++i) {
        // compare single characters, not strings
        if (text[i] == '?' || text[i] == '!' || text[i] == '.')
            ++count;
    }
    return count;
}
int main()
{
    string sentence = "abc. def? ghi!";
    cout << count_sentences(sentence) << endl;
}
CodePudding user response:
Passing &text[i] to strcmp makes it compare that character and all the characters that follow it. Use string[index] == '?' to compare individual characters.
For reference, here is an implementation of strcmp:
/* Compare S1 and S2, returning less than, equal to or
   greater than zero if S1 is lexicographically less than,
   equal to or greater than S2. */
int
STRCMP (const char *p1, const char *p2)
{
    const unsigned char *s1 = (const unsigned char *) p1;
    const unsigned char *s2 = (const unsigned char *) p2;
    unsigned char c1, c2;
    do
    {
        c1 = (unsigned char) *s1++;
        c2 = (unsigned char) *s2++;
        if (c1 == '\0')
            return c1 - c2;
    }
    while (c1 == c2);
    return c1 - c2;
}
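Because strcmp keeps comparing until it finds a difference or reaches the terminating null character, the tail starting at text[i] can equal "." only when text[i] is the last character. The sketch below reproduces the question's loop for the '.' case only; count_strcmp_matches is a hypothetical name used for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reproduction of the question's loop: counts the
   offsets i at which strcmp(&text[i], ".") returns 0. */
int count_strcmp_matches(const char *text) {
    int count = 0;
    for (size_t i = 0; text[i] != '\0'; i++) {
        if (strcmp(&text[i], ".") == 0) {
            count += 1;
        }
    }
    return count;
}
```

For "a.b.c." it returns 1, since only the final '.' matches, and for "a.b" it returns 0, which matches the behaviour described in the question.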