Why are two getchar() functions used?

Time:11-04

I'm trying to write a program whose task is to get rid of extra spaces in a text stream, so that each run of spaces is reduced to a single space.

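#include <stdio.h>

int main(void)
{
    int c;  /* int, not char, so that EOF can be told apart from real characters */
    
    while( (c = getchar()) != EOF)
    {
        if(c == ' ')
        {
            putchar(c);  /* print the first space of the run */
            while( (c = getchar()) == ' ' );  /* skip the remaining spaces */
            
            if(c == EOF) break;  /* input ended right after the run */
        }
        putchar(c);  /* an ordinary character, or the first non-space after a run */
    }
}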

I know that getchar() returns a new character on each iteration and that we store it in the variable c, but I can't figure out why we need to assign the result of getchar() to c again in the second while loop.

CodePudding user response:

The inner loop keeps running as long as it reads spaces from the stream, effectively skipping them. The result is that, if you have a sequence of spaces, the first one will be printed, and the rest will be discarded.

You need to assign to c there too (as opposed to just writing something like while (getchar() == ' ');) because the inner loop only ends once it has read a character that is not a space. That character has already been taken out of the stream, so you have to remember it in order to output it after the loop.
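To see the difference concretely, here is a minimal sketch rather than the asker's actual program: it reads from a string instead of stdin so the result is easy to inspect. The names next_char, squeeze_keep and squeeze_drop are made up for this illustration, and the caller is assumed to pass an output buffer of at least strlen(in) + 1 bytes.

```c
#include <stdio.h>

/* Hypothetical stand-in for getchar(): reads from a string and
   returns EOF once the terminating '\0' is reached. */
static int next_char(const char **p)
{
    if (**p == '\0') return EOF;
    return (unsigned char)*(*p)++;
}

/* Same shape as the program in the question: the inner loop stores
   what it reads, so the character that ends the run is still in c. */
void squeeze_keep(const char *in, char *out)
{
    int c;
    while ((c = next_char(&in)) != EOF) {
        if (c == ' ') {
            *out++ = (char)c;
            while ((c = next_char(&in)) == ' ')
                ;
            if (c == EOF) break;
        }
        *out++ = (char)c;
    }
    *out = '\0';
}

/* Variant that throws the inner loop's result away: the character
   that ends the run is consumed but never stored, so it is lost. */
void squeeze_drop(const char *in, char *out)
{
    int c;
    while ((c = next_char(&in)) != EOF) {
        if (c == ' ') {
            while (next_char(&in) == ' ')
                ;
        }
        *out++ = (char)c;
    }
    *out = '\0';
}
```

With the input "a   bc", squeeze_keep produces "a bc", while squeeze_drop produces "a c": the b that ended the inner loop was read from the input and then forgotten.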

CodePudding user response:

I know that getchar() returns a new character on each iteration and that we store it in the variable c, but I can't figure out why we need to assign the result of getchar() to c again in the second while loop.

The reuse of variable c is of no particular consequence. A separate variable could have been used for the inner loop instead. Example:

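#include <stdio.h>

int main(void)
{
    int c;
    
    while( (c = getchar()) != EOF)
    {
        putchar(c);  /* every character read by the outer loop is printed */
        if(c == ' ')
        {
            int c2;  /* holds the character that ends the run of spaces */
            while( (c2 = getchar()) == ' ' );  /* skip the rest of the run */
            
            if (c2 == EOF) break;
            putchar(c2);  /* first non-space after the run */
        }
    }
}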

Note that a rearrangement of the putchar() calls was required. Either way, when the program encounters a run of spaces, it must both (i) print one space, AND (ii) avoid losing the first non-space, if any, after the run.
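As a rough check of those two requirements, the rearranged loop can be wrapped in a function that reads from a string instead of stdin. This is only a sketch: next_char and squeeze are hypothetical names used for this illustration, and the caller is assumed to supply an output buffer of at least strlen(in) + 1 bytes.

```c
#include <stdio.h>

/* Hypothetical stand-in for getchar(): reads from a string and
   returns EOF once the terminating '\0' is reached. */
static int next_char(const char **p)
{
    if (**p == '\0') return EOF;
    return (unsigned char)*(*p)++;
}

/* The two-variable arrangement, writing into out instead of stdout. */
void squeeze(const char *in, char *out)
{
    int c;
    while ((c = next_char(&in)) != EOF) {
        *out++ = (char)c;            /* (i) a space is printed exactly once per run */
        if (c == ' ') {
            int c2;
            while ((c2 = next_char(&in)) == ' ')
                ;                    /* skip the rest of the run */
            if (c2 == EOF) break;    /* the run reached the end of the input */
            *out++ = (char)c2;       /* (ii) the first non-space is kept */
        }
    }
    *out = '\0';
}
```

For example, "a   b" becomes "a b", a run at the very end such as "a   " becomes "a ", and leading spaces as in "   x" become " x".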

It would also have been possible to write the program with only one appearance of a call to getchar(), but then it would need to store more state explicitly (whether the previous character read was a space) and use that to decide whether to output each newly-read character. For example:

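#include <stdio.h>

int main(void) {
    int c;
    _Bool was_space = 0;  /* was the previous character a space? */
    
    while ((c = getchar()) != EOF) {
        /* print everything except a space that directly follows another space */
        if (c != ' ' || !was_space) {
            putchar(c);
        }
        was_space = (c == ' ');
    }
}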

In fact, I find both of those alternatives rather clearer than the original. The original seems to go out of its way to minimize the number of variables it declares.
