Why do cin and getline exhibit different reading behavior?

Time:11-28

For reference, I have already looked at "Why does std::getline() skip input after a formatted extraction?"

I want to understand cin and getline behavior. I am imagining cin and getline to be implemented with a loop over the input buffer, each iteration incrementing a cursor. Once the current element of the input buffer equals some "stopping" value (" " or "\n" for cin, "\n" for getline), the loop breaks.

The question I have is the difference between the reading behavior of cin and getline. With cin, it seems to stop at "\n", but it will increment the cursor before breaking from the loop. For example,

string a, b;
cin >> a;
cin >> b;
cout << a << "-" << b << endl;
// Input: "cat\nhat"
// Output: "cat-hat"

So in the above code, the first cin read up until the "\n". Once it hit that "\n", it incremented the cursor to the next position, "h", before breaking the loop. Then the next cin operation starts reading from "h". This allows the next cin to actually process characters instead of just breaking.

When getline is mixed with cin, this is not the behavior.

string a, b;
cin >> a;
getline(cin, b);
cout << a << "-" << b << endl;

// Input: "cat\nhat"
// Output: "cat-"

In this example, the cin reads up to the "\n". But when getline starts reading, it seems to start from the "\n" instead of the "h". This means that the cursor did not advance to "h". So getline processes the "\n" and advances the cursor to the "h", but does not actually store anything in "b".

So in one example, cin seems to advance the cursor at "\n", whereas in another example, it does not. getline also exhibits different behaviors. For example

string a, b;
getline(cin, a);
getline(cin, b);
cout << a << "-" << b << endl;

// Input: "cat\nhat"
// Output: "cat-hat"

Now getline actually advances the cursor on the "\n". Why is there different behavior, and what is the actual implementation of cin vs. getline when it comes to delimiter characters?

CodePudding user response:

reading behavior of cin and getline.

cin does not "read" anything. cin is an input stream. cin is getting read from. getline reads from an input stream. The formatted extraction operator, >>, reads from an input stream. What's doing the reading is >> and std::getline. std::cin does no reading of its own. It's what's being read from.

first cin read up until the "\n". once it hit that "\n", it increments the cursor to the next position

No it doesn't. The first >> operator reads up until the \n, but does not read it. \n remains unread.

The second >> operator starts reading with the newline character. The >> operator skips all whitespace in the input stream before it extracts the expected value.

The detail that you're missing is that >> skips whitespace (if there is any) before it extracts the value from the input stream, and not after.
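One way to check this for yourself is to peek at the stream right after an extraction. The sketch below is only an illustration: it uses a std::istringstream in place of std::cin so the input is fixed, and next_char_after_extraction is a made-up helper name, not a library function.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns the character left at the front of the stream
// after a single >> extraction. std::istringstream stands in for std::cin.
char next_char_after_extraction(const std::string& input) {
    std::istringstream in(input);
    std::string word;
    in >> word;                           // skips leading whitespace, then reads one word
    return static_cast<char>(in.peek());  // inspect the next character without consuming it
}
```

Called with "cat\nhat", this returns '\n': the newline is still sitting in the stream, and it is the next >> that skips over it before reading "hat".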

Now, it is certainly possible that >> finds no whitespace in the input stream before extracting the formatted value. If >> is tasked with extracting an int, and the input stream has just been opened and it's at the beginning of the file, and the first character in the file is a 1, well, the >> just doesn't skip any whitespace at all.

Finally, std::getline does not skip any whitespace. It just reads from the input stream until it reads a \n (or reaches the end of the input stream). The \n itself is extracted from the stream but not stored in the string.
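Putting the two rules together explains the "cat-" output from the question. Here is a minimal sketch, again with std::istringstream standing in for std::cin; word_then_line and its skip_ws parameter are names invented for this example. It also shows one common remedy, the std::ws manipulator, which discards leading whitespace before std::getline runs.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads one word with >>, then one line with std::getline.
// When skip_ws is true, std::ws first discards the leftover whitespace,
// including the '\n' that >> left behind.
std::string word_then_line(const std::string& input, bool skip_ws) {
    std::istringstream in(input);
    std::string a, b;
    in >> a;              // stops before the '\n'
    if (skip_ws) {
        in >> std::ws;    // consume whitespace up to the next non-whitespace character
    }
    std::getline(in, b);  // reads up to the next '\n' (or end of input)
    return a + "-" + b;
}
```

With the input "cat\nhat", word_then_line(input, false) gives "cat-" and word_then_line(input, true) gives "cat-hat". Note that std::ws also strips leading spaces from the next line; if those matter, calling ignore(std::numeric_limits<std::streamsize>::max(), '\n') on the stream instead discards only the rest of the current line.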

CodePudding user response:

tl;dr: it's because std::cin's >> operator is intra-line-oriented, while getline is line-oriented.

Historically, in C, we had the functions scanf() (from the standard library) and getline() (from POSIX):

  • When you tell scanf() to expect a string, it

    ... stops at white space or at the maximum field width, whichever occurs first.

    and more generally,

    Most conversions [e.g. readings of strings] discard initial white space characters

    (from the scanf() man page)

  • When you call getline(), it:

    reads an entire line ... the buffer containing the text ... includes the newline character, if one was found.

    (from the getline() man page)

Now, C++'s std::cin mechanism replaced scanf() for formatted input matching, but with type safety. (Actually std::cin and std::cout are quite problematic as replacements, but never mind that now.) As a substitute for scanf(), it inherits many of its features, including being averse to picking up white space.

Thus, just like scanf(), running std::cin >> a for a string a will stop before a \n character, and keep that line break in the input stream for future use. Also, just like scanf(), std::cin's >> operator skips leading whitespace, so if you use it a second time, the \n will be skipped, and the next string picked up starting from the next line's first non-whitespace character.

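With std::getline(), you get essentially the same line-oriented getline() behavior of decades past: nothing is skipped, and reading runs through the next \n. One difference is that std::getline() extracts the \n but does not store it in the string.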
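To connect this back to the "cursor over a buffer" picture in the question, here is a toy model of both operations. It is a sketch for intuition only, not how any standard library actually implements them (real streams go through a stream buffer, locales, and state flags); toy_extract_word and toy_getline are invented names.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Toy model of >> for strings: skip whitespace BEFORE the token,
// then stop AT (not past) the next whitespace character.
std::string toy_extract_word(const std::string& buf, std::size_t& pos) {
    while (pos < buf.size() && std::isspace(static_cast<unsigned char>(buf[pos]))) {
        ++pos;  // leading whitespace is discarded
    }
    const std::size_t start = pos;
    while (pos < buf.size() && !std::isspace(static_cast<unsigned char>(buf[pos]))) {
        ++pos;  // collect non-whitespace characters
    }
    return buf.substr(start, pos - start);  // cursor now rests on the delimiter
}

// Toy model of std::getline: no skipping; read up to '\n',
// then step past the '\n' without storing it.
std::string toy_getline(const std::string& buf, std::size_t& pos) {
    const std::size_t start = pos;
    while (pos < buf.size() && buf[pos] != '\n') {
        ++pos;
    }
    std::string line = buf.substr(start, pos - start);
    if (pos < buf.size()) {
        ++pos;  // consume the delimiter
    }
    return line;
}
```

With buf = "cat\nhat" and pos = 0, toy_extract_word returns "cat" and leaves pos at 3, on the '\n'. A following toy_getline returns an empty string and moves pos to 4, which is the "cat-" case from the question. A second toy_extract_word instead would skip the '\n' and return "hat".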


PS - you can control the whitespace-skipping behavior of >> using the skipws format flag of std::cin (set and cleared with the std::skipws and std::noskipws manipulators).
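A small sketch of what that flag changes, using character extraction so the effect is easy to see. As before, std::istringstream stands in for std::cin, and two_chars is a hypothetical helper written for this example.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: reads two chars with >> and returns them as a string,
// either with the default whitespace skipping or with it turned off.
std::string two_chars(const std::string& input, bool skip) {
    std::istringstream in(input);
    if (!skip) {
        in >> std::noskipws;  // clear the skipws flag on this stream
    }
    char c1 = 0;
    char c2 = 0;
    in >> c1 >> c2;
    std::string result;
    result += c1;
    result += c2;
    return result;
}
```

For the input "a\nb", two_chars(input, true) returns "ab", while two_chars(input, false) returns "a\n" because the newline is no longer skipped. Be careful when combining std::noskipws with string extraction: if the next character is whitespace, >> extracts nothing and the stream goes into a failed state.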
