#include <stdio.h>
#include <unistd.h>

char buff[1];

int main() {
    int c;
    c = getchar();
    printf("%d\n", c); // output -1
    c = getchar();
    printf("%d\n", c); // output -1

    int res;
    // here I get a prompt for input. What happened to EOF?
    while ((res = read(0, buff, 1)) > 0) {
        printf("Hello\n");
    }
    while ((res = read(0, buff, 1)) > 0) {
        printf("Hello\n");
    }
    return 0;
}
The output shown in the comments above is the result of simply typing Ctrl-D (EOF on Mac).
I'm a bit confused about the behaviour of getchar(), especially when compared to read.
Shouldn't the read system calls inside the while loops also return EOF? Why do they prompt the user? Has some sort of stdin clear occurred?
Considering that getchar() uses the read system call under the hood, how come they behave differently? Shouldn't stdin be "unique" and the EOF condition shared?
How come in the following code both read system calls return EOF when a Ctrl-D input is given?
int res;
while ((res = read(0, buff, 1)) > 0) {
    printf("Hello\n");
}
while ((res = read(0, buff, 1)) > 0) {
    printf("Hello\n");
}
I'm trying to find the logic behind all this. I hope someone can make clear what EOF really is and how it really behaves.
P.S. I'm using a macOS machine.
CodePudding user response:
Once the end-of-file indicator is set for stdin, getchar() does not attempt to read. Clear the end-of-file indicator (e.g. with clearerr()) to retry reading.
The getchar function is equivalent to getc with the argument stdin.

The getc function is equivalent to fgetc ...

If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).
read() still tries to read each time.

Note: reading via a FILE *, like stdin, does not attempt to read if the end-of-file indicator is set. Yet even if the error indicator is set, a read attempt still occurs.
CodePudding user response:
macOS is a derivative of the BSD Unix systems. Its stdio implementation does not come from GNU software, so it is a different implementation. On EOF, the stream's end-of-file indicator is set when a read(2) system call returns 0 as the number of characters read, and the stream does not read(2) again until that condition is reset; this produces the behaviour you observe. Call clearerr(stream); on the FILE * before issuing the next getchar(3) call, and everything will be fine. You can do that with glibc also, and then your program will run the same in either implementation of stdio (glibc vs. BSD).
I'm trying to find the logic behind all this. I hope someone can make clear what EOF really is and how it really behaves.
EOF is simply a constant (normally valued -1) that is distinct from any possible character value returned by getchar(3). getchar() returns an int in the interval 0..255, not a char, precisely so that the range of possible return values can be extended by one extra value to represent the end-of-file condition; EOF is not a char. The end-of-file condition is reported by the getchar family of functions (getchar, fgetc, etc.) because a read(2) return value of 0 (zero characters returned) doesn't map to any character; for that reason the return type is widened to int and a new value, EOF, is defined to be returned when the end-of-file condition is reached. This is compatible with files that contain Ctrl-D characters (ASCII EOT, decimal value 4) without those representing an END OF FILE condition: when you read an ASCII EOT from a file it appears as a normal character of decimal value 4.
The Unix tty implementation, on the other side, allows line input mode to use a special character (Ctrl-D, ASCII EOT/END OF TRANSMISSION, decimal value 4) to indicate an end of stream to the driver. This is a special character, like ASCII CR or ASCII DEL (which produce line editing of the input before feeding it to the program): the terminal just prepares all the input characters and lets the application read them (if there are none, none are read, and you get the end of file). So remember that Ctrl-D is only special in the Unix tty driver, and only when it is working in canonical mode (line input mode). So, finally, there are only two ways to hand input data to the program in line mode:
- pressing the RETURN key: this is mapped by the terminal into ASCII CR, which the terminal translates into ASCII LF, the famous '\n' character, and the ASCII LF character is input to the program;
- pressing the Ctrl-D key: this makes the terminal grab everything keyed in up to this moment and send it to the program (without adding the Ctrl-D itself). No character is added to the input buffer, which means that if the input buffer was empty, nothing is sent to the program and the read(2) call reads effectively zero characters from the buffer.
To unify: in every scenario, the read(2) system call normally blocks in the kernel until one or more characters are available; only at end of file does it unblock and return zero characters to the program. THIS SHOULD BE YOUR END OF FILE INDICATION. Many programs receive an incomplete buffer (fewer characters than the count passed as parameter) before a true END OF FILE is signalled, so almost every program does another read to check whether that was an incomplete read or indeed an end-of-file indication.
Finally, what if you want to input a Ctrl-D character itself into a file? There's another special character in the tty implementation that escapes the special behaviour of the character that follows it. On today's systems that character is by default Ctrl-V, so if you want to enter a special character (even Ctrl-V itself) you have to precede it with Ctrl-V; entering a literal Ctrl-D into the file thus requires typing Ctrl-V Ctrl-D.