My code needs to call fgetc(inp) many times.
It works without problems on Windows, but on macOS the program fails with an error.
I found that the problem is caused by the two systems using a different number of characters for a newline: macOS uses just \n, while Windows uses \r\n.
So I wrote a new function, to use in place of fgetc(inp), that reads the newline characters:
void getwhite() {
    int white = fgetc(inp);
    if (isspace(white) == 0) {
        fseek(inp, -1, SEEK_CUR);
    }
}
But it doesn't work as expected: it still works fine on Windows, and macOS still gives errors.
CodePudding user response:
Instead of fseek(), you should use ungetc() to push back the byte read from the stream:
int getwhite() {
    int c = fgetc(inp);
    if (!isspace(c)) {
        ungetc(c, inp);
    }
    return c;
}
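For trying this out in isolation, here is a minimal sketch of the same idea that takes the stream as a parameter instead of relying on the global inp. The name skip_one_space is hypothetical, and the explicit EOF check is an addition made for clarity, not part of the code above.

```c
#include <ctype.h>
#include <stdio.h>

/* Consume the next character only if it is whitespace.
 * Returns the character that was examined, or EOF at end of file.
 * (skip_one_space is a hypothetical name used for illustration.) */
int skip_one_space(FILE *fp) {
    int c = fgetc(fp);
    if (c != EOF && !isspace(c)) {
        ungetc(c, fp);  /* not whitespace: put it back for the next read */
    }
    return c;
}
```

A caller would use skip_one_space(inp) wherever a newline or other whitespace character may or may not be present. A character that is not whitespace stays available to the next fgetc() call.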
Regarding the handling of line endings on Windows and other systems: for legacy reasons, Windows still uses CR LF sequences to indicate the end of a line in text files, and the C library transparently translates these sequences into a single '\n' byte for programs that read files as text, either with fopen() or with the lower level open() interfaces.
This makes file offsets tricky to use, because the number of bytes read from the file may differ from the offset in bytes into the file, and that offset cannot be retrieved with standard functions: the long returned by ftell() for a stream opened in text mode is only meaningful as a value to pass to fseek() in SEEK_SET mode on the same file opened in text mode. Seeking with a nonzero offset in SEEK_CUR or SEEK_END mode on a text stream has undefined behavior, as specified in the C Standard:
7.21.9.2 The fseek function
Synopsis
#include <stdio.h>
int fseek(FILE *stream, long int offset, int whence);
Description
[...] For a text stream, either offset shall be zero, or offset shall be a value returned by an earlier successful call to the ftell function on a stream associated with the same file and whence shall be SEEK_SET.
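The one seek that the quoted rule does allow on a text stream, returning to a position previously obtained from ftell(), can be sketched as follows. The function name peek_line_length is hypothetical, and the sketch assumes the stream is seekable.

```c
#include <stdio.h>

/* Count the characters remaining on the current line without consuming
 * them. Returns -1 if the position cannot be saved or restored.
 * (peek_line_length is a hypothetical name used for illustration.) */
long peek_line_length(FILE *fp) {
    long pos = ftell(fp);  /* in text mode, treat this as an opaque token */
    if (pos == -1L) {
        return -1;
    }

    long n = 0;
    int c;
    while ((c = fgetc(fp)) != EOF && c != '\n') {
        n++;
    }

    /* Allowed on a text stream: a value from ftell() with SEEK_SET. */
    if (fseek(fp, pos, SEEK_SET) != 0) {
        return -1;
    }
    return n;
}
```

The saved value is never used in arithmetic, only handed back to fseek(), so the sketch stays within what the Standard permits for text streams.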
If you need to rely on file offsets, you should open files in binary mode and handle the line endings explicitly in your own code.
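As an illustration of handling line endings yourself, here is a minimal sketch of a reader for a stream opened in binary mode (for example with "rb"). The name read_char_normalized is hypothetical. Treating a lone CR as a line ending is an assumption made here so that old Mac-style files are covered as well.

```c
#include <stdio.h>

/* Read one character from a stream opened in binary mode, reporting
 * "\r\n", a lone '\r' and a lone '\n' all as a single '\n'.
 * (read_char_normalized is a hypothetical name used for illustration;
 * mapping a lone CR to '\n' is an assumption of this sketch.) */
int read_char_normalized(FILE *fp) {
    int c = fgetc(fp);
    if (c == '\r') {
        int next = fgetc(fp);
        if (next != '\n' && next != EOF) {
            ungetc(next, fp);  /* lone CR: keep the following byte */
        }
        return '\n';
    }
    return c;
}
```

With this in place, the rest of the program sees the same character sequence on Windows and macOS, and ftell() on the binary stream reports real byte offsets.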
Apple operating systems used to represent line endings as a single CR byte, but switched to a single NL byte more than 10 years ago, when they adopted a Mach-based, Unix-compatible kernel.