Demystifying the newline character (again)

Time:04-26

We know that Windows uses a CR LF pair as its new line, Unix (including Linux and OS X) uses a single LF, while classic Mac OS uses a single CR.

Does that mean that the interpretation of a newline in C and C++ depends upon the execution environment, even though K&R (section 1.5.3 Line Counting) states the following very categorically?

so '\n' stands for the value of the newline character, which is 10 in ASCII.

CodePudding user response:

We know that Windows uses a CR LF pair as its new line,…

The page you link to does not say Windows uses “CR LF” as its new line character. It says Windows marks the end of a line in a text file with a carriage-return character and a line-feed character. That does not mean those characters are a new-line character or vice-versa.

Does that mean that the interpretation of a newline…

The new-line character is a new-line character. In C, it is intended to mark a new line. When ASCII is used, ASCII’s line-feed character (code 10) is typically used as C’s new-line character ('\n').
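The numeric value can be checked directly. This is a minimal sketch with hypothetical function names (newline_code, print_newline_codes); the value 10 holds only on implementations whose execution character set is ASCII-based.

```c
#include <stdio.h>

/* Returns the numeric code the implementation assigns to '\n'.
   On ASCII-based systems this is 10, the line-feed code. */
int newline_code(void)
{
    return '\n';
}

/* Prints the codes of '\n' and '\r' as decimal numbers. */
void print_newline_codes(void)
{
    printf("'\\n' = %d, '\\r' = %d\n", '\n', '\r');
}
```

The value returned is a property of the C implementation's character set, not of how the platform stores line endings in text files.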

If a C program reads a Windows-style text file using a binary stream, it will see a carriage-return character and a line-feed marking the ends of lines. If a C program reads a Windows-style text file using a text stream (in an environment that supports this), the Windows line-ending indications (carriage-return character and line-feed character) will be automatically translated to C new-line characters.

Conversely, if a C program writes to a Windows-style text file using a text stream, the new-line characters it writes will be translated to Windows line-ending indications. If it writes using a binary stream, it must write the carriage-return characters and the line-feed characters itself.
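The binary-stream case can be sketched as follows. The function name count_line_end_bytes is hypothetical, and the sketch relies on tmpfile(), which the C standard specifies as opening a binary update stream ("wb+"), so no translation takes place in either direction.

```c
#include <stdio.h>

/* Writes "one" followed by CR LF through a binary stream, then reads the
   bytes back through the same binary stream and counts CR and LF bytes.
   Returns 0 on success, -1 on I/O failure. */
int count_line_end_bytes(int *cr_count, int *lf_count)
{
    /* tmpfile() opens a binary update stream, so nothing is translated. */
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    /* Binary stream: the program writes the CR and the LF itself. */
    if (fputs("one\r\n", fp) == EOF) {
        fclose(fp);
        return -1;
    }

    /* A positioning call is required between writing and reading. */
    rewind(fp);

    *cr_count = 0;
    *lf_count = 0;
    int c;
    while ((c = getc(fp)) != EOF) {
        if (c == '\r')
            ++*cr_count;
        else if (c == '\n')
            ++*lf_count;
    }
    fclose(fp);
    return 0;
}
```

Because the stream is binary, the reader sees both bytes on every platform. Had the same bytes been read through a text stream on Windows, the program would be expected to see only the '\n'.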

CodePudding user response:

Does that mean that the interpretation of a newline in C and C++ depends upon the execution environment

No, it does not. The interpretation depends on the tool that is reading the file, which usually follows the platform's convention but can differ. A robust text tool will tolerate the various line-ending encodings and handle each of them.

Further, text files originating on one system are often accessed and edited on other platforms with different rules.

CodePudding user response:

No, \n always means LF.

On Windows there is LF <-> CR-LF conversion that's performed by the IO streams (FILE *, std::??stream), if the stream is opened in text mode (as opposed to binary mode).
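To make the conversion concrete, here is a rough illustration of the two directions applied to an in-memory buffer. The function names crlf_to_lf and lf_to_crlf are hypothetical, and this is only a sketch of the idea, not how any particular C runtime implements text mode.

```c
#include <stddef.h>

/* Copies src (len bytes) to dst, turning each CR LF pair into a single LF,
   roughly what a text-mode read does on Windows. A lone CR is copied
   unchanged. dst must have room for len bytes.
   Returns the number of bytes written to dst. */
size_t crlf_to_lf(const char *src, size_t len, char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '\r' && i + 1 < len && src[i + 1] == '\n')
            continue;   /* drop the CR; the LF is copied on the next pass */
        dst[out++] = src[i];
    }
    return out;
}

/* Reverse direction: each LF becomes CR LF, roughly what a text-mode
   write does on Windows. dst must have room for 2 * len bytes.
   Returns the number of bytes written to dst. */
size_t lf_to_crlf(const char *src, size_t len, char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '\n')
            dst[out++] = '\r';
        dst[out++] = src[i];
    }
    return out;
}
```

In both functions the program itself only ever deals with '\n' as the line terminator; the CR exists only in the external representation.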

CodePudding user response:

Does that mean that the interpretation of a newline in C and C++ depends upon the execution environment?

The interpretation of the file contents does indeed depend on the execution environment, so that the C programmer does not have to handle the different conventions explicitly:

  • if the stream is opened in binary mode ("rb"), no translation is performed and each byte of the file contents is returned directly by getc(). Unix systems handle text files and binary files identically, so no translation occurs for text files either.

  • on other systems, streams opened in text mode ("rt" or just "r") are handled in a system-specific way to translate line-ending patterns to the single byte '\n', which in ASCII has the value 10. On Windows and MS-DOS systems, this translation converts CR/LF pairs to single '\n' bytes, which can be implemented as simply removing CR bytes. This convention was inherited from earlier microcomputer operating systems such as Gary Kildall's CP/M, whose APIs were emulated in QDOS, Seattle Computer Products' original 8086 OS that later became MS-DOS.

  • older Mac systems (before OS X) used to represent line endings with a single CR byte, but Apple changed this when they adopted a Unix kernel for OS X. No translation is performed anymore on macOS.

  • antique systems used even more cumbersome representations for text files, such as fixed-length records, and the stream implementation inserted extra '\n' bytes to simulate Unix line endings when such files were read in text mode.

It is important to understand that this translation process is system specific and is not designed to handle files copied from other systems that use a different convention. Advanced text tools such as the QEmacs programmers' editor can detect different line endings and perform the appropriate translation regardless of the current execution environment, preserving the convention used in the file, or converting it to another convention under user control.
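As an illustration of how such detection might work, the sketch below classifies a buffer that was read in binary mode. The enum and the function name detect_line_ending are hypothetical, and this is an assumed approach, not a description of what QEmacs or any other editor actually does.

```c
#include <stddef.h>

enum line_ending { EOL_NONE, EOL_LF, EOL_CRLF, EOL_CR, EOL_MIXED };

/* Scans len bytes (read in binary mode) and reports which line-ending
   convention the buffer uses. A CR immediately followed by LF counts as
   one CR LF ending; any other CR counts as a lone CR ending. */
enum line_ending detect_line_ending(const char *buf, size_t len)
{
    size_t lf = 0, crlf = 0, cr = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            if (i + 1 < len && buf[i + 1] == '\n') {
                crlf++;
                i++;    /* skip the LF that belongs to this pair */
            } else {
                cr++;
            }
        } else if (buf[i] == '\n') {
            lf++;
        }
    }

    int kinds = (lf > 0) + (crlf > 0) + (cr > 0);
    if (kinds == 0)
        return EOL_NONE;
    if (kinds > 1)
        return EOL_MIXED;
    if (lf > 0)
        return EOL_LF;
    if (crlf > 0)
        return EOL_CRLF;
    return EOL_CR;
}
```

A tool could use the result to decide which convention to preserve on save, falling back to a user preference when the buffer is empty or mixed.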
