It is my understanding that Unix defines a "line" as a sequence of zero or more characters followed by a newline. Do I understand correctly?
The last line is a "line" (of course) so the last line must have a newline. Is that correct?
Suppose there is a sequence of characters, a newline, and then a sequence of characters. That is, no newline after the last sequence of characters. What does that mean? Does it mean that it is bad/invalid data? What does the Unix Philosophy say that a tool should do with such data? Reject it? Process all lines and ignore the last sequence of characters? Something else?
CodePudding user response:
Here's a handful of examples from Linux:
$ printf 'line\neof' > y
$ cat y
line
eof$
$ wc -l y
1 y
$ grep eof y
eof
$ tac y
eofline
$ rev y
enil
foe$ sort y
eof
line
$ tail -n 1 y
eof$ sed -n 1p y
line
$ sed -n 2p y
eof$
As you can see, the behavior isn't consistent:
- cat and wc are very literal and don't add any missing newline
- grep and sort add a newline
- rev, sed and tail consider the last line but don't add a newline
- tac just gets confused
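The `wc -l` result follows from the fact that it counts newline characters, not lines. If you want to know whether a file's last line is terminated, one possible check looks like this. It is only a sketch assuming a POSIX shell; the function name `ends_with_newline` is made up for illustration.

```shell
# Hypothetical helper: succeeds if the file is empty or its last byte is a newline.
ends_with_newline() {
    # Command substitution strips trailing newlines, so the result is
    # empty exactly when the last byte is a newline (or the file is empty).
    [ -z "$(tail -c 1 "$1")" ]
}

printf 'line\neof' > y
if ends_with_newline y; then
    echo "y ends with a newline"
else
    echo "y has an unterminated last line"
fi
```

For the file `y` used above, this takes the second branch.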
But you'll also note:
- None of those programs treat it as invalid data.
- None of these programs ignore the part after the last newline.
- For the most part, these programs will work as the user would expect them to work if piped together.
So if there's any "Unix philosophy" takeaway here, it's less about newlines and more about input handling as noted above.
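If you write your own line-oriented shell script, you can follow the same convention of not dropping the text after the last newline. A common idiom is sketched below, assuming a POSIX shell; the helper name `each_line` is hypothetical.

```shell
# Hypothetical helper: prints each line of the file in brackets,
# including a final line that lacks a terminating newline.
each_line() {
    # `read` returns non-zero at EOF but still stores a partial last line
    # in $line, so the extra test keeps that fragment from being dropped.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '[%s]\n' "$line"
    done < "$1"
}

printf 'line\neof' > y
each_line y
# prints:
# [line]
# [eof]
```

Without the `|| [ -n "$line" ]` part, the loop would stop before processing `eof`, which is exactly the "ignore the last sequence of characters" behavior that none of the tools above exhibit.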