How can I understand the behavior of pipe, with varying data flow?-CodePudding

My problem is a bit hard to explain properly as I do not understand fully the behavior behind it. I have been working on pipe and pipelines in C, and I noticed some behavior that is a bit mysterious to me.

Let's take a few example: Let's try to pipe yes with head. (yes | head). Even though I coded the behavior in a custom program, I don't understand how the pipe knows when to stop piping ? It seems two underlying phenomenons are causing this (maybe), the SIGPIPE and/or the internal size a pipe can take. How does the pipe stop piping, is it when it's full ? But the size of a pipe is way superior to 10 "yes\n" no ? And SIGPIPE only works when the end read/write is closed no ?

Also let's take another example, for example cat and ls: cat | ls or even cat | cat | ls. It seems the stdin of the pipe is waiting for input, but how does it know when to stop, i.e. after one input ? What are the mechanism that permits this behavior?

Also can anyone provide me with others examples of these very specific behavior if there are any in pipes and pipelines so I can get an good overview of theses mechanism ?

In my own implementation, I managed to replicate that behavior using waitpid. However how does the child process itself know when to stop ? Is it command specific ?

CodePudding user response：

The write operation will block when the pipe buffer is full, the read operation will block when the buffer is empty.

When the write end of the pipe is closed, the reading process will get an EOF indication after reading all data from the buffer. Many programs will terminate in this case.

When the read end of the pipe is closed, the writing process will get a SIGPIPE. This will also terminate most programs.

When you run cat | ls, STDOUT of cat is connected to STDIN of ls, but ls does not read from STDIN. On the system where I checked this, ls simply ignores STDIN and the file descriptor will be closed when ls terminates.

You will see the output of ls, and cat will be waiting for input.

cat will not write anything to STDOUT before it has read enough data from STDIN, so it will not notice that the other end of the pipe has been closed.

cat will terminate when it detects EOF on STDIN which can be done by pressing CTRL D or by redirecting STDIN from /dev/null, or when it gets SIGPIPE after trying to write to the pipe which will happen when you (type something and) press ENTER.

You can see the behavior with strace.

cat terminates after EOF on input which is shown as read(0, ...) returning 0.

strace cat < /dev/null | ls

cat killed by SIGPIPE.

strace cat < /dev/zero | ls

CodePudding user response：

How does the pipe stop piping

The pipe stops piping when either end is closed.

If the input(write) end of the pipe is closed, then any data in the pipe is held until it is read from the output end. Once the buffer is emptied, anyone subsequently reading from the output end will get an EOF.

If the output(read) end of the pipe is closed, any data in the pipe will be discarded. Anyone subsequently writing to the input end will get a SIGPIPE/EPIPE. Note that a process merely holding open the input but not actively writing to it will not be signalled.

So when you type cat | ls you get a cat program with stdout connected to the input of the pipe and ls with stdin connected to the output. ls runs and outputs some stuff (to its stdout, which is still the terminal) and never reads from stdin. Once done it exits and closes the output of the pipe. Meanwhile cat is waiting for input from its stdin (the terminal). When it gets it (you type a line), it writes it to stdout, gets a SIGPIPE/EPIPE and exits (discarding the data as there's noone to write it to.) This closes the input of the pipe, so the pipe goes away now that both ends have been closed.

Now lets look at what happens with cat | cat | ls. You now have two pipes and two cat programs. As before ls runs and exits, closing the output of the second pipe. Now you type a line and the first cat reads it and copies it to the first pipe (still fully open) where the second cat reads it and copies it to the second pipe (which has its output closed), so it (the second cat) gets a SIGPIPE/EPIPE and exits (which closes the output of the first pipe). At this point the first cat is still waiting for input, so if you type a second line, it copies that to the now closed first pipe and gets a SIGPIPE/EPIPE and exits