In shell scripting (bash), how does tail work for a continuously updated file?

I have a file that is being continuously updated. Every time I call tail file.log, it shows a different result.

My question is: how does it work under the hood? I would expect it to read the file backwards from the last bytes, but meanwhile the file is being continuously updated. How does it work without corrupting the file, and still display an uncorrupted result?

Every time I google "How does tail work for continuously updated file", I get solutions on how to monitor the file with tail -f file.log, which I know works.

But what I want to know is how it works.

CodePudding user response：

Searching for "man tail" in any search engine should lead you to tail(1), which explains this if you read on to the description of what the -s option is for.

CodePudding user response：

I've never read the tail -f code, but I don't think it's difficult; the only trick is not to close the file after opening it.
So: open the file for reading, read it to the end and display the last lines, wait, then read up to the new end (you were already at the old end) while displaying every new line, and so on.

In shell, the trick is to use a file descriptor so that the file is not closed after reaching its end:

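#!/bin/sh

# Open file.txt once on descriptor 3 and keep it open.
exec 3< file.txt

while true
do
    # Read from the current offset until end-of-file.
    while IFS='' read -r line
    do
        printf '%s\n' "$line"
    done <&3
    # Wait, then try again from where we stopped.
    sleep 2
done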

Remark: this example displays the whole file content, then checks every 2 seconds and prints any lines that have been added since.
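
The loop above relies on one property of an open descriptor: reaching end-of-file is not final. The following is a minimal sketch of that behaviour, assuming a POSIX shell; the temporary file and the variable names (log, first, second, eof) are made up for illustration. A read at the end of the file fails right away, and once another writer appends data, the next read on the same descriptor returns only the new line:

```shell
#!/bin/sh
# Hypothetical log file, created only for this demonstration.
log=$(mktemp)
printf 'old line\n' > "$log"

# Open the file once on descriptor 3 and keep it open.
exec 3< "$log"

IFS= read -r first <&3            # consumes "old line"

# The offset is now at end-of-file, so this read fails immediately.
if IFS= read -r none <&3; then
    eof=no
else
    eof=yes
fi

# Another writer appends while descriptor 3 stays open.
printf 'new line\n' >> "$log"

IFS= read -r second <&3           # picks up only the appended data

printf 'eof=%s first=%s second=%s\n' "$eof" "$first" "$second"

exec 3<&-                         # close the descriptor
rm -f "$log"
```

The reader never writes to the file, which is why following it cannot corrupt it.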

CodePudding user response：

I would expect that it reads the file backwards from the last bytes, but meanwhile it is being continuously updated. How does it work without corrupting the file, and yet display an uncorrupted result?

Every open file descriptor has an associated file offset (see man 2 lseek, man 3 lseek, man 2 open, and "File descriptor" on Wikipedia).
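
As a rough illustration of the file offset, here is a small sketch assuming a POSIX shell; the temporary file and the variable names (demo, a, b) are hypothetical. Two separate read calls on the same descriptor continue where the previous one stopped, because the offset is remembered between calls:

```shell
#!/bin/sh
# Hypothetical demo file, created only for this example.
demo=$(mktemp)
printf 'first\nsecond\nthird\n' > "$demo"

# Open the file once on descriptor 3; the offset starts at 0.
exec 3< "$demo"

# Each read consumes one line and leaves the offset just after it.
IFS= read -r a <&3
IFS= read -r b <&3

printf 'a=%s b=%s\n' "$a" "$b"

exec 3<&-      # close the descriptor
rm -f "$demo"
```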

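When the file offset is positioned at the end of the file (it points past the last byte), what the read() system call does depends on the kind of file (see man 2 read). For a pipe, FIFO, socket, or terminal, read() will "block" (or, with O_NONBLOCK, fail immediately with EAGAIN instead of waiting). For a regular file such as file.log, read() does not block: it returns zero bytes straight away, which is how end-of-file is reported.

In the blocking case the process sleeps and all of the logic is implemented in the kernel. Roughly speaking, the kernel keeps a list of processes waiting for events on a file; when some other process writes new data, the kernel goes through that list and wakes those processes up. For a regular file there is no such wait inside read(), so tail -f has to arrange the waiting itself: it either sleeps for a short interval and calls read() again, or, where the system supports it (inotify on Linux), asks the kernel to notify it when the file changes.

Either way, once new data has been appended, the next read() call returns it and moves the file offset past it, and the process continues its work. Nothing gets corrupted because the reader never modifies the file; it only reads bytes the writer has already written, starting from where it left off.

To wait for new data on several file descriptors at once, there are dedicated system calls: see man 2 select and man 2 poll. They are useful for pipes, sockets, and terminals; a regular file is always reported as ready for reading.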