I have a file whose data/output looks like this:
7044 5.2 2:10
7856 4.7 0:27
10819 3.9 0:23
7176 3.3 0:25
7903 2.9 0:30
10850
I am trying to print this file line by line, pausing 1 second after each line, but bash prints the whole file at once.
Following this answer, I added a statement that changes IFS:
IFS=$' ';for f in "$( cat output.txt )" ; do echo $f;sleep 1;done;
but it still prints the whole file at once.
Also note that
awk '{ print $1,$2,$3 }' output.txt
works as desired, but the commands in the for loop do not iterate step by step as desired.
Another example where the for loop does not work as expected:
awk '{ print $2 }' output.txt | tail -n2 | head -n1
<---This works
for i in "$( cat output.txt | wc -l )";do awk '{ print $2 }' output.txt | tail -n$i | head -n1; sleep 1; done
<---This does not work as expected.
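A quick way to see what the for loops above actually iterate over is to count how many times the loop body runs. This is a minimal sketch, assuming bash (or any POSIX sh) and a throwaway temporary file created with mktemp; the sample rows come from the question, and the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: count loop iterations for a quoted command substitution
# versus a line-by-line while/read loop. Variable names are illustrative.
sample=$(mktemp)
printf '7044 5.2 2:10\n7856 4.7 0:27\n10819 3.9 0:23\n' > "$sample"

# "$( ... )" in double quotes expands to a single word,
# so this loop body runs exactly once, with the whole file in $f.
quoted_count=0
for f in "$(cat "$sample")"; do
    quoted_count=$((quoted_count + 1))
done

# while/read consumes the file one line per iteration.
read_count=0
while IFS= read -r line; do
    read_count=$((read_count + 1))
done < "$sample"

echo "quoted for loop: $quoted_count iteration(s)"
echo "while read loop: $read_count iteration(s)"
rm -f "$sample"
```

The second loop in the question has the same shape: "$( cat output.txt | wc -l )" expands to a single word (the line count), so its body also runs only once.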
Answer:
It is important to find and understand the real problem before looking for a solution. Your question boils down to this: you want periodic, unbuffered output from a shell pipeline.
Output sent to STDOUT is buffered, so a data loop behaves differently than your intuition suggests, which can be confusing. With full buffering, a program that writes through the C stdio library collects data in a buffer until the buffer is full or the program exits, and only then flushes it. So if the chunks of data are close to or larger than the buffer size, you may get the wrong impression that you are working "without buffering". Buffering also depends on where the output goes, which adds to the confusion: stdio normally uses line buffering when STDOUT is a terminal and full (block) buffering when it is redirected to a file or a pipe. Note that stderr is unbuffered.
For a better understanding, read this post stackoverflow.com/a/13933741/282728
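The terminal-versus-redirect distinction above is decided by checking whether file descriptor 1 is a terminal. A shell script can make the same check with test -t. This is a minimal sketch; the helper name stdout_mode is hypothetical, and the messages describe what C stdio (glibc, for example) would normally choose, not anything the shell itself does.

```shell
#!/bin/sh
# Sketch: report which buffering mode C stdio would normally choose for
# stdout, based on whether file descriptor 1 is a terminal.
# stdout_mode is a hypothetical helper name used only for illustration.
stdout_mode() {
    if [ -t 1 ]; then
        echo "terminal: line buffered"
    else
        echo "pipe or file: fully buffered"
    fi
}

stdout_mode          # run interactively, fd 1 is a terminal
stdout_mode | cat    # fd 1 is a pipe
```

Run from an interactive terminal, the first call reports line buffering and the second reports full buffering, even though the function is identical.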
We know what the problem is. How do we solve it?
Solution 1. The simplest code is always the best: we only need to process the data lines sequentially and delay sending each line to STDOUT by 1 s. AWK is well suited to such tasks.
awk '{print $2; system("sleep 1");}' input.txt
For ease of reference, I changed your file name from output.txt to input.txt.
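A slightly more general sketch of Solution 1 wraps the awk command in a shell function so the column and the delay can be passed in. The function name print_column_slowly is hypothetical. It relies on the fact that gawk, mawk and BWK awk flush standard output before running system(), which is why each value appears before the pause even when the output goes to a pipe.

```shell
#!/bin/sh
# Sketch: print one column of a file, pausing after each line.
# print_column_slowly is a hypothetical helper name.
# Usage: print_column_slowly FILE COLUMN DELAY_SECONDS
print_column_slowly() {
    awk -v col="$2" -v delay="$3" '{
        print $col
        # gawk, mawk and BWK awk flush stdout before system(),
        # so the value is visible before the pause starts
        system("sleep " delay)
    }' "$1"
}

sample=$(mktemp)
printf '7044 5.2 2:10\n7856 4.7 0:27\n' > "$sample"
print_column_slowly "$sample" 2 1   # prints 5.2, pauses, prints 4.7, pauses
rm -f "$sample"
```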
Solution 2. GNU awk (gawk) also lets you call fflush() to flush the output buffer. If you have gawk 5.1 or earlier, you can also load the "time" extension and call its sleep() function instead of spawning a shell to run the external sleep command.
gawk '@load "time"; { print $2; fflush(); sleep(1); }' input.txt
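Both solutions above use awk. If you would rather stay in bash, the same one-line-per-pause output can be sketched with a while/read loop; this is a different technique from the answer's, named here plainly. read splits each line on whitespace into the named variables, so no IFS change is needed. The function name slow_second_column and the variable names first, second and rest are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the second column of each line, pausing after each one.
# slow_second_column is a hypothetical helper name.
# Usage: slow_second_column FILE [DELAY_SECONDS]
slow_second_column() {
    local file=$1 delay=${2:-1}
    local first second rest
    # read splits each line on whitespace (default IFS);
    # "rest" absorbs any remaining columns.
    while read -r first second rest; do
        printf '%s\n' "$second"
        sleep "$delay"
    done < "$file"
}

sample=$(mktemp)
printf '7044 5.2 2:10\n7856 4.7 0:27\n' > "$sample"
slow_second_column "$sample" 1
rm -f "$sample"
```

A final line without a trailing newline is skipped by this loop; writing the condition as `while read -r first second rest || [ -n "$first" ]` handles that case.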
To consider:
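a. If the problem were more complex, you might be interested in stdbuf from the GNU coreutils package (https://www.gnu.org/software/coreutils/manual/html_node/stdbuf-invocation.html). Note that stdbuf only affects programs that use stdio streams and do not adjust their own buffering, so it has no effect on dd or cat (which do not use stdio streams) or on tee (which sets its own buffering).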
stdbuf -oL [nohup] yourscript
-o MODE sets the buffering mode of stdout
L (as in -oL) selects line buffering
the optional nohup keeps the script running after, e.g., the remote connection is lost (the command ignores SIGHUP), which is useful for long-running tasks.
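Because stdbuf is part of GNU coreutils, it may be missing on macOS or BusyBox systems. One way to use it only where it exists is a small wrapper. This is a sketch; run_line_buffered is a hypothetical name, and the wrapped command still has to use stdio without overriding its own buffering for -oL to have any effect.

```shell
#!/bin/sh
# Sketch: run a command with line-buffered stdout when stdbuf is available,
# otherwise run it unchanged. run_line_buffered is a hypothetical name.
run_line_buffered() {
    if command -v stdbuf >/dev/null 2>&1; then
        stdbuf -oL "$@"
    else
        "$@"
    fi
}

# Example: cut writes through stdio, so with -oL each line it prints is
# passed to the next stage of the pipe right away.
printf '7044 5.2 2:10\n7856 4.7 0:27\n' | run_line_buffered cut -d ' ' -f 2
```

cut is also the filter used in the stdbuf example of the coreutils manual, which makes it a safe choice for checking that the wrapper works.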
b. If the data must be written to the result file periodically, consider the script program (from util-linux):
[nohup] script -q -c yourprogram -f output.txt
-q quiet mode: script does not write its start and "done" messages to stdout
-c runs the given program instead of an interactive shell
-f flushes output to the file after each write
c. Or write a small C program that flushes the buffer explicitly. This is just a simple demonstration of flushing, not a complete solution!
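#include <stdio.h>
#include <unistd.h>   /* sleep() is POSIX, declared here */

int main(void) {
    int x = 0;
    while (x < 10) {
        printf("%d", x);   /* no newline, so this sits in the buffer... */
        fflush(stdout);    /* ...until fflush() pushes it out */
        sleep(1);
        x++;
    }
    return 0;
}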
fflush() is declared in stdio.h (see https://en.cppreference.com/w/c/io/fflush). sleep() belongs to the POSIX standard, not C99, hence the need to include the unistd.h header (see https://pubs.opengroup.org/onlinepubs/007908799/xsh/sleep.html).
d. Other programming languages provide similar ways to flush output buffers.