I have a sample log file with 1000 lines, that looks like this,
TIME,STATUS
09:00,OK
09:00,TEMP
09:00,TEMP
09:00,TEMP
09:00,TEMP
09:00,TEMP
09:01,OK
09:01,OK
09:01,OK
09:01,PERM
09:01,TEMP
09:01,TEMP
09:02,OK
09:02,TEMP
09:02,TEMP
09:03,OK
09:03,PERM
09:03,PERM
09:03,TEMP
09:03,TEMP
09:04,OK
09:04,PERM
09:04,PERM
09:04,TEMP
09:04,TEMP
09:04,TEMP
09:05,OK
09:05,OK
09:05,OK
09:05,PERM
09:05,TEMP
09:05,TEMP
09:05,TEMP
09:05,TEMP
09:06,OK
09:06,OK
09:06,PERM
09:06,PERM
09:06,PERM
09:06,PERM
09:06,TEMP
09:06,TEMP
09:06,TEMP
09:06,TEMP
09:06,TEMP
09:07,OK
09:07,OK
09:07,TEMP
09:07,TEMP
09:07,TEMP
09:08,OK
09:08,OK
09:08,OK
09:08,OK
09:08,OK
09:08,OK
09:08,OK
09:08,TEMP
09:08,TEMP
09:08,TEMP
09:08,TEMP
09:09,OK
09:09,OK
09:09,OK
09:09,PERM
09:10,OK
09:10,PERM
09:10,PERM
09:10,TEMP
09:11,OK
09:11,OK
09:11,OK
09:11,OK
09:11,PERM
09:11,PERM
09:11,PERM
09:11,PERM
09:11,TEMP
09:11,TEMP
09:11,TEMP
09:12,PERM
09:12,TEMP
09:12,TEMP
09:13,OK
09:13,OK
09:13,OK
09:13,OK
09:13,OK
09:13,PERM
09:13,PERM
09:13,PERM
09:13,TEMP
09:13,TEMP
09:14,OK
09:14,OK
09:14,OK
09:14,PERM
09:14,PERM
09:14,PERM
09:14,PERM
09:14,TEMP
09:16,OK
09:16,OK
09:16,OK
09:16,PERM
09:16,PERM
09:16,TEMP
09:16,TEMP
09:17,OK
09:17,OK
09:17,PERM
09:17,PERM
09:18,OK
09:18,OK
09:18,OK
09:18,OK
09:18,OK
09:18,PERM
09:18,PERM
09:18,TEMP
09:18,TEMP
09:18,TEMP
09:19,OK
09:19,OK
09:19,OK
09:19,OK
09:19,OK
09:19,PERM
09:20,OK
09:20,OK
09:20,PERM
09:20,PERM
09:20,TEMP
09:20,TEMP
09:21,OK
09:21,OK
09:21,OK
09:21,PERM
09:21,TEMP
09:22,OK
09:22,OK
09:22,PERM
09:22,PERM
09:22,TEMP
09:22,TEMP
09:23,OK
09:23,PERM
09:23,PERM
09:23,PERM
09:23,TEMP
09:23,TEMP
09:23,TEMP
09:24,PERM
09:24,PERM
09:24,PERM
09:25,OK
09:25,OK
09:25,PERM
09:25,TEMP
09:26,OK
09:26,OK
09:26,OK
09:26,OK
09:26,OK
09:26,PERM
09:26,TEMP
09:27,OK
09:27,OK
09:27,OK
09:27,PERM
09:27,PERM
09:27,TEMP
09:27,TEMP
09:27,TEMP
09:28,PERM
09:28,PERM
09:28,PERM
09:28,PERM
09:29,OK
...
while the final file will have 10K lines in the same time frame. I need to create a graph to show number of statuses per minute for TEMP, PERM and OK. So I would like to use a line for the status (TEMP, PERM and OK), plot time on the X axis, and frequency of occurrence on the Y axis.
I installed Gnuplot only 2 days ago on my Ubuntu 20.04.4 LTS from the standard repo:
bi@green:bin$ apt list gnuplot* 2>/dev/null | grep installed
gnuplot-data/focal,focal,now 5.2.8 dfsg1-2 all [installed,automatic]
gnuplot-qt/focal,now 5.2.8 dfsg1-2 amd64 [installed,automatic]
gnuplot/focal,focal,now 5.2.8 dfsg1-2 all [installed]
and so far I haven't managed more than this,
#!/bin/bash
x=logoutcol
cat $x
gnuplot -p <<-EOF
#set ytics scale 0
#set yzeroaxis
reset
set format x "%H:%M" time
set xdata time
set yrange [0:*]
set ylabel "Occurences"
set ytics 2
#set margin at screen 0.95
binwidth=60
bin(val) = binwidth * floor(val/binwidth)
set boxwidth binwidth
set datafile separator ","
set term png
set output "$x.png"
plot "$x" using (bin(timecolumn(1,"%H%M"))):(2) smooth freq with boxes
EOF
shotwell $x.png
rm $x.png
Any help will be much appreciated.
CodePudding user response:
I am pretty sure that there was an almost identical question here on SO, however, it seems I can't find it maybe due to my incapability of finding the right keywords for SO's search function.
The key point is the boolean expression (strcol(2) eq word(myKeys,i))
together with smooth frequency
. If the value of the second column is identical to your keyword the expression results in 1
, and 0
otherwise.
You don't need bins like in creating other histograms because you want a bin of 1 minute (and your time resolution is already 1 minute).
Check the following example as starting point for further optimization.
Script:
### count occurrences of keywords
reset session
$Data <<EOD
# TIME,STATUS
09:00,OK
09:00,TEMP
09:00,TEMP
09:00,TEMP
09:00,TEMP
09:00,TEMP
09:01,OK
09:01,OK
09:01,OK
09:01,PERM
09:01,TEMP
09:01,TEMP
09:02,OK
09:02,TEMP
09:02,TEMP
09:03,OK
09:03,PERM
09:03,PERM
09:03,TEMP
09:03,TEMP
09:04,OK
09:04,PERM
09:04,PERM
09:04,TEMP
09:04,TEMP
09:04,TEMP
09:05,OK
09:05,OK
09:05,OK
09:05,PERM
09:05,TEMP
09:05,TEMP
09:05,TEMP
09:05,TEMP
09:06,OK
09:06,OK
09:06,PERM
09:06,PERM
09:06,PERM
09:06,PERM
09:06,TEMP
09:06,TEMP
09:06,TEMP
09:06,TEMP
09:06,TEMP
09:07,OK
09:07,OK
09:07,TEMP
09:07,TEMP
09:07,TEMP
09:08,OK
09:08,OK
09:08,OK
09:08,OK
09:08,OK
09:08,OK
09:08,OK
09:08,TEMP
09:08,TEMP
09:08,TEMP
09:08,TEMP
09:09,OK
09:09,OK
09:09,OK
09:09,PERM
09:10,OK
09:10,PERM
09:10,PERM
09:10,TEMP
09:11,OK
09:11,OK
09:11,OK
09:11,OK
09:11,PERM
09:11,PERM
09:11,PERM
09:11,PERM
09:11,TEMP
09:11,TEMP
09:11,TEMP
09:12,PERM
09:12,TEMP
09:12,TEMP
09:13,OK
09:13,OK
09:13,OK
09:13,OK
09:13,OK
09:13,PERM
09:13,PERM
09:13,PERM
09:13,TEMP
09:13,TEMP
09:14,OK
09:14,OK
09:14,OK
09:14,PERM
09:14,PERM
09:14,PERM
09:14,PERM
09:14,TEMP
09:16,OK
09:16,OK
09:16,OK
09:16,PERM
09:16,PERM
09:16,TEMP
09:16,TEMP
09:17,OK
09:17,OK
09:17,PERM
09:17,PERM
09:18,OK
09:18,OK
09:18,OK
09:18,OK
09:18,OK
09:18,PERM
09:18,PERM
09:18,TEMP
09:18,TEMP
09:18,TEMP
09:19,OK
09:19,OK
09:19,OK
09:19,OK
09:19,OK
09:19,PERM
09:20,OK
09:20,OK
09:20,PERM
09:20,PERM
09:20,TEMP
09:20,TEMP
09:21,OK
09:21,OK
09:21,OK
09:21,PERM
09:21,TEMP
09:22,OK
09:22,OK
09:22,PERM
09:22,PERM
09:22,TEMP
09:22,TEMP
09:23,OK
09:23,PERM
09:23,PERM
09:23,PERM
09:23,TEMP
09:23,TEMP
09:23,TEMP
09:24,PERM
09:24,PERM
09:24,PERM
09:25,OK
09:25,OK
09:25,PERM
09:25,TEMP
09:26,OK
09:26,OK
09:26,OK
09:26,OK
09:26,OK
09:26,PERM
09:26,TEMP
09:27,OK
09:27,OK
09:27,OK
09:27,PERM
09:27,PERM
09:27,TEMP
09:27,TEMP
09:27,TEMP
09:28,PERM
09:28,PERM
09:28,PERM
09:28,PERM
09:29,OK
EOD
set datafile separator comma
myKeys = "OK TEMP PERM"
myKey(i) = word(myKeys,i)
myTimeFmt = "%H:%M"
set format x myTimeFmt timedate
plot for [i=1:words(myKeys)] $Data u (timecolumn(1,myTimeFmt)):(strcol(2) eq word(myKeys,i)) smooth freq w lp pt 7 ti word(myKeys,i)
### end of script
Result: