I'm trying to find the missing dates in a file "date_metar". To do that, I want to write the complete list of dates to a file "date_correct" with a shell script and compare the two files. The format is %d%H%M, in 30-minute increments. I get this error: line 9: [[: 2022-01-01T00: value too great for base (error token is "01T00")
The script:
#!/bin/sh
strdate='2022-01-01T00:00'
enddate='2022-01-31T23:30'
while [[ ${strdate} -le ${enddate} ]] ; do
echo $strdate>>date_correct
strdate=$(date -d "$strdate 30 minute" +%d%H%M)
done
diff date_metar date_correct >output
CodePudding user response:
Your best bet for generating the range of valid dates/times will probably come from combining two ideas:
- use epoch seconds for comparisons and math
- use awk (or a comparable program) to replace the time-consuming bash/while loop
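On the error itself: inside [[ ]], -le evaluates both operands as integer arithmetic, and 2022-01-01T00:00 is not an integer (bash reads 01T00 as an octal literal with an invalid digit, hence "value too great for base"). Converting each timestamp to epoch seconds first gives plain integers that compare cleanly. A minimal sketch, assuming GNU date (for -d) and reading the timestamps as UTC; to_epoch and slot_count are hypothetical helper names, not part of the original script:

```shell
# to_epoch: hypothetical helper, assumes GNU date; interprets the timestamp as UTC
to_epoch() {
    date -u -d "$1" +%s
}

# slot_count START END STEP_SECONDS
# Prints how many slots lie between START and END, both ends included.
slot_count() {
    local start_s end_s
    start_s=$(to_epoch "$1") || return 1
    end_s=$(to_epoch "$2") || return 1
    echo $(( (end_s - start_s) / $3 + 1 ))
}

# Example:
# slot_count '2022-01-01T00:00' '2022-01-31T23:30' 1800
```

Working in UTC is a choice made here to avoid daylight-saving jumps; drop the -u if the file's timestamps are in local time.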
One epoch(secs) / awk idea:
strdate='2022-01-01T00:00'
enddate='2022-01-31T23:30'
# convert both ends of the range to epoch seconds
strdate_s=$(date -d "${strdate}" +%s)
enddate_s=$(date -d "${enddate}" +%s)
inc_m=30
((inc_s = inc_m * 60))
awk -v ss="${strdate_s}" -v es="${enddate_s}" -v inc="${inc_s}" '
BEGIN { while ( ss <= es ) {
          print strftime("%d%H%M", ss)
          ss += inc
        }
      }
' > date_correct
NOTE: as Fravadona mentioned in the comments, strftime() requires GNU awk (aka gawk)
To show the performance improvement of using awk instead of the bash/while loop, we'll modify OP's current code to use the epoch(secs) approach:
strdate='2022-01-01T00:00'
enddate='2022-01-31T23:30'
strdate_s=$(date -d "${strdate}" +%s)
enddate_s=$(date -d "${enddate}" +%s)
inc_m=30
((inc_s = inc_m * 60))
while [[ "${strdate_s}" -le "${enddate_s}" ]] ; do
    date -d "@${strdate_s}" +%d%H%M >> date_correct2
    ((strdate_s += inc_s))
done
A diff of the outputs shows both sets of code generate the same output:
$ diff date_correct date_correct2
<<<=== no output
Results of running both processes under time:
# awk
real 0m0.042s
user 0m0.015s
sys 0m0.015s
# bash/while
real 0m46.412s
user 0m6.727s
sys 0m27.314s
So awk is about 1100x faster than a comparable bash/while loop.
If the sole purpose of this date/time-generating code is simply to find the missing dates/times in the date_metar file, then OP may want to consider using a single awk script to eliminate the need for the date_correct file and still determine what dates/times are missing from date_metar ... but that's for another Q&A ...
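As a rough illustration of that single-script idea (a sketch under stated assumptions, not a tested replacement for OP's workflow): gawk reads date_metar, remembers every timestamp it sees, and then walks the expected range and prints whatever is absent. The function name missing_slots is hypothetical, the first field of each line is assumed to hold the %d%H%M value, and TZ=UTC is an assumption, so the epoch bounds should be computed in UTC as well.

```shell
# missing_slots START_EPOCH END_EPOCH STEP_SECONDS FILE
# Hypothetical helper: prints every %d%H%M slot in the range that FILE lacks.
# Assumes gawk (for strftime) and that slots are rendered in UTC.
missing_slots() {
    local start_s=$1 end_s=$2 step_s=$3 file=$4
    TZ=UTC gawk -v ss="$start_s" -v es="$end_s" -v inc="$step_s" '
        { seen[$1] }                      # remember each timestamp present in the file
        END {
            for (t = ss; t <= es; t += inc) {
                slot = strftime("%d%H%M", t)
                if (!(slot in seen)) print slot
            }
        }
    ' "$file"
}

# Example (epoch bounds computed with GNU date in UTC):
# missing_slots "$(date -u -d 2022-01-01T00:00 +%s)" \
#               "$(date -u -d 2022-01-31T23:30 +%s)" 1800 date_metar > output
```

This prints only the missing slots, whereas diff would also report lines in date_metar that fall outside the expected range.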
Looking a bit more into the performance issues of the bash/while loop ...
Replacing the date call with a comparable printf -v call:
while [[ "${strdate_s}" -le "${enddate_s}" ]] ; do
    printf -v new_date '%(%d%H%M)T' "${strdate_s}"
    echo "${new_date}" >> date_correct2
    ((strdate_s += inc_s))
done
We see overall time is reduced from 46 secs to 10 secs:
real 0m10.127s
user 0m0.141s
sys 0m0.312s
We should be able to get a further improvement by moving the >> date_correct2 to after the done, thus replacing 1488 file open/close operations (date ... >> date_correct2) with a single file open/close operation (done > date_correct2):
while [[ "${strdate_s}" -le "${enddate_s}" ]] ; do
    printf -v new_date '%(%d%H%M)T' "${strdate_s}"
    echo "${new_date}"
    ((strdate_s += inc_s))
done > date_correct2
This speeds up the process by ~50x (10 secs down to 0.2 secs):
real 0m0.198s
user 0m0.141s
sys 0m0.000s
Thus reducing the bash/while loop overhead (compared to awk) from 1100x to 5x.
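The two loop improvements can be folded into one small function. As a further step, printf can write the formatted slot directly, so the intermediate variable and the echo go away. This is a minimal sketch, assuming bash 4.2 or later for the %(...)T format; gen_slots is a hypothetical name, and this variant was not part of the timings reported here.

```shell
# gen_slots START_EPOCH END_EPOCH STEP_SECONDS
# Hypothetical helper: prints one %d%H%M slot per line, rendered in the current TZ.
gen_slots() {
    local t=$1 end_s=$2 step_s=$3
    while (( t <= end_s )); do
        printf '%(%d%H%M)T\n' "$t"    # bash builtin, no external date process
        (( t += step_s ))
    done
}

# Example: the redirect sits outside the function, so the file is opened once
# gen_slots "$strdate_s" "$enddate_s" "$inc_s" > date_correct2
```

Keeping the redirect at the call site means the caller decides where the output goes, and the file is still opened only once.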
CodePudding user response:
strftime() does not require GNU awk (gawk); mawk provides it as well:
mawk1 'BEGIN {
    fmt = "%Y-%m-%d %H:%M:%S %Z ( epochs %s | %Y-%j )"
    print ORS, systime(), ORS ORS, strftime(fmt, systime()), ORS
}'
1662840559
2022-09-10 16:09:19 EDT ( epochs 1662840559 | 2022-253 )
You'll get a very tiny, almost statistically insignificant, speed gain via mawk-1:
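{m,g}awk -v __='2022 01 01 00 00 00' \
         -v ___='2022 01 31 23 30 00' '
BEGIN {
    # _ works out to 1800, i.e. the 30-minute step in seconds
    _ *= (_ += (_ += (_ ^= _ < _) + _) ^ _) + _
    __   = mktime(__)       # start of range as epoch seconds
    ___  = mktime(___)      # end of range as epoch seconds
    ____ = "%Y-%m-%d %H:%M:%S %Z ( %s | %Y-%j )"
    do { print __, strftime(____, __) } while ((__ += _) <= ___)
}'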
out9: 78.5KiB 0:00:00 [64.4MiB/s] [64.4MiB/s] [<=> ]
( mawk -v __='2022 01 01 00 00 00' -v ___='2022 01 31 23 30 00' -- ; )
0.01s user 0.00s system 89% cpu 0.019 total
1484 1643682600 2022-01-31 21:30:00 EST ( 1643682600 | 2022-031 )
1485 1643684400 2022-01-31 22:00:00 EST ( 1643684400 | 2022-031 )
1486 1643686200 2022-01-31 22:30:00 EST ( 1643686200 | 2022-031 )
1487 1643688000 2022-01-31 23:00:00 EST ( 1643688000 | 2022-031 )
1488 1643689800 2022-01-31 23:30:00 EST ( 1643689800 | 2022-031 )
|
out9: 78.5KiB 0:00:00 [17.0MiB/s] [17.0MiB/s] [<=> ]
( gawk -v __='2022 01 01 00 00 00' -v ___='2022 01 31 23 30 00' -be ; )
0.02s user 0.01s system 85% cpu 0.026 total
1484 1643682600 2022-01-31 21:30:00 EST ( 1643682600 | 2022-031 )
1485 1643684400 2022-01-31 22:00:00 EST ( 1643684400 | 2022-031 )
1486 1643686200 2022-01-31 22:30:00 EST ( 1643686200 | 2022-031 )
1487 1643688000 2022-01-31 23:00:00 EST ( 1643688000 | 2022-031 )
1488 1643689800 2022-01-31 23:30:00 EST ( 1643689800 | 2022-031 )
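For readers who find the underscore-only variable names hard to follow, the same loop can be written with descriptive names. This is a sketch rather than the answerer's code: list_slots, from, to and the AWK override are hypothetical names, TZ=UTC is an assumption made to keep the output free of daylight-saving effects (the run shown here used US Eastern time), and it needs an awk that provides mktime() and strftime(), such as gawk.

```shell
# list_slots: hypothetical helper, prints "epoch  formatted-time" for each 30-minute slot.
# AWK may be set to another awk that has mktime() and strftime(); gawk is the default.
list_slots() {
    TZ=UTC "${AWK:-gawk}" -v from='2022 01 01 00 00 00' \
                          -v to='2022 01 31 23 30 00' '
        BEGIN {
            step  = 30 * 60               # 30 minutes in seconds
            t_end = mktime(to)
            for (t = mktime(from); t <= t_end; t += step)
                print t, strftime("%Y-%m-%d %H:%M:%S", t)
        }'
}

# Example:
# list_slots | tail -n 5
```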