I have the following bash script, which is supposed to download the current Wikipedia ZIM file if the file size differs:
#!/bin/bash
wikipedia_current_filesize=$(stat -c %s wikipedia.zim)
wikipedia_download_filesize=$(curl -s -L -I https://download.kiwix.org/zim/wikipedia_de_all_maxi.zim | gawk -v IGNORECASE=1 '/^Content-Length/ { print $2 }')
echo "Wikipedia filesize [current / download]:"
echo "$wikipedia_current_filesize / $wikipedia_download_filesize"
if [ "$wikipedia_current_filesize" != "$wikipedia_download_filesize" ]
then
    echo "Downloading newer version of Wikipedia..."
else
    echo "No new version for Wikipedia available."
fi
The output is:
Wikipedia filesize [current / download]:
38095908569 / 38095908569
Downloading newer version of Wikipedia...
The numbers are exactly the same. Why do I still end up in the if branch and not in the else branch? Am I comparing the strings the wrong way? Is there a more meaningful way, e.g. comparing integers instead of strings?
CodePudding user response:
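HTTP responses use \r\n line endings.
gawk's default record separator is a newline, so the carriage return stays in the last field as an ordinary character. The downloaded size is really "38095908569\r", which looks identical when printed but does not compare equal to "38095908569". You can remove the trailing carriage return with gensub: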
wikipedia_download_filesize=$(
curl -s -L -I https://download.kiwix.org/zim/wikipedia_de_all_maxi.zim \
| gawk -v IGNORECASE=1 '/^Content-Length/ { print gensub(/\r$/, "", 1, $2) }'
)
Or, more awk-ishly, set the record separator to \r\n so the carriage return never reaches the field:
wikipedia_download_filesize=$(
curl -s -L -I https://download.kiwix.org/zim/wikipedia_de_all_maxi.zim \
| gawk -v IGNORECASE=1 -v RS='\r\n' '/^Content-Length/ { print $2 }'
)
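As for the integer comparison asked about in the question: once the carriage return is gone, -ne works as expected. Below is a minimal sketch that needs no network access. The header value is simulated, and the names strip_cr, download_size, current_size and status are made up for illustration. It uses printf %q to make the hidden character visible and bash parameter expansion to strip it, as an alternative to doing the cleanup in gawk.

```bash
#!/bin/bash

# Hypothetical helper: remove one trailing carriage return, if present.
strip_cr() {
    printf '%s' "${1%$'\r'}"
}

# Simulated values (assumption): the size as awk would hand it over,
# with the "\r" from the HTTP header still attached.
current_size=38095908569
download_size=$'38095908569\r'

# %q prints the value in shell-quoted form, so the "\r" becomes visible.
printf 'raw download size: %q\n' "$download_size"

download_size=$(strip_cr "$download_size")

# Integer comparison; both operands must be clean digit strings.
if [ "$current_size" -ne "$download_size" ]; then
    status="download"
else
    status="up-to-date"
fi
echo "$status"
```

Piping the curl output through od -c is another way to spot stray \r characters when two values look the same but compare as different.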