Bash string comparison gives wrong value

Time:06-24

I have the following bash script, which is supposed to download the current Wikipedia ZIM file if the file size differs:

#!/bin/bash

wikipedia_current_filesize=$(stat -c %s wikipedia.zim)
wikipedia_download_filesize=$(curl -s -L -I https://download.kiwix.org/zim/wikipedia_de_all_maxi.zim | gawk -v IGNORECASE=1 '/^Content-Length/ { print $2 }')

echo "Wikipedia filesize [current / download]:"
echo "$wikipedia_current_filesize / $wikipedia_download_filesize"

if [ "$wikipedia_current_filesize" != "$wikipedia_download_filesize" ]
then
  echo "Downloading newer version of Wikipedia..."
else
  echo "No new version for Wikipedia available."
fi

The output is:

Wikipedia filesize [current / download]:
38095908569 / 38095908569
Downloading newer version of Wikipedia...

The numbers look exactly the same. Why do I still end up in the if branch and not in the else branch? Am I comparing the strings the wrong way? Is there maybe a more meaningful way, e.g. by comparing integers instead of strings?

Answer:

HTTP responses use \r\n line endings.
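
To see the hidden character for yourself, here is a minimal sketch. It assumes a hard-coded header line in place of the real curl output (the `header` and `size` variables are made up for illustration) and uses plain awk, so it runs without network access:

```bash
#!/bin/bash
# Simulated header line with the CRLF ending that HTTP uses (assumption:
# stands in for the real curl output).
header=$'Content-Length: 38095908569\r\n'

# Same kind of extraction as in the question: the last field keeps the \r.
size=$(printf '%s' "$header" | awk '/^Content-Length/ { print $2 }')

# %q prints the value in shell-quoted form, which makes the \r visible.
printf '%q\n' "$size"

# The length is 12, one more than the 11 digits you see on the terminal.
echo "${#size}"
```

The terminal shows the two numbers as identical because the carriage return only moves the cursor; the string comparison still sees the extra byte.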

gawk's default record separator is newline, so the carriage return stays as an ordinary character at the end of the last field. That is why the two values print identically but compare as different. You can have gawk remove the trailing carriage return:

wikipedia_download_filesize=$(
    curl -s -L -I https://download.kiwix.org/zim/wikipedia_de_all_maxi.zim \
    | gawk -v IGNORECASE=1 '/^Content-Length/ { print gensub(/\r$/, "", 1, $2) }'
)

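Or, more awk-ishly, set the record separator to \r\n so the carriage return never reaches the fields (a multi-character RS is a gawk extension):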

wikipedia_download_filesize=$(
    curl -s -L -I https://download.kiwix.org/zim/wikipedia_de_all_maxi.zim \
    | gawk -v IGNORECASE=1 -v RS='\r\n' '/^Content-Length/ { print $2 }'
)
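
As for comparing integers instead of strings: that works too, once the carriage return is gone. The following is only a sketch under stated assumptions. The two values are hard-coded stand-ins for the results of stat and curl, and the `result` variable is made up for illustration. It strips the trailing \r with bash parameter expansion rather than in gawk, then compares with -ne:

```bash
#!/bin/bash
# Hypothetical stand-ins for the two values from the script in the question.
wikipedia_current_filesize=38095908569
wikipedia_download_filesize=$'38095908569\r'   # as extracted without cleanup

# Remove one trailing carriage return, if present.
wikipedia_download_filesize=${wikipedia_download_filesize%$'\r'}

# -ne compares the operands as integers rather than as strings.
if [ "$wikipedia_current_filesize" -ne "$wikipedia_download_filesize" ]
then
  result="Downloading newer version of Wikipedia..."
else
  result="No new version for Wikipedia available."
fi
echo "$result"
```

An integer comparison does not fix the original problem on its own, since the stray \r is not part of a valid number; the cleanup step is still needed.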